Confusing errors for a confusing feature, part one

There’s a saying amongst programming language designers that every language is a response to previous languages; the designers of C# were, and still are, very deliberate about learning from the mistakes and successes of similar languages such as C, C++, Java, Scala and so on. One feature of C# that I have a love-hate relationship with is a direct response to a dangerous feature of C++, whereby the same name can be used to mean two different things throughout a block. I’ve already discussed the relevant rules of C# at length, so review my earlier posting before you read on.

OK, welcome back. Summing up:

  • C++ allows one name to mean two things when one local variable shadows another.
  • C++ allows one name to mean two things when one usage of a name refers to a member and a local variable of the same name is declared later.
  • Both of these features make it harder to understand, debug and maintain programs.
  • C# makes all that illegal; every simple name must have a unique meaning throughout its containing block, which implies that the name of a local variable may not shadow any other local or be used to refer to any member.

I have a love-hate relationship with this “unique meaning” feature, which we are going to look at in absurd depth in this series.

On the “love” side, this really does address a real source of bugs. I have myself accidentally-and-not-on-purpose introduced a local variable in one scope that shadows another and it took me literally hours of debugging to discover why it was that what I thought was one variable was suddenly changing values halfway through a function. On the “hate” side, based on my mail and questions on StackOverflow, misunderstanding what the feature is and why C# has this limitation is a frequent source of user confusion. It was also a rich source of bugs in the original C# compiler, spec problems and design problems in the Roslyn compiler.

But what makes it far, far worse for users is that the error messages given when a program violates these rules of C# are… well, you can see for yourself. Let’s start with the simplest possible case: a local variable which shadows another.

class C
{
  static void M1()
  {
    int x2;
    {
      int x2;
    }
  }

This gives the error on the inner x2:

 
error CS0136: A local variable named 'x2' cannot be declared in
this scope because it would give a different meaning to 'x2', which is
already used in a 'parent or current' scope to denote something else

Um… what?

It is no wonder I get mail from confused programmers when they see that crazy error message! And while we’re looking at this, why is 'parent or current' in quotes, and why doesn’t the compiler differentiate between whether it is the parent scope or the current scope? This is already weird, but don’t worry, it’ll get worse.

What if we cause the same bug the other way? Remember, a local variable is in scope throughout its block, not at the moment where it is declared; this means that you can move the location of a declaration around without changing the legality or meaning of the program, so we should expect this to be the same error:

  static void M3()
  {
    {
      int x4;
    }
    int x4;
  }

This gives the error on the outer x4:

 
error CS0136: A local variable named 'x4' cannot be declared in
this scope because it would give a different meaning to 'x4', which is
already used in a 'child' scope to denote something else

Aha! This is still terrible but at least now the silly quoted string is making some sense; the compiler is filling in either “parent or current” or “child” into the error message, and whoever wrote the error message felt that such a substitution ought to be called out in quotation marks for some reason.

It’s still weird that they went to all that trouble to differentiate between “parent or current” and “child” but didn’t bother to differentiate between “parent” and “current”. (And it is also weird that apparently scopes have a parent-child relationship; when a mommy and a daddy scope love each other very much… no, let’s not even go there. I think of scopes as having a container-contained relationship, not a parent-child relationship, but perhaps I am in a minority there.)

Another subtle thing to notice here: the error message is given on the second usage of the simple name in both cases; is that really the best choice? Ideally an error message should be on the line that needs to change. Which is more likely, that the user will wish to change the x4 in the inner scope or the outer? I would guess the inner, but here the error message is on the outer.

And finally: “something else” ?! Really, C#? That’s the best you can do? The compiler knows exactly what this conflicted with; if it didn’t then it wouldn’t know to make an error! Let’s just make the user guess what the conflict is, why don’t we? Yuck.

OK, that deals with local variables conflicting with other local variables. Next problem: a local variable conflicts with a simple name that was used to mean something else:

static int x5;
static void M6()
{
    int x7 = x5;
    {
        int x5;
    }
}

This gives the error on the declaration of the local:

error CS0136: A local variable named 'x5' cannot be declared in
this scope because it would give a different meaning to 'x5', which is
already used in a 'parent or current' scope to denote something else

Ah, same thing. This error message makes a lot more sense for this scenario than it does for the “hiding a local with a local” previous scenario. And of course if we reverse the order, putting the nested-block local before the field reference…

static int x8;
static void M9()
{
    {
        int x8;
    }
    int x10 = x8;
}

We get this error on the initialization of x10:

error CS0135: 'x8' conflicts with the declaration 'C.x8'

WHAT!?!?!?!?!? Why on earth did we not get the same message as before with “child” substituted in?!!?

This error message makes no sense at all; even an expert on the C# specification would be forgiven for not knowing what on earth this means.

So far I’ve shown cases where a local variable declaration conflicts with something else. Next time we’ll look at scenarios where the same simple name is used to mean two different things, but no local variable (or local constant, formal parameter or range variable) is involved. It might not be immediately obvious how it is possible that one simple name can be used to mean two completely different things without introducing a new name in a local scope, but it is possible. Your challenge is to find one that produces a compiler error before the next episode!

70 thoughts on “Confusing errors for a confusing feature, part one

        • In some cases, there would be no more concise way to describe a variable than to describe the code used to compute its value. If the variable will only be used in the immediate vicinity of its assignment, I would posit that a “meaningful variable name” would represent visual clutter and be semantically less helpful than a short name which would cause someone reading the code to look up a few lines and see where the variable was actually assigned. Compelling people to come up with unique names for such “throwaway” variables isn’t helpful, IMHO. Better would be to have a nice mechanism for tagging them as “throwaways”.

  1. “I think of scopes as having a container-contained relationship, not a parent-child relationship, but perhaps I am in a minority there.”
    This is somewhat common in the IT industry.
    For example on a WPF UI controls do have an obvious container-contained relationship while the Panel class has Parent and Child properties.
    As for XML, elements do have an obvious container-contained relationship while XPath have axis specifiers like children, parent, sibling etc.
    Maybe because the logical model of these constructs are usually based on trees (only the representational model shows the container-contained relationship).

    • Scopes do have and enforce such a relationship, but useful variable lifetimes do not. It’s not uncommon to create a couple of temporary variables, use their values to determine the value for a “real” variable, and then abandon the temporary ones. The useful lifetime of the real variable starts between the creation and abandonment of the temporary variables; if the real variable is only assigned in this one place, its lifetime should begin when it is assigned, but C-style scoping structures do not allow such overlapped lifetimes even though, interestingly, some assembly languages did.

      • Be careful when using the word “lifetime” in a language where closures exist.

        But hey, I assume you meant (lexical) scope. And you can always introduce a scope, e.g. like this:
        int result;
        {
        int part1 = // …
        int part2 = // …
        result = part1 * part1 + part2 * part2;
        }
        // use result here

        • If a variable is only going to be written once, it’s useful to be able to specify that. Java allows the syntax `final int x=y;` or `final int[] foo = new int[4];` to simultaneously declare and initialize a write-once variable. Although C# does not presently allow `readonly` to be applied to locals, such a declaration would allow better closure semantics since they could be passed by value rather than aliased. Unfortunately, there would be no way to say “readonly int result=part1*part1 + part2*part2;” and have that variable outlive “part1” and “part2”.

          PS–I find it curious that the makers of C# decided to have it automatically hoist mutable variables into a closure without any special declaration. Given `int x; … StartAction( () => DoSomething(x) ); … ` I would consider it far from obvious that the programmer intends any changes to “x” between the time StartAction executes and the time DoSomething(x) executes to alter the value passed to DoSomething(). I would have liked to see a requirement that variables used within a closure must be marked either at the closure or at their declaration to indicate how they should interact with closures.

          • I would like “readonly” local variables as well; it has never made it to the top of the list.

            I think it is safe to say that many people have regrets about how anonymous methods and lambdas evolved; whether they close over variables or values is certainly part of that. I find it somewhat distressing that “query = from c in customers select c.Name == name;” changes its result set when variable name is changed and changes when the contents of collection customers is changed, but does not change when variable customers is changed! The query stays bound to the original reference stored in customers, not the current value, but is bound to the current value of name, and that’s weird.

          • It’s too bad `readonly` variables didn’t get added to the language with closures; Java won’t allow variables to be closed over unless they’re `final` [the language’s equivalent of “readonly”]. While there are some occasions when it’s useful to close over mutable variables, I suspect cases where one wants the variable to be mutable are *much* rarer than those where one doesn’t want a closure to ever see any value other than the one a variable had when it was created. Is it possible for the compiler to notice that a variable will never be written after a closure is generated, and refrain from hoisting the variable in that case? If so, do you think it might be reasonable for code validation tools to define a hoist-related attributes, and squawk if a variable is hoisted as mutable when it doesn’t have a special attribute indicating such intention?

  2. What would you think of loosening the rule in such a fashion as to permit multiple declarations only in those cases where they would have no “direct” semantic effect, but could serve as a sort of assertion that the variable was not used in such a way that the redeclaration would have an effect? Essentially, allow redeclarations in those cases, and only those cases, where the exact same semantics would be achieved:

    1. If every redeclaration created a new variable and bound the name to it until the end of the current scope, or
    2. If all declarations of variables were hoisted to the tops of their scopes, and any duplicates were simply ignored.

    In cases where the two styles of compilation would yield different semantics, I don’t think evidence of programmer intent would be sufficient to silently favor one over the other and thus the compiler should quite rightly squawk. On the other hand, in cases where both styles of compilation would yield the same result, allowing redeclaration would serve as a means of drawing a “barrier” that would effectively assert “no value assigned to this identifier above this statement in the source code may be read beyond the end of it, and no value assigned in or below this statement will be used above.” In some situations it might be helpful to have a syntax for explicit barriers [e.g. if `varName` is a variable of any type, have “unassign varName;” statement indicate that if it is reachable at all, the following statement will be reachable with “varName” not holding a value] and possibly even have an explicit means of requesting C++ style scoping [e.g. have `hide varName;` and `unhide varName;` statements, require that “unhide” appear following “hide” in the same scope, and provide that an inner scope located entirely between the “hide” and “unhide” may use the identifier as it sees fit. Note that the requirement to pair “hide” and “unhide” should make them safe for use as context-sensitive keywords, since the only existing meaning for “hide foo; unhide foo;” would be to declare “foo” as type “hide” and again as type “unhide”, which should not be legal in any case.

  3. Honestly, I don’t ever recall being confused by this message even the first time I saw it.

    I believe it says ‘parent or current’ because technically the ‘current’ scope is really a conglomeration of all current scopes (that is, the deepest ‘current’ scope and all if its parent scopes). While it is possible to split this further into ‘a parent scope’ and ‘the xxxxx scope’, what do you replace xxxxx with? You can’t say ‘current’, because as I already mentioned the current scope is the same level and all levels above. ‘Sibling scope’ makes no sense because it’s not a sibling and ‘self scope’ is just a “Wtf?” moment. Since there’s no good term that I (or the author of this error message) could come up with to call the deepest sub-scope of the current scope, they used the relative definition of the script instead. xxxxx = current scope – all parent scopes a.k.a. ‘parent or current scope’.

    In summary, it could simply say ‘current scope’, but that’s more confusing to someone who doesn’t have a full understanding of nested scopes (and thereby knows that the current scope includes all parent scopes) than simply saying ‘parent or current scope’.

    • what do you replace xxxxx with?

      AFAIK, all scopes apart from those created by {} blocks inside a method are created by things that have names: packages, classes, methods. A better error message would be to use that name, when available. If not, say “elsewhere in this method” or something.

      Not that I’m complaining. I’ve never had difficulty figuring out the problem from the error message. It’s so much better than C++, where the first error is often the only one worth reading, but frequently it only usefully means “an error of some sort was detected at this point”, which is sadly a lot less helpful than “there is an error of some sort at this point”.

      I have had problems with classes named the same in different packages or named the same as a package; the latter I learned quickly not to do, the former I’m stuck with because SOAP code generation. It seems that class lookup is one place where C# does differ from Java.

  4. There are definitely some crazy error messages around. And I’m looking forward to the rest of the series and the deconstruction of said error messages.

    That said, I am so glad that these errors exist. It’s one of the things I love about C#, the way that the designers have addressed many of these oh-so-common scenarios where C/C++ lets you do some crazy thing that is almost certainly wrong, with nary a peep.

    Another of my favorite C# features is the lack of conversions between bool and numeric types, pretty much completely eliminating the “oh, I accidentally typed = in my if-statement expression when I meant to type ==”.

    Just a little nit from the editorial desk…

    “whomever wrote the error message”

    Nominative case, please. “Whoever”.

  5. For what it’s worth, possibly confusing error messages and all, this is one of my favorite “you never really notice it” features of C#. I’ve been burnt in the past by the C++ problem of having multiple variables having the same name in the same method. That I can’t do that in C# is a huge, if largely invisible, advantage.

    Yeah, the error messages could be clearer, but, it’s the kind of thing I look at and say “huh?” at first, but then quickly: “oh, I’m using x2 in two places with two different meanings”.

    Thank you C# language designers.

  6. Eric, I have posted on my blog talking about what I was thinking as I went through your post, figured I would let you know. I haven’t read all of your posts, but I do take notice when I see a new post, hoping that I can learn more than what I already know.

    thanks for taking time out of your day to write these blog posts.

  7. Even with somewhat unclear error messages, I much prefer the behavior of C# in this regard over that of C++ (or, heaven forbid, Javascript’s hoisting nonsense).

  8. There are C# Design Notes which discuss the same feature: https://roslyn.codeplex.com/discussions/565640 , specifically under Removing the shadowing restriction. Mads Torgersen is also describing a love-hate relationship:

    This is seen as one of the more successful rules of hygiene in C# – it makes code safe for refactoring in many scenarios, and just generally easier to read.

    There are existing cases where this rule is annoying:

    task.ContinueWith(task => … task …); // Same task! Why can’t I name it the same?

    Also, thanks for writing another “absurd depth” series!

  9. Pingback: The Morning Brew - Chris Alcock » The Morning Brew #1703

  10. I find this variable name conflict especially annoying in switch case situations :/

    var i= 10;
    switch(i) {
    case 10: int myVar; break;
    case 20: int myVar; break;
    }

    Error A local variable named ‘myVar’ is already defined in this scope

    • In that case I tend to give each case its own block:

      var i= 10;
      switch(i) {
      case 10: {
      int myVar;
      break;
      }
      case 20: {
      int myVar;
      break;
      }
      }

    • pro tip: don’t declare it the second time.

      var i = 10;
      switch (i)
      {
      case 10: int myVar = 1; break;
      case 20: myVar = 2; break;
      }

      • An Ugly protip you got there. It looks like the variable is used without ever being assigned in the second case… Why on earth does the compiler (and the specs) allow such an ugly thing?

        far better to declare the variable once, then assign it when relevant

        var i = 10;
        int myVar;
        switch (i)
        {
        case 10: myVar = 1; break;
        case 20: myVar = 2; break;
        }

    • Because that is solved by the dreaded “this” keyword

      Inside of F if you access x, it will be the parameter. To access x of C, you need to do this.x. I absolutely loathe the this keyword. I also loathe when people use it for constructor injection.

      Suppose you can void F(int x) to just C(int X) { this.x = x } at some point a person is going to do { x = x } and unless you have resharper it will go unnoticed until you have runtime exceptions.

      • Actually, I like the “this” keyword quite a lot. It makes it crystal clear when something is a variable (or argmument) and when something is a member.

        I do think it is unfortunate that you can hide a member with a parameter, though.

        • Count me in: +1 for the “this” keyword. For me at least, it increases reviewability of my own or someone else’s code a lot..
          You don’t need Resharper to be warned about the x = x; redundant assignment.

  11. It’s interesting and ironic that VB apparently does not have this same restriction, because this feels like the same category of rule as “Don’t allow names to differ only by case.”. You’d think they would go together.

    • VB’s rule is a little different, though you bring up an interesting point: I wish languages would simultaneously require that variables be written using the same case as their declaration but *also* have variables shadow the declarations of outer-scope variables which differ only in case. If an outerscope declares `Foo` and an inner scope `foo`, then within the inner scope the identifier `Foo` should refer *neither* to the outer scope’s `Foo` nor the inner scope’s `foo`.

      If a language allows `foo` and `Foo` to be used in the same scope with different meanings, then anyone who reads code aloud must explicitly distinguish uppercase and lowercase variable names. If names must differ in some fashion other than case, then someone who is reading code which is known to compile need not worry about variable case outside of string literals. But if variables which differ only in case can mean different things then it will be necessary about the case of everything. A massive increase in cognitive load, for a relatively minimal payoff.

  12. Pingback: Dew Drop – September 26, 2014 (#1864) | Morning Dew

  13. I’m pretty sure the answer to the challenge involves namespaces… I’m curious if there is another case.

    namespace A
    {
        public class MyClass { }
    }
    namespace B
    {
        public class MyClass { }
    }
    namespace C
    {
      using A;
      using B;
      class Foo
      {
        public void ProduceCompilerError()
        {
          var myVar = new MyClass();
        }
      }
    }
    
    • This produces an error saying that “MyClass” is ambiguous; you’ve used a name once and it means two things. The challenge is to find a case where you use the same name twice and it means two different things.

  14. What about this:

        public abstract class NameCollisionBase {
            public static NameCollision NameCollision { get; set; }
        }
        public class NameCollision : NameCollisionBase {
            public NameCollision Member { get; set; }
            public void SomeMethod() {
                // "NameCollision" as static property
                Member = NameCollision.Member;
                // "NameCollision" as a type
                Member = NameCollision.NameCollision;
                Member = NameCollision.NameCollision.Member;
            }
        }
    
    • You’ve used the same name to mean two different things, but this is not an error. (Additional challenge question: why not?) The challenge question was to find a case where the same name means two different things that is an error.

      • I must be misunderstanding the quiz because this seems pretty obvious and it’s a compile time error: `public class Foo { public int Foo { get { … } } }`

        • The problem is to make two instances of the same name mean two different things in a local scope, such that an error is produced. Your example is not in a local scope, it’s at class scope. Here’s another way to characterize the problem: if either of the two usages is removed, then the program is correct, but using both is an error.

          Again, let me state the rule: a simple name that is used multiple times in a local scope (including times where the usages are in nested scopes) must have the same meaning for every usage. The challenge is to find a program which violates this rule and thereby produces an error, but without one of the conflicting names being due to the introduction of a new local, lambda parameter or range variable.

  15. The following code generates error CS0135 (‘M’ conflicts with the declaration ‘A.B.M’) but compiles successfully if either M() or M = 0 is removed:

    class A
    {
        static void M() { }
    
        class B
        {
            static int M;
    
            void N()
            {
                M();
                M = 0; // CS0135
            }
        }
    }
    
      • How is this a valid answer to the quiz? Either it is not or the quiz is unintentionally misleading considering part one of this series. M doesn’t mean two different things in this code. The compiler resolves M to a method group and sticks with it. There is no ambiguity here. In the examples in part one, the compile time error is thrown because the compiler has no way to decide to which x1 the code is referring to, so x1 can be either definition. In this example M will always resolve to the inner class definition (closest). There is a subtle but important difference. The compile time error gives it away; it tells you you are using M the wrong way, it doesn’t tell you it is not able know what M you are referring to. (maybe thats the whole point of part 2?)

        • Digging a little deeper, could the issue be that the ideal behavior of the compiler should have been to disallows this due to the deeper reason? That is, the possibility to use M to mean two things? This would be a subtly compiler bug with no solution as it would imply a breaking change: this same code but with the variable and method definitions switched (method M() declared in outer clas) compiles correctly.

          • Well I’m not sure what the spec says, but intellisense its telling me M is void B.M() when I hover over either statement so I must deduce that the compiler is resolving M to the method group. (VS 14 CTP 3 by the way)

        • Your statement “in the examples in part one the compile time error is thrown because the compiler has no way to decide to which x1 the code is referring to” is false. The lookup algorithm unambiguously chooses the closer x1 in these cases. The error is not that x1 is ambiguous, the error is that the same name has been used to mean two different things.

          • Which in my book means x1 is ambigous. If it weren’t the compiler woildnt have any issues. If it could resolve what x1 you are referring without a doubt then the issue is moot. The fact that x1 is an ambiguos identifier is the problem.

        • Your statement “in the example M will always resolve to the inner class definition” is false.

          Try removing each usage of M in turn, and you’ll see that there is no error, and that each one resolves to a different M.

          • I did, and M = 0 is always a compile time error, even if I remove M(). Again, this is VS 2014 CTP 3. class Outer
            {
            public static int M;

            class Inner
            {
            public static void M()
            {
            }

            public static void F()
            {
            M = 0; //Cannot assign to ‘M’ because it is a ‘method group’
            }
            }
            }

            Am I misunderstanding something?

          • class A
            {
            static void M() { }

            class B
            {
            static int M;

            void N()
            {
            M();
            M = 0; // CS0135
            }
            }
            }

            Ok, the isse is that in VS 2014 this is not a compile time error. The code I was running is switching the place where M and M() are declared wich is a compile time error and to what I am referring to in my posts.

      • I note that this code compiles perfectly fine in VS 2014 CTP3 (must I understand that the compiler’s current behavior is a bug?). Also I am interested in knowing why the compiler behaves completely different if variable M and method M() declarations switch places.

        • Surprisingly, this isn’t a bug in the compiler. The behavior is specified by the member lookup algorithm in §7.4 of the spec: if a member is invoked, then non-invocable members are removed from the candidate set before members hidden by other members are removed. So the M in M() resolves to A.M because the compiler drops B.M from consideration before it sees that B.M hides A.M. The behavior is different if A.M and B.M are swapped because there’s no corresponding rule to ignore invocable members when a member is not being invoked.

          Eric, will you be explaining why the algorithm was designed this way? I think it would be much more intuitive if hiding were considered before invocability. Even ReSharper 7.1.3 gets this wrong!

  16. The error of “static void M3() { { int x4; } int x4; }” kind really got me when I first ran into it. Because, um, the second x4 is not even shadowing anything?

    All in all, this error is particularly annoying in lambdas:

    var frob = frobList.Where(frob => frob.isFizzy()).FirstOrDefault();

    Nope! Can’t do this, name the argument “x” or something.

    • The second x4 is in scope throughout the block, not just from the point where it is declared. Remember, the point of this is to ensure that you can make simple refactorings such as moving a declaration from the middle to the top of a block without changing its meaning.

  17. Pingback: Confusing errors for a confusing feature, part two | Fabulous adventures in coding

  18. Pingback: Confusing errors for a confusing feature, part three | Fabulous adventures in coding

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s