One of the nice things about a project as large as the Roslyn project is that you have an opportunity to really think hard about your past mistakes and hopefully fix them. When I was working on getting these error messages reported in Roslyn I realized that trying to match exactly the behavior of the original compiler would be actively making the world a worse place, so I took a big step back and started sending a lot of emails to Mads, Neal, Anthony and the rest of the Roslyn gang to try and get a better design worked out. All the godawful nonsense I told you about in the previous two episodes will be fixed in Roslyn.
Tag Archives: simple names
Confusing errors for a confusing feature, part two
Last time I gave you the challenge to find a case where the same simple name means two different things, without introducing a new local/parameter/range variable into scope, that produces an error. It seems like it ought to be impossible; if nothing new has been introduced to a local scope then how can name resolution choose two different things? The relevant section of the C# specification (7.6.2.1 Invariant meaning in blocks) only gives the example I gave last time, of a local having the same name as a field.
The key to solving the riddle is a little-known rule about resolving a name from a set of possible class members.
Confusing errors for a confusing feature, part one
There’s a saying amongst programming language designers that every language is a response to previous languages; the designers of C# were, and still are, very deliberate about learning from the mistakes and successes of similar languages such as C, C++, Java, Scala and so on. One feature of C# that I have a love-hate relationship with is a direct response to a dangerous feature of C++, whereby the same name can be used to mean two different things throughout a block. I’ve already discussed the relevant rules of C# at length, so review my earlier posting before you read on.
OK, welcome back. Summing up:
- C++ allows one name to mean two things when one local variable shadows another.
- C++ allows one name to mean two things when one usage of a name refers to a member and a local variable of the same name is declared later.
- Both of these features make it harder to understand, debug and maintain programs.
- C# makes all that illegal; every simple name must have a unique meaning throughout its containing block, which implies that the name of a local variable may not shadow any other local or be used to refer to any member.
I have a love-hate relationship with this “unique meaning” feature, which we are going to look at in absurd depth in this series.
Simple names are not so simple, part two
Today, the solution to the puzzle from last time. The code is correct, and compiles without issue. I was quite surprised when I first learned that; it certainly looks like it violates our rule about not using the same simple name to mean two different things in one block.
The key to understanding why this is legal is that query comprehensions and foreach loops are specified as syntactic sugar for another program, and it is that program which is actually analyzed for correctness. Our original program:
    static void Main()
    {
        int[] data = { 1, 2, 3, 1, 2, 1 };
        foreach (var m in from m in data orderby m select m)
            System.Console.Write(m);
    }
is transformed into
    static void Main()
    {
        int[] data = { 1, 2, 3, 1, 2, 1 };
        {
            IEnumerator<int> e = ((IEnumerable<int>)(data.OrderBy(m => m))).GetEnumerator();
            try
            {
                int m; // Inside the "while" in C# 5 and above, outside in C# 1 through 4.
                while (e.MoveNext())
                {
                    m = (int)(int)e.Current;
                    Console.Write(m);
                }
            }
            finally
            {
                if (e != null) ((IDisposable)e).Dispose();
            }
        }
    }
There are five usages of m in this transformed program; m is:
- declared as the formal parameter of a lambda.
- used in the body of the lambda; here it refers to the formal parameter.
- declared as a local variable.
- written to in the loop; here it refers to the local variable.
- read from in the loop; here it refers to the local variable.
Is there any usage of a local variable before its declaration? No.
Are there any two declarations that have the same name in the same declaration space? It would appear so. The body of Main defines a local variable declaration space, and clearly the body of Main contains, indirectly, two declarations for m, one as a formal parameter and one as a local. But I said last time: local variable declaration spaces have special rules for determining overlaps. It is illegal for a local variable declaration space to directly contain a declaration such that another nested local variable declaration space contains a declaration of the same name. But an outer declaration space which indirectly contains two such declarations is not an error. So in this case, no, there are no local variable declaration spaces which directly contain a declaration for m such that a nested local variable declaration space also directly contains a declaration for m. Our two local variable declaration spaces which directly contain a declaration for m do not overlap anywhere.
Is there any declaration space which contains two inconsistent usages of the simple name m? Yes, again, the outer block of Main contains two inconsistent usages of m. But again, this is not relevant. The question is whether any declaration spaces directly containing a usage of m have an inconsistent usage. Again, we have two declaration spaces but they do not overlap each other, so there’s no problem here either.
The thing which makes this legal, interestingly enough, is the generation of the loop variable declaration logically within the try block. Were it to be generated outside the try block then this would be a violation of the rule about inconsistent usage of a simple name throughout a declaration space.
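To watch the desugaring at work, here is a minimal sketch that runs both forms side by side. The names ForeachSugarDemo, Sugared and Desugared are mine, invented for illustration. The desugared method uses the C# 5 placement of the loop variable, and both methods collect their output in a StringBuilder rather than writing to the console so that the results can be compared.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

static class ForeachSugarDemo
{
    // The puzzle as written: a query comprehension inside a foreach.
    public static string Sugared(int[] data)
    {
        var output = new StringBuilder();
        foreach (var m in from m in data orderby m select m)
            output.Append(m);
        return output.ToString();
    }

    // A hand-written approximation of what the compiler analyzes instead.
    public static string Desugared(int[] data)
    {
        var output = new StringBuilder();
        {
            // First "m": a lambda parameter, scoped to the lambda body.
            IEnumerator<int> e = ((IEnumerable<int>)data.OrderBy(m => m)).GetEnumerator();
            try
            {
                while (e.MoveNext())
                {
                    // Second "m": a local, declared in a space that does not
                    // overlap the lambda's.
                    int m = (int)e.Current;
                    output.Append(m);
                }
            }
            finally
            {
                if (e != null) ((IDisposable)e).Dispose();
            }
        }
        return output.ToString();
    }

    static void Main()
    {
        int[] data = { 1, 2, 3, 1, 2, 1 };
        Console.WriteLine(Sugared(data));   // prints 111223
        Console.WriteLine(Desugared(data)); // prints 111223
    }
}
```

Neither method ever has one m visible from inside the space that declares the other, which is the whole point.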
Simple names are not so simple, part one
C# has many rules that are designed to prevent some common sources of bugs and encourage good programming practices. So many, in fact, that it is often quite confusing to sort out exactly which rule has been violated. I thought I might spend some time talking about what the different rules are. We’ll finish up with a puzzle.
To begin with, it will be vital to understand the difference between scope and declaration space. To refresh your memory of my earlier article: the scope of an entity is the region of text in which that entity may be referred to by its unqualified name. A declaration space is a region of text in which no two things may have the same name (with an exception for methods which differ by signature.) A “local variable declaration space” is a particular kind of declaration space used for declaring local variables; local variable declaration spaces have special rules for determining when they overlap.
The next thing that you have to understand to make any sense of this is what a “simple name” is. A simple name is always either just a plain identifier, like x, or, in some cases, a plain identifier followed by a type argument list, like Frob<int, string>.
Lots of things are treated as “simple names” by the compiler: local variable declarations, lambda parameters, and so on, always have the first form of simple name in their declarations. When you say Console.WriteLine(x); the Console and x are simple names but the WriteLine is not. Confusingly, there are some textual entities which have the form of simple names, but are not treated as simple names by the compiler. We might talk about some of those situations in later fabulous adventures.
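A small sketch may help pin down the two forms. The generic class Frob below is hypothetical; I have defined it only so that Frob<int, string> names something, and SimpleNameDemo is likewise invented for illustration.

```csharp
using System;

// Hypothetical generic type, defined only so that Frob<int, string> resolves.
static class Frob<T, U>
{
    public static string Describe()
    {
        return typeof(T).Name + "," + typeof(U).Name;
    }
}

static class SimpleNameDemo
{
    public static string Run()
    {
        int x = 42;

        // "Console" and "x" are simple names of the first form.
        // "WriteLine" is not: it is looked up as a member of Console.
        Console.WriteLine(x);

        // "Frob<int, string>" is a simple name of the second form:
        // an identifier followed by a type argument list.
        // "Describe" is again a member access, not a simple name.
        return Frob<int, string>.Describe();
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints 42, then Int32,String
    }
}
```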
So, without further ado, here are some relevant rules which are frequently confused. It’s rules 3 and 4 that people find particularly confusing.
1. It is illegal to refer to a local variable before its declaration. (This seems reasonable I hope.)
2. It is illegal to have two local variables of the same name in the same local variable declaration space or nested local variable declaration spaces.
3. Local variables are in scope throughout the entire block in which the declaration occurs. This is in contrast with C++, in which local variables are in scope in their block only at points after the declaration.
4. For every occurrence of a simple name, whether in a declaration or as part of an expression, all uses of that simple name within the immediately enclosing local variable declaration space must refer to the same entity.
The purpose of all of these rules is to prevent the class of bugs in which the reader/maintainer of the code is tricked into believing they are referring to one entity with a simple name, but are in fact accidentally referring to another entity entirely. These rules are in particular designed to prevent nasty surprises when performing what ought to be safe refactorings.
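The second rule above is the one that turns on nesting, so here is a minimal sketch of it. The class and method names are invented for illustration, and the offending declaration is left commented out so that the rest compiles; uncommenting it should produce an error along the lines of CS0136.

```csharp
using System;

static class DeclarationSpaceDemo
{
    // Two sibling blocks: their declaration spaces do not overlap,
    // so each may declare its own "x".
    public static int SiblingBlocks()
    {
        int total = 0;
        {
            int x = 1;
            total += x;
        }
        {
            int x = 2;
            total += x;
        }
        return total;
    }

    // A nested block: its declaration space overlaps the outer one,
    // so it may not declare another "y".
    public static int NestedBlocks()
    {
        int y = 10;
        {
            // int y = 20; // illegal: the outer space already directly contains "y"
            y += 1;        // refers to the outer local
        }
        return y;
    }

    static void Main()
    {
        Console.WriteLine(SiblingBlocks()); // prints 3
        Console.WriteLine(NestedBlocks());  // prints 11
    }
}
```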
Consider a world in which we did not have rules 3 and 4. In that world, this code would be legal:
    class C
    {
        int x;
        void M()
        {
            // 100 lines of code
            x = 20; // means "this.x";
            Console.WriteLine(x); // means "this.x"
            // 100 lines of code
            int x = 10;
            Console.WriteLine(x); // means "local x"
        }
    }
This is hard on the person reading the code, who has a reasonable expectation that the two Console.WriteLine(x) lines do in fact both print out the contents of the same variable. But it is particularly nasty for the maintenance programmer who wishes to impose a reasonable coding standard upon this body of code. “Local variables are declared at the top of the block where they’re used” is a reasonable coding standard in a lot of shops. But changing the code to:
    class C
    {
        int x;
        void M()
        {
            int x;
            // 100 lines of code
            x = 20; // no longer means "this.x";
            Console.WriteLine(x); // no longer means "this.x"
            // 100 lines of code
            x = 10;
            Console.WriteLine(x); // means "local x"
        }
    }
changes the meaning of the code! We wish to discourage authoring of multi-hundred-line methods, but making it harder and more error-prone to refactor them into something cleaner is not a good way to achieve that goal.
Notice that in the original version of this program, rule 3 means that the program violates rule 1: the first usage of x is treated as a reference to the local before it is declared. The fact that it violates rule 1 because of rule 3 is precisely what prevents it from being a violation of rule 4! The meaning of x is consistent throughout the block; it always means the local, and therefore is sometimes used before it is declared. If we scrapped rule 3 then this would be a violation of rule 4, because we would then have two inconsistent meanings for the simple name x within one block.
Now, these rules do not mean that you can refactor willy-nilly. We can still construct situations in which similar refactorings fail. For example:
    class C
    {
        int x;
        void M()
        {
            {
                // 100 lines of code
                x = 20; // means "this.x";
                Console.WriteLine(x); // means "this.x"
            }
            {
                // 100 lines of code
                int x = 10;
                Console.WriteLine(x); // means "local x"
            }
        }
    }
This is perfectly legal. We have the same simple name being used two different ways in two different blocks, but the immediately enclosing block of each usage does not overlap that of any other usage. The local variable is in scope throughout its immediately enclosing block, but that block does not overlap the block above. In this case, it is safe to refactor the declaration of the local to the top of its block, but not safe to refactor the declaration to the top of the outermost block; that would change the meaning of x in the first block. Moving a declaration up is almost always a safe thing to do; moving it out is not necessarily safe.
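The difference between moving up and moving out can be observed directly. The sketch below is my own illustration, not code from any real program: RefactorDemo and its three methods are invented names, the field is made public so that it can be inspected afterwards, and the output is gathered into a string rather than printed from inside the blocks.

```csharp
using System;

class RefactorDemo
{
    public int x; // the field, that is, "this.x"

    // The two-block program from above.
    public string Original()
    {
        string log = "";
        {
            x = 20;      // means this.x
            log += x;
        }
        {
            int x = 10;  // means the local
            log += x;
        }
        return log;
    }

    // Declaration moved UP to the top of its own block: same meaning.
    public string MovedUp()
    {
        string log = "";
        {
            x = 20;      // still means this.x
            log += x;
        }
        {
            int x;
            x = 10;
            log += x;
        }
        return log;
    }

    // Declaration moved OUT to the outermost block: the first block
    // now writes to the local, and the field is never touched.
    public string MovedOut()
    {
        string log = "";
        int x;
        {
            x = 20;      // now means the local
            log += x;
        }
        {
            x = 10;
            log += x;
        }
        return log;
    }

    static void Main()
    {
        var a = new RefactorDemo();
        Console.WriteLine(a.Original() + " field=" + a.x); // prints 2010 field=20
        var b = new RefactorDemo();
        Console.WriteLine(b.MovedUp() + " field=" + b.x);  // prints 2010 field=20
        var c = new RefactorDemo();
        Console.WriteLine(c.MovedOut() + " field=" + c.x); // prints 2010 field=0
    }
}
```

All three versions produce the same text, which is exactly what makes the third one dangerous: the only observable difference is that the field silently keeps its old value.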
Now that you know all that, here’s a puzzle for you, a puzzle that I got completely wrong the first time I saw it:
    using System.Linq;
    class Program
    {
        static void Main()
        {
            int[] data = { 1, 2, 3, 1, 2, 1 };
            foreach (var m in from m in data orderby m select m)
                System.Console.Write(m);
        }
    }
It certainly looks like simple name m is being used multiple times to mean different things. Is this program legal? If yes, why do the rules for not re-using simple names not apply? If no, precisely what rule has been violated?
Next time on FAIC: The solution to this puzzle.