Why are local variables definitely assigned in unreachable statements?

You’re probably all familiar with the feature of C# which disallows reading from a local variable before it has been “definitely assigned”:

void M()
{
  int x; 
  if (Q()) 
    x = 123; 
  if (R()) 
    Console.WriteLine(x); // illegal! 
}

This is illegal because there is a path through the code which, if taken, results in the local variable being read from before it has been assigned; in this case, if Q() returns false and R() returns true.
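
One way to keep the compiler happy is to make sure that every path assigns the local before the read. Here is a minimal sketch of that; the bodies of Q() and R() are hypothetical stand-ins, since the originals are not shown:

```csharp
using System;

static class DefiniteAssignmentDemo
{
    // Hypothetical stand-ins for Q() and R(); the originals are not shown.
    static bool Q() { return false; }
    static bool R() { return true; }

    public static int M()
    {
        int x;
        if (Q())
            x = 123;
        else
            x = 0; // every path through the first "if" now assigns x
        if (R())
            Console.WriteLine(x); // legal: x is definitely assigned here
        return x;
    }

    static void Main()
    {
        M();
    }
}
```

Removing the else branch brings the error back, because the path where Q() returns false would once again reach the read without an assignment.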

The reason why we want to make this illegal is not, as many people believe, because the local variable is going to be initialized to garbage and we want to protect you from garbage. We do in fact automatically initialize locals to their default values. (The C and C++ programming languages do not necessarily, and will cheerfully allow you to read garbage from an uninitialized local.) Rather, it is because the existence of such a code path is probably a bug, and we want to throw you into the Pit of Success; you should have to work hard to write that bug.

The way in which the compiler determines if there is any path through the code which causes x to be read before it is written is quite interesting, but that’s a subject for another day. The question I want to consider today is: why are local variables considered to be definitely assigned inside unreachable statements?

void M() 
{ 
  int x; 
  if (Q()) 
    x = 123; 
  if (false) 
    Console.WriteLine(x); // legal! 
}

First off, obviously the way I’ve described the feature immediately gives the intuition that this ought to be legal. Clearly there is no path through the code which results in the local variable being read before it is assigned. In fact, there is no path through the code that results in the local variable being read, period!

On the other hand: that code looks wrong. We do not allow syntax errors, or overload resolution errors, or convertibility errors, or any other kind of error, in an unreachable statement, so why should we allow definite assignment errors?

It’s a subtle point, I admit. Here’s the thing. You have to ask yourself “why is there unreachable code in the method in the first place?” Either that unreachable code is deliberate, or it is an error.

If it is an error, then something is deeply messed up here. The programmer did not intend the written control flow in the first place. It seems premature to guess at what the definite assignment errors are in the unreachable code, since the control flow that would be used to determine definite assignment state is wrong. We are going to give a warning about the unreachable code; the user can then notice the warning and fix the control flow. Once it is fixed, then we can consider whether there are definite assignment problems with the fixed control flow.

Now, why on earth would someone deliberately make unreachable code? It does in fact happen; actually it happens quite frequently when dealing with libraries made by another team that are not quite done yet:

// If we can resrov the frob into a glob, do that and then blorg 
// the result. Even if the frob is not a glob, we know it is 
// definitely a resrovable blob, so resrov it as a blob and then
// blorg the result. Finally, fribble the blorgable result, 
// regardless of whether it was a glob or a blob. 
void BlorgFrob(Frob frob) 
{ 
  IBlorgable blorgable;   
  // TODO: Glob.TryResrov has not been ported to C# yet. 
  if (false /* Glob.TryResrov(out blorgable, frob) */) 
  { 
    BlorgGlob(blorgable); 
  } 
  else 
  { 
    blorgable = Blob.Resrov(frob);
    BlorgBlob(blorgable); 
  } 
  blorgable.Fribble(frob); 
}

Should BlorgGlob(blorgable) be an error? It seems plausible that it should not be an error; after all, it’s never going to read the local. But it is still nice that we get overload resolution errors reported inside the unreachable code, just in case there is something wrong there.

Generating random non-uniform data in C#

UPDATE: I’ve posted a related article here.


When building simulations of real-world phenomena, or when generating test data for algorithms that will be consuming information from the real world, it is often highly desirable to produce pseudo-random data that conform to some non-uniform probability distribution.

But perhaps I have already lost some readers who do not remember STATS 101 all those years ago. I sure don’t. Let’s take a step back.

Bad metaphors

The standard way to teach beginner OO programmers about classes is to make a metaphor to the real world. And indeed, I do this all the time in this blog, usually to the animal kingdom. A “class” in real life codifies a commonality amongst a certain set of objects: mammals, for example, have many things in common; they have backbones, can grow hair, can make their own heat, and so on. A class in a programming language does the same thing: codifies a commonality amongst a certain set of objects via the mechanism of inheritance. Inheritance ensures commonalities because, as we’ve already discussed, “inheritance” by definition means “all the members of the base type are also members of the derived type” (excepting constructors and destructors).

What is the defining characteristic of a local variable?

If you ask a dozen C# developers what a “local variable” is, you might get a dozen different answers. A common answer is of course that a local is “a storage location on the stack”. But that is describing a local in terms of its implementation details; there is nothing in the C# language that requires that locals be stored on a data structure called “the stack”, or that there be one stack per thread. (And of course, locals are often stored in registers, and registers are not the stack.)

A less implementation-detail-laden answer might be that a local variable is a variable whose storage location is “allocated from the temporary store”. That is, a local variable is a variable whose lifetime is known to be short; the local’s lifetime ends when control leaves the code associated with the local’s declaration space.

That too, however, is a lie. The C# specification is surprisingly vague about the lifetime of an “ordinary” local variable, noting that its lifetime is only kinda-sorta that length. The jitter’s optimizer is permitted broad latitude in its determination of local lifetime; a local can be cleaned up early or late. The specification also notes that the lifetimes of some local variables are necessarily extended beyond the point where control leaves the method body containing the local declaration. Locals declared in an iterator block, for instance, live on even after control has left the iterator block; they might die only when the iterator is itself collected. Locals that are closed-over outer variables of a lambda are the same way; they live at least as long as the delegate that closes over them. And in the upcoming version of C#, locals declared in async blocks will also have extended lifetimes; when the async method returns to its caller upon encountering an “await”, the locals live on and are still there when the method resumes. (And since it might not resume on the same thread, in some bizarre situations, the locals had better not be stored on the stack!)
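
The closure and iterator cases can be seen in a few lines. This is only a sketch; the names MakeCounter and CountTo are made up for the illustration:

```csharp
using System;
using System.Collections.Generic;

static class LocalLifetimeDemo
{
    // Returns a delegate that closes over the local "count".
    public static Func<int> MakeCounter()
    {
        int count = 0;          // a local variable of MakeCounter
        return () => ++count;   // the lambda captures it
    }

    // "current" is a local of an iterator block; it survives across yields.
    public static IEnumerable<int> CountTo(int limit)
    {
        int current = 0;
        while (current < limit)
        {
            current += 1;
            yield return current;
        }
    }

    static void Main()
    {
        Func<int> next = MakeCounter();   // MakeCounter has already returned...
        Console.WriteLine(next());        // ...yet "count" is still alive
        Console.WriteLine(next());        // and it remembers the previous call

        foreach (int n in CountTo(3))
            Console.WriteLine(n);
    }
}
```

In both cases control has left the method that declares the local, and the local is plainly still around.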

So if locals are not “variables on the stack” and locals are not “short lifetime variables” then what are locals?

The answer is of course staring you in the face. The defining characteristic of a local is that it can only be accessed by name in the block which declares it; it is local to a block. What makes a local truly unique is that it can only be a private implementation detail of a method body. The name of that local is never of any use to code lexically outside of the method body.

A C# reading list

Just a couple of quick links today.

First: One of the questions I get most frequently is “can you recommend some good books about learning to program better in C#?” The question is usually asked by a developer; the other day I was surprised to get that question from one of the editors of InformIT. She was kind enough to post the list on the InformIT web site, so check it out.

Second: Bill Wagner posts his own follow-up article on my recent MSDN magazine article about async/await. Check it out.

Inheritance and representation

(Note: Not to be confused with Representation and Identity.)

Here’s a question I got this morning:

class Alpha<X>
  where X : class
{}
class Bravo<T, U>
  where T : class
  where U : T
{
  Alpha<U> alpha; // error: U is not known to be a reference type
}

This gives a compilation error stating that U cannot be used as a type argument for Alpha’s type parameter X because U is not known to be a reference type. But surely U is known to be a reference type because U is constrained to be T, and T is constrained to be a reference type. Is the compiler wrong?

Of course not. Bravo<object, int> is perfectly legal and gives a type argument for U which is not a reference type. All the constraint on U says is that U must inherit from T. (More specifically, it must inherit from T or be identical to T, or inherit from a type related to T by some variant conversion. Consult the specification for details.) int inherits from object, so it meets the constraint. All struct types inherit from at least two reference types, and some of them inherit from many more. Enum types inherit from System.Enum, many struct types implement interface types, and so on.

The right thing for the developer to do here is of course to add the reference type constraint to U as well.
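
A sketch of what that fix might look like; the public field, its initializer, and the ConstraintDemo class are additions for the demonstration only:

```csharp
using System;

class Alpha<X>
    where X : class
{ }

class Bravo<T, U>
    where T : class
    where U : class, T   // the reference type constraint must come first
{
    public Alpha<U> alpha = new Alpha<U>();
}

static class ConstraintDemo
{
    static void Main()
    {
        // Fine: string is a reference type that converts to object.
        var ok = new Bravo<object, string>();
        Console.WriteLine(ok.alpha != null);

        // No longer legal: int is not a reference type.
        // var bad = new Bravo<object, int>();
    }
}
```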

That easily-solved problem got me thinking a bit more deeply about the issue. I think a lot of people don’t have a really solid understanding of what “inheritance” means in C#. It is really quite simple: a derived type which inherits from a base type implicitly has all inheritable members of the base type. That’s it! If a base type has a member M then a type that inherits from it has a member M as well.

Of course that’s not quite it; there are some odd corner cases. For example, a class which “inherits” from an interface must have an implementation of every member of that interface, but it could do an explicit interface implementation rather than exposing the interface’s members as its own members. This is yet another reason why I’m not thrilled that we chose the word “inherits” over “implements” to describe interface implementations. Also, certain members like destructors and constructors are not inheritable.
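
To make the interface corner case concrete, here is a small sketch. IGreeter and Robot are hypothetical names invented for the example:

```csharp
using System;

interface IGreeter   // hypothetical interface for illustration
{
    string Greet();
}

class Robot : IGreeter
{
    // Explicit implementation: Greet is not exposed as a member of Robot itself.
    string IGreeter.Greet() { return "beep"; }
}

static class ExplicitInterfaceDemo
{
    public static string GreetVia(IGreeter greeter)
    {
        return greeter.Greet();
    }

    static void Main()
    {
        var robot = new Robot();
        // robot.Greet();   // would not compile: Robot has no accessible member named Greet
        Console.WriteLine(GreetVia(robot));   // reachable through the interface
    }
}
```

Robot satisfies the interface's contract, but a member lookup on Robot does not find Greet; you have to go through the interface type to call it.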

People sometimes ask me if private members are inherited; surely not! What would that even mean? But yes, private members are inherited, though most of the time it makes no difference because the private member cannot be accessed outside of its accessibility domain. However, if the derived class is inside the accessibility domain then it becomes clear that yes, private members are inherited:

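class B
{
  private int x;
  private class D : B
  {
    // Legal: D is nested in B, so it is inside the accessibility domain of x.
    int GetX() { return x; }
  }
}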

D inherits x from B, and since D is inside the accessibility domain of x, it can use x no problem.

I am occasionally asked “but how can a value type, like int, which is 32 bits of memory, no more, no less, possibly inherit from object? An object laid out in memory is way bigger than 32 bits; it’s got a sync block and a virtual function table and all kinds of stuff in there.” Apparently lots of people think that inheritance has something to do with how a value is laid out in memory. But how a value is laid out in memory is an implementation detail, not a contractual obligation of the inheritance relationship! When we say that int inherits from object, what we mean is that if object has a member (say, ToString) then int has that member as well. When you call ToString on something of compile-time type object, the compiler generates code which goes and looks up that method in the object’s virtual function table at runtime. When you call ToString on something of compile-time type int, the compiler knows that int is a sealed value type that overrides ToString, and generates code which calls that function directly. And when you box an int, then at runtime we do lay out an int the same way that any reference-typed object is laid out in memory.

But there is no requirement that int and object be always laid out the same in memory just because one inherits from the other; all that is required is that there be some way for the compiler to generate code that honours the inheritance relationship.
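
The three situations described above can be sketched like this. The Describe helper is a hypothetical name, and the sketch shows only the source-level behaviour, not the code the compiler actually generates:

```csharp
using System;

static class InheritanceDemo
{
    // Compile-time type is object, so ToString is a virtual call
    // resolved at runtime against the actual type of the argument.
    public static string Describe(object o)
    {
        return o.ToString();
    }

    static void Main()
    {
        int i = 42;

        // int has a ToString member because object has one.
        Console.WriteLine(i.ToString());

        // Boxing: the int is now stored the way a reference-typed object is.
        object boxed = i;
        Console.WriteLine(Describe(boxed));

        // The boxed value still knows it is an int.
        Console.WriteLine(boxed.GetType() == typeof(int));
    }
}
```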