Out parameters and LINQ do not mix

What’s wrong with this code?

var seq = new List { "1", "blah", "3" }; 
int tmp; 
var nums = 
  from item in seq   
  let success = int.TryParse(item, out tmp)   
  select success ? tmp : 0;

The intention is pretty clear: take a sequence of strings and produce a sequence of numbers by turning the elements that can be parsed as numbers into those numbers and the rest into zero.

The C# compiler will give you a definite assignment error if you try this, which seems strange. Why does it do that? Well, think about what code the compiler will translate the last statement into:

var nums =
   seq.Select(item=>new {item, success = int.TryParse(item, out tmp)})
   .Select(transparent => transparent.success ? tmp : 0);

We have two method calls and two lambdas. Clearly the first lambda assigns tmp and the second reads it, but we have no guarantee whatsoever that the first call to Select invokes the lambda! It could, for some bizarre reason of its own, never invoke the lambda. Since the compiler cannot prove that tmp is definitely assigned before it is read, this program is an error.

So is the solution then to assign tmp in the variable declaration? Certainly not! That makes the program compile, but it is a “worst practice” to mutate a variable like this. Remember, that one variable is going to be shared by every delegate invocation! In this simple LINQ-to-Objects scenario it is the case that the delegates are invoked in the sensible order, but even a small change makes this nice property go out the window:

int tmp = 0; 
var nums =
  from item in seq
  let success = int.TryParse(item, out tmp)
  orderby item
  select success ? tmp : 0; 
foreach(var num in nums) 
  Console.WriteLine(num);

Now what happens?

We make an object that represents the query. The query object consists of three steps: do the let projection, do the sort, and do the final projection. Remember, the query is not executed until the first result from it is requested; the assignment to “nums” just produces an object that represents the query, not the results.[1. As I have often said, if I could tell new LINQ users just one thing, it is that fact: query expressions produce a query, not a result set.]

Then we execute the query by entering the body of the loop. Doing so initiates a whole chain of events, but clearly it must be the case that the entire let projection is executed from start to finish over the whole sequence in order to get the resulting pairs to be sorted by the orderby clause. Executing the let projection lambda three times causes tmp to be mutated three times. Only after the sort is completed is the final projection executed, and it uses the current value of tmp, not the value that tmp was back in the distant past!

So what is the right thing to do here? The solution is to write your own extension method version of TryParse the way it would have been written had there been nullable value types available in the first place:

static int? MyTryParse(this string item) {
  int tmp;
  bool success = int.TryParse(item, out tmp);
  return success ? (int?)tmp : (int?)null; 
}

And now we can say:

var nums = 
  from item in seq 
  select item.MyTryParse() ?? 0;

The mutation of the variable is now isolated to the activation of the method, rather than a side effect that is observed by the query. Try to always avoid side effects in queries.


Thanks to Bill Wagner for the question that inspired this blog entry.


Next time on FAIC: Wackiness ensues!

Advertisements

Why are local variables definitely assigned in unreachable statements?

You’re probably all familiar with the feature of C# which disallows reading from a local variable before it has been “definitely assigned”:

void M()
{
  int x; 
  if (Q()) 
    x = 123; 
  if (R()) 
    Console.WriteLine(x); // illegal! 
}

This is illegal because there is a path through the code which, if taken, results in the local variable being read from before it has been assigned; in this case, if Q() returns false and R() returns true.

The reason why we want to make this illegal is not, as many people believe, because the local variable is going to be initialized to garbage and we want to protect you from garbage. We do in fact automatically initialize locals to their default values.[1. The C and C++ programming languages do not necessarily, and will cheerfully allow you to read garbage from an uninitialized local.] Rather, it is because the existence of such a code path is probably a bug, and we want to throw you into the Pit of Success; you should have to work hard to write that bug.

The way in which the compiler determines if there is any path through the code which causes x to be read before it is written is quite interesting, but that’s a subject for another day. The question I want to consider today is: why are local variables considered to be definitely assigned inside unreachable statements?

void M() 
{ 
  int x; 
  if (Q()) 
    x = 123; 
  if (false) 
    Console.WriteLine(x); // legal! 
}

First off, obviously the way I’ve described the feature immediately gives the intuition that this ought to be legal. Clearly there is no path through the code which results in the local variable being read before it is assigned. In fact, there is no path through the code that results in the local variable being read, period!

On the other hand: that code looks wrong. We do not allow syntax errors, or overload resolution errors, or convertibility errors, or any other kind of error, in an unreachable statement, so why should we allow definite assignment errors?

It’s a subtle point, I admit. Here’s the thing. You have to ask yourself “why is there unreachable code in the method in the first place?” Either that unreachable code is deliberate, or it is an error.

If it is an error, then something is deeply messed up here. The programmer did not intend the written control flow in the first place. It seems premature to guess at what the definite assignment errors are in the unreachable code, since the control flow that would be used to determine definite assignment state is wrong. We are going to give a warning about the unreachable code; the user can then notice the warning and fix the control flow. Once it is fixed, then we can consider whether there are definite assignment problems with the fixed control flow.

Now, why on earth would someone deliberately make unreachable code? It does in fact happen; actually it happens quite frequently when dealing with libraries made by another team that are not quite done yet:

// If we can resrov the frob into a glob, do that and then blorg 
// the result. Even if the frob is not a glob, we know it is 
// definitely a resrovable blob, so resrov it as a blob and then
// blorg the result. Finally, fribble the blorgable result, 
// regardless of whether it was a glob or a blob. 
void BlorgFrob(Frob frob) 
{ 
  IBlorgable blorgable;   
  // TODO: Glob.TryResrov has not been ported to C# yet. 
  if (false /* Glob.TryResrov(out blorgable, frob) */) 
  { 
    BlorgGlob(blorgable); 
  } 
  else 
  { 
    blorgable = Blob.Resrov(frob) 
    BlorgBlob(blorgable); 
  } 
  blorgable.Fribble(frob); 
}

Should BlorgGlob(blorgable) be an error? It seems plausible that it should not be an error; after all, it’s never going to read the local. But it is still nice that we get overload resolution errors reported inside the unreachable code, just in case there is something wrong there.