Real world async/await defects

Today I’m once again asking for your help.

The headliner feature for C# 5 was of course the await operator for asynchronous methods. The intention of the feature was to make it easier to write asynchronous programs where the program text is not a mass of inside-out code that emphasizes the mechanisms of asynchrony, but rather straightforward, easy-to-read code; the compiler can deal with the mess of turning the code into a form of continuation passing style.

However, asynchrony is still hard to understand, and making methods that allow other code to run before their postconditions are met makes for all manner of interesting possible bugs. There are a lot of great articles out there giving good advice on how to avoid some of the pitfalls, such as:

These articles all give great advice, but it is not always clear which of these potential defects are merely theoretical, and which you have seen and fixed in actual production code. That’s what I am very interested to learn from you all: what mistakes were made by real people, how were they discovered, and what was the fix?

Here’s an example of what I mean. This defect was found in real-world code; obviously the extraneous details have been removed:

Frob GetFrob()
{
  Frob result = null;
  var networkDevice = new NetworkDevice();
  networkDevice.OnDownload += 
    async (state) => { result = await Frob.DeserializeStateAsync(state); };
  networkDevice.GetSerializedState(); // Synchronous
  return result; 
}

The network device synchronously downloads the serialized state of a Frob. When it is done, the delegate stored in OnDownload runs synchronously, and is passed the state that was just downloaded. But since it is itself asynchronous, the event handler starts deserializing the state asynchronously, and returns immediately. We now have a race between GetFrob returning null, and the mutation of closed-over local result, a race almost certainly won by returning null.
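
For what it is worth, one way a defect of this shape can be repaired is to let the asynchrony propagate all the way up to the caller rather than trying to hide it. Here is a minimal sketch of that approach, using a TaskCompletionSource to bridge the event to an awaitable task; Frob, NetworkDevice and their members are of course the same hypothetical stand-ins as in the defective version above:

async Task<Frob> GetFrobAsync()
{
  var tcs = new TaskCompletionSource<Frob>();
  var networkDevice = new NetworkDevice();
  networkDevice.OnDownload +=
    async (state) => tcs.TrySetResult(await Frob.DeserializeStateAsync(state));
  networkDevice.GetSerializedState(); // still synchronous
  return await tcs.Task; // does not complete until deserialization has finished
}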

If you’d rather not leave comments here — and frankly, the comment system isn’t much good for code snippets anyways — feel free to email me at eric@lippert.com. If I get some good examples I’ll follow up with a post describing the defects.


Lowering in language design, part one

Programming language designers and users talk a lot about the “height” of language features; some languages are considered to be very “high level” and some are considered to be very “low level”. A “high level” language is generally speaking one which emphasizes the business concerns of the program, and a low-level language is one that emphasizes the mechanisms of the underlying hardware. As two extreme examples, here’s a program fragment in my favourite high-level language, Inform7:
Continue reading

Dynamic contagion, part two

This is part two of a two-part series on dynamic contagion. Part one is here.


Last time I discussed how the dynamic type tends to spread through a program like a virus: if an expression of dynamic type “touches” another expression then that other expression often also becomes of dynamic type. Today I want to describe one of the least well understood aspects of method type inference, which also uses a contagion model when dynamic gets involved.

Long-time readers know that method type inference is one of my favourite parts of the C# language; for new readers who might not be familiar with the feature, let me briefly describe it. The idea is that when you have a method, say:

IEnumerable<R> Select<A, R>(IEnumerable<A> items, Func<A, R> projection)

and a call to the method, say:

Select(customers, c=>c.Name)

then we infer that you meant to call:

Select<Customer, string>(customers, c=>c.Name)

rather than making you spell it out. In that case, we would first infer that the list of customers is an IEnumerable<Customer> and therefore that the type argument corresponding to A is Customer. From that we would infer that the lambda parameter c is of type Customer, therefore that the result of the lambda is of type string, and therefore that the type argument corresponding to R is string. This algorithm is already complicated, but when dynamic gets involved, it gets downright weird.

The problem that the language designers faced when deciding how method type inference works with dynamic is exacerbated by our basic design goal for dynamic, that I mentioned two weeks ago: the runtime analysis of a dynamic expression honours all the information that we deduced at compile time. We only use the deduced-at-runtime types for the parts of the expression that were actually dynamic; the parts that were statically typed at compile time remain statically typed at runtime, not dynamically typed. Above we inferred R after we knew A, but what if customers had been of type dynamic? We now have a problem: depending on the runtime type of customers, type inference might succeed dynamically even though it seems like it must fail statically. But if type inference fails statically then the method is not a candidate, and, as we discussed two weeks ago, if the candidate set of a dynamically-dispatched method group is empty then overload resolution fails at compile-time, not at runtime. So it seems that type inference must succeed statically!

What a mess. How do we get out of this predicament? The spec is surprisingly short on details; it says only:

Any type argument that does not depend directly or indirectly on an argument of type dynamic is inferred using [the usual static analysis rules]. The remaining type arguments are unknown. [...] Applicability is checked according to [the usual static analysis rules] ignoring parameters whose types are unknown.[1. That last clause is a bit unclear in two ways. First, it really should say "whose types are in any way unknown". L<unknown> is considered to be an unknown type. Second, along with skipping applicability checking we also skip constraint satisfaction checking. That is, we assume that the runtime construction of L<unknown> will provide a type argument that satisfies all the necessary generic type constraints.]

So what we have here is essentially another type that spreads via a contagion model, the “unknown” type. Just as “possibly infected” is the transitive closure of the exposure relation in simplistic epidemiology, “unknown” is the transitive closure of the “depends on” relation in method type inference.

For example, if we have:

void M<T, U>(T t, L<U> items)

with a call

M(123, dyn);

Then type inference infers that T is int from the first argument. Because the second argument is of dynamic type, and the formal parameter type involves type parameter U, we “taint” U with the “unknown type”.

When a tainted type parameter is “fixed” to its final type argument, we ignore all other bounds that we have computed so far, even if some of the bounds are contradictory, and infer it to be “unknown”. So in this case, type inference would succeed and we would add M<int, unknown> to the candidate set. As noted above, we skip applicability checking for arguments that correspond to parameters whose types are in any way tainted.
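
To make that concrete, here is a sketch of what the tainted candidate means in practice; L<T> is the hypothetical generic type from the example, and the exact binding depends on the runtime type actually stored in dyn:

dynamic dyn = new L<string>();
M(123, dyn); // compile time: candidate M<int, unknown>, applicability not checked
             // run time: inference re-runs with L<string>, and the call binds as M<int, string>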

But where does the transitive closure of the dependency relationship come into it? In the C# 4 and 5 compilers we did not handle this particularly well, but in Roslyn we now actually cause the taint to spread. Suppose we have:

void M<T, U, V>(T t, L<U> items, Func<T, U, V> func)

and a call

M(123, dyn, (t, u)=>u.Whatever(t));

We infer T to be int and U to be unknown. We then say that V depends on T and U, and so infer V to be unknown as well. Therefore type inference succeeds with an inference of M<int, unknown, unknown>.

The alert reader will at this point be protesting that no matter what happens with method type inference, this is going to turn into a dynamic call, and that lambdas are not legal in dynamic calls in the first place. However, we want to get as much high-quality analysis done as possible so that IntelliSense and other code analysis works correctly even in badly broken code. It is better to allow U to infect V with the “unknown taint” and have type inference succeed, as the specification indicates, than to bail out early and have type inference fail. And besides, if by some miracle we do in the future allow lambdas to be in dynamic calls, we’ll already have a sensible implementation of method type inference.



Out parameters and LINQ do not mix

What’s wrong with this code?

var seq = new List<string> { "1", "blah", "3" }; 
int tmp; 
var nums = 
  from item in seq   
  let success = int.TryParse(item, out tmp)   
  select success ? tmp : 0;

The intention is pretty clear: take a sequence of strings and produce a sequence of numbers by turning the elements that can be parsed as numbers into those numbers and the rest into zero.

The C# compiler will give you a definite assignment error if you try this, which seems strange. Why does it do that? Well, think about what code the compiler will translate the last statement into:

var nums =
   seq.Select(item=>new {item, success = int.TryParse(item, out tmp)})
   .Select(transparent => transparent.success ? tmp : 0);

We have two method calls and two lambdas. Clearly the first lambda assigns tmp and the second reads it, but we have no guarantee whatsoever that the first call to Select invokes the lambda! It could, for some bizarre reason of its own, never invoke the lambda. Since the compiler cannot prove that tmp is definitely assigned before it is read, this program is an error.
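
To see just how little the compiler can assume here, imagine a (hypothetical, deliberately perverse) Select with the right signature that simply ignores its projection; nothing in the language prevents such a method from being the Select that the query binds to:

static IEnumerable<R> Select<A, R>(this IEnumerable<A> items, Func<A, R> projection)
{
  yield break; // never invokes projection, so tmp is never assigned
}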

So is the solution then to assign tmp in the variable declaration? Certainly not! That makes the program compile, but it is a “worst practice” to mutate a variable like this. Remember, that one variable is going to be shared by every delegate invocation! In this simple LINQ-to-Objects scenario it is the case that the delegates are invoked in the sensible order, but even a small change makes this nice property go out the window:

int tmp = 0; 
var nums =
  from item in seq
  let success = int.TryParse(item, out tmp)
  orderby item
  select success ? tmp : 0; 
foreach(var num in nums) 
  Console.WriteLine(num);

Now what happens?

We make an object that represents the query. The query object consists of three steps: do the let projection, do the sort, and do the final projection. Remember, the query is not executed until the first result from it is requested; the assignment to “nums” just produces an object that represents the query, not the results.[1. As I have often said, if I could tell new LINQ users just one thing, it is that fact: query expressions produce a query, not a result set.]

Then we execute the query by entering the body of the loop. Doing so initiates a whole chain of events, but clearly it must be the case that the entire let projection is executed from start to finish over the whole sequence in order to get the resulting pairs to be sorted by the orderby clause. Executing the let projection lambda three times causes tmp to be mutated three times. Only after the sort is completed is the final projection executed, and it uses the current value of tmp, not the value that tmp was back in the distant past!
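
Concretely, with the three-element sequence above, the let projection mutates tmp to 1, then 0, then 3 while the sort is buffering its input, so the final projection sees only the last value. Assuming the strings sort as "1", "3", "blah", the loop likely prints this rather than the intended 1, 3, 0:

3   // "1" parsed successfully, but tmp has already been mutated to 3
3   // "3"
0   // "blah" failed to parse, so we get the constant 0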

So what is the right thing to do here? The solution is to write your own extension method version of TryParse the way it would have been written had there been nullable value types available in the first place:

static int? MyTryParse(this string item) {
  int tmp;
  bool success = int.TryParse(item, out tmp);
  return success ? (int?)tmp : (int?)null; 
}

And now we can say:

var nums = 
  from item in seq 
  select item.MyTryParse() ?? 0;

The mutation of the variable is now isolated to the activation of the method, rather than a side effect that is observed by the query. Try to always avoid side effects in queries.


Thanks to Bill Wagner for the question that inspired this blog entry.


Next time on FAIC: Wackiness ensues!

What is the defining characteristic of a local variable?

If you ask a dozen C# developers what a “local variable” is, you might get a dozen different answers. A common answer is of course that a local is “a storage location on the stack”. But that is describing a local in terms of its implementation details; there is nothing in the C# language that requires that locals be stored on a data structure called “the stack”, or that there be one stack per thread. (And of course, locals are often stored in registers, and registers are not the stack.)

A less implementation-detail-laden answer might be that a local variable is a variable whose storage location is “allocated from the temporary store”. That is, a local variable is a variable whose lifetime is known to be short; the local’s lifetime ends when control leaves the code associated with the local’s declaration space.

That too, however, is a lie. The C# specification is surprisingly vague about the lifetime of an “ordinary” local variable, noting that its lifetime is only kinda-sorta that length. The jitter’s optimizer is permitted broad latitude in its determination of local lifetime; a local can be cleaned up early or late. The specification also notes that the lifetimes of some local variables are necessarily extended beyond the point where control leaves the method body containing the local declaration. Locals declared in an iterator block, for instance, live on even after control has left the iterator block; they might die only when the iterator is itself collected. Locals that are closed-over outer variables of a lambda are the same way; they live at least as long as the delegate that closes over them. And in the upcoming version of C#, locals declared in async blocks will also have extended lifetimes; when the async method returns to its caller upon encountering an “await”, the locals live on and are still there when the method resumes. (And since it might not resume on the same thread, in some bizarre situations, the locals had better not be stored on the stack!)
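
Here is a minimal sketch of the closed-over case; the local count must survive for as long as the returned delegate does, long after the activation of MakeCounter has ended:

static Func<int> MakeCounter()
{
  int count = 0;        // a "local", but it is hoisted into a closure object
  return () => ++count; // the delegate keeps count alive
}

var next = MakeCounter();
next(); // 1
next(); // 2 -- count still exists, though MakeCounter returned long ago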

So if locals are not “variables on the stack” and locals are not “short lifetime variables” then what are locals?

The answer is of course staring you in the face. The defining characteristic of a local is that it can only be accessed by name in the block which declares it; it is local to a block. What makes a local truly unique is that it can only be a private implementation detail of a method body. The name of that local is never of any use to code lexically outside of the method body.

Debunking another myth about value types

Here’s another myth about value types that I sometimes hear:

“Obviously, using the new operator on a reference type allocates memory on the heap. But a value type is called a value type because it stores its own value, not a reference to its value. Therefore, using the new operator on a value type allocates no additional memory. Rather, the memory already allocated for the value is used.”

That seems plausible, right? Suppose you have an assignment to, say, a field s of type S:

s = new S(123, 456);

If S is a reference type then this allocates new memory out of the long-term garbage collected pool, a.k.a. “the heap”, and makes s refer to that storage. But if S is a value type then there is no need to allocate new storage because we already have the storage. The variable s already exists and we’re going to call the constructor on it, right?

The C# specification says no, that’s not what we do. But as we’ll see in a minute, things are a bit more complicated.

It is instructive to ask “what if the myth were true?” Suppose it were the case that the statement above meant “determine the memory location to which the constructed type is being assigned, and pass a reference to that memory location as the ‘this’ reference in the constructor”. Consider the following struct, defined in a single-threaded program (for the remainder of this article I am considering only single-threaded scenarios; the guarantees in multi-threaded scenarios are much weaker.)

using System;
struct S
{
  private int x;
  private int y;
  public int X { get { return x; } }
  public int Y { get { return y; } }
  public S(int x, int y, Action callback)
  {
    if (x > y)
      throw new Exception();
    callback();
    this.x = x;
    callback();
    this.y = y;
    callback();
  }
}

We have an immutable struct which throws an exception if x > y. Therefore it should be impossible to ever get an instance of S where x > y, right? That’s the point of this invariant. But watch:

static class P
{
  static void Main()
  {
    S s = default(S);
    Action callback = ()=>{Console.WriteLine("{0}, {1}", s.X, s.Y);};
    s = new S(1, 2, callback);
    s = new S(3, 4, callback);
  }
}

Again, remember that we are supposing the myth I stated above to be the truth. What happens?

  • First we make a storage location for local variable s. (Because s is an outer variable used in a lambda, this storage is on the heap. But the location of the storage for s is irrelevant to today’s myth, so let’s not consider it further.)
  • We assign a default S to s; this does not call any constructor. Rather it simply assigns zero to both x and y.
  • We make the action.
  • We (mythically) obtain a reference to s and use it as the ‘this’ of the constructor call. The constructor calls the callback three times.
  • The first time, s is still (0, 0).
  • The second time, x has been mutated, so s is (1, 0), violating our precondition that X is not observed to be greater than Y.
  • The third time s is (1, 2).
  • Now we do it again, and again, the callback observes (1, 2), (3, 2) and (3, 4), violating the condition that X must not be observed to be greater than Y.

This is horrid. We have a perfectly sensible precondition that looks like it should never be violated because we have an immutable value type that checks its state in the constructor. And yet, in our mythical world, it is violated.

Here’s another way to demonstrate that this is mythical. Add another constructor to S:

public S(int x, int y, bool panic)
{
  if (x > y)
    throw new Exception();
  this.x = x;
  if (panic)
    throw new Exception();
  this.y = y;
}

We have

static class P
{
  static void Main()
  {
    S s = default(S);
    try
    {
      s = new S(1, 2, false);
      s = new S(3, 4, true);
    }
    catch(Exception ex)
    {
      Console.WriteLine("{0}, {1}", s.X, s.Y);
    }
  }
}

Again, remember that we are supposing the myth I stated above to be the truth. What happens? If the storage of s is mutated by the first constructor and then partially mutated by the second constructor, then again, the catch block observes the object in an inconsistent state. Assuming the myth to be true. Which it is not. The mythical part is right here:

Therefore, using the new operator on a value type allocates no additional memory. Rather, the memory already allocated for the value is used.

That’s not true, and as we’ve just seen, if it were true then it would be possible to write some really bad code. The fact is that both statements are false. The C# specification is clear on this point:

If T is a struct type, an instance of T is created by allocating a temporary local variable

That is, the statement

s = new S(123, 456);

actually means:

  • Determine the location referred to by s.
  • Allocate a temporary variable t of type S, initialized to its default value.
  • Run the constructor, passing a reference to t for “this”.
  • Make a by-value copy of t to s.
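
In pseudo-C#, the conceptual sequence looks something like this; the temporary has no name in the real program, and "t.ctor(...)" is not legal syntax, just a stand-in for running the constructor in place:

S t = default(S);  // allocate a temporary, initialized to its default value
t.ctor(123, 456);  // run the constructor with 'this' referring to t
s = t;             // copy the fully constructed temporary into s, by value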

This is as it should be. The operations happen in a predictable order: first the “new” runs, and then the “assignment” runs. In the mythical explanation, there is no assignment; it vanishes. And now the variable s is never observed to be in an inconsistent state. The only code that can observe x being greater than y is code in the constructor. Construction followed by assignment becomes “atomic”.[1. Again, I am referring to single-threaded scenarios here. If the variable s can be observed on different threads then it can be observed to be in an inconsistent state because copying any struct larger than an int is not guaranteed to be a threadsafe atomic operation.]

In the real world if you run the first version of the code above you see that s does not mutate until the constructor is done. You get (0,0) three times and then (1,2) three times. Similarly, in the second version s is observed to still be (1,2); only the temporary was mutated when the exception happened.

I said that things were more complicated than they seemed. In fact if the target of the assignment is a stack-allocated local variable (and not a field in a closure) that is declared at the same level of “try” nesting as the constructor call then we do not go through this rigamarole of making a new temporary, initializing the temporary, and copying it to the local. In that specific (and common) case we can optimize away the creation of the temporary and the copy because it is impossible for a C# program to observe the difference! But conceptually you should think of the creation as a creation-then-copy rather than a creation-in-place; that it sometimes can be in-place is an implementation detail that you should not rely upon.

Closing over the loop variable considered harmful, part two

(This is part two of a two-part series on the loop-variable-closure problem. Part one is here.)


UPDATE: We are taking the breaking change. In C# 5, the loop variable of a foreach will be logically inside the loop, and therefore closures will close over a fresh copy of the variable each time. The for loop will not be changed. We return you now to our original article.


Thanks to everyone who left thoughtful and insightful comments on last week’s post.

More countries really ought to implement Instant-Runoff Voting; it would certainly appeal to the geek crowd. Many people left complex opinions of the form “I’d prefer to make the change, but if you can’t do that then make it a warning”. Or “don’t make the change, do make it a warning”, and so on. But what I can deduce from reading the comments is that there is a general lack of consensus on what the right thing to do here is. In fact, I just did a quick tally:

  • Commenters who expressed support for a warning: 26
  • Commenters who expressed the sentiment “it’s better to not make the change”: 24
  • Commenters who expressed the sentiment “it’s better to make the change”: 25

Wow. I guess we’ll flip a coin. :-)[1. Mr. Smiley Face indicates that Eric is indulging in humourous japery.]

Continue reading

Closing over the loop variable considered harmful, part one

UPDATE: We are taking the breaking change. In C# 5, the loop variable of a foreach will be logically inside the loop, and therefore closures will close over a fresh copy of the variable each time. The for loop will not be changed. We return you now to our original article.


I don’t know why I haven’t blogged about this one before; this is the single most common incorrect bug report we get. That is, someone thinks they have found a bug in the compiler, but in fact the compiler is correct and their code is wrong. That’s a terrible situation for everyone; we very much wish to design a language which does not have “gotcha” features like this.

But I’m getting ahead of myself. Continue reading

A face made for email, part three

It has happened again: someone has put me on video talking about programming languages.

This time our fabulous C# Community Program Manager Charlie Calvert was good enough to put together a little half-hour-long video of me talking about the scenarios which justify changes to the type inference algorithm for C# 3.0.

The video is here. Enjoy!