A dynamic definite assignment puzzle, part 2

Sorry for that unexpected interlude into customer service horrors; thanks for your patience and let’s get right back to coding horrors!

So as not to bury the lede: the C# compiler is correct to flag the program from the last episode as having a definite assignment problem. Thanks to everyone who participated in the discussion in the comments.

Particular mentions go out to Jeffrey L. Whitledge who sketched the solution mere moments after I posted the problem, C# architect Neal Gafter who posted a concise demonstration, and former C# compiler dev Chris Burrows who slyly pointed out on Twitter that he “solved the puzzle”. Indeed, he solved it nine years ago when he designed and implemented this logic in the compiler.

The reasoning the compiler has to go through here is rather arcane, so let’s dig into it.

First off, let’s remind ourselves of the fundamental rule of dynamic in C#. We identify all the places in the program where the compiler deduces that the type of a thing is dynamic. Imagine that you then replaced that type judgment with the actual runtime type of the thing. The behaviour of the program with dynamic should be the same as the behaviour of the imaginary program where the compiler knew the runtime type.

This means that the compiler has to assume that the control flow of a program containing dynamic could be any control flow possible for any runtime type, including types that are legal but extremely weird. So let’s make an extremely weird type or two and see what happens.

I’ve already discussed in this blog how operator overloading works for the && operator. Briefly, the rules in C# are:

  • if left is false then left && right does not evaluate right, and has the same value as left
  • otherwise, left && right evaluates both operands and has the same value as left & right

Obviously those rules work for bool; to overload &&, we therefore need a way to evaluate whether left is false or not, and a way to determine the value of left & right. The “operator true” and “operator false” operators tell you whether a given value is logically true or logically false, and both must be provided if you provide either. Let’s look at an example, but we’ll make it a crazy example!

class Crazy
  public static bool operator true(Crazy c)
    Console.WriteLine(“operator true”);
    return true;  // Um…
  public static bool operator false(Crazy c)
    Console.WriteLine(“operator false”);
    return true; // Oh dear…
  public static Crazy operator &(Crazy a, Crazy b)
    Console.WriteLine(“operator &”);
    return a;

This is obviously weird and wrong; this program says that all instances of Crazy are logically true, and also all of them are logically false (because “is this false?” returns true) and also if we & two of them together, we get the first unconditionally. But, whatever, this is a legal program. If we have

Crazy c = GetCrazy() && GetAnother();

The code generated will be the moral equivalent of:

Crazy c1 = GetCrazy();
 c = Crazy.operator_false(c1) ?
  c1 :
  Crazy.operator_&(c1, GetAnother());

But of course at runtime for this particular program we will always say that c1 is false, and always just take the first operand without calling GetAnother().

Already we’ve got a very strange program, but maybe it is not entirely clear how this produces a code path that has a definite assignment error. To get that, we need a second super-weird class. This one overloads equality — C# requires you to overload equality and inequality in pairs as well — and instead of returning a Boolean, it returns, you guessed it, something crazy:

class Weird
  public static Crazy operator ==(Weird p1, Weird p2)
    return new Crazy();
  public static Crazy operator !=(Weird p1, Weird p2)
    return new Crazy();
  public override bool Equals(object obj) => true;
  public override int GetHashCode() => 0;

Maybe now you see where this is going, but we’re not quite there yet. Suppose we have

Weird obj = new Weird();
string str;
if (obj != null && GetString(out str))

What’s going to happen? Weird overrides !=, so this will call Weird.operator_!=, which returns Crazy, and now we have Crazy && bool, which is not legal, so the program will not compile. But what if we make Crazy convertible from bool? Then it will work! So add this to Crazy:

  public static implicit operator Crazy(bool b)
    Console.WriteLine(“implicit conversion from bool “ + b);
    return new Crazy();

Now what does if (obj != null && GetString(out str)) … mean?  Let’s turn it into method calls. This thing all together is:

Crazy c1 = Weird.operator_!=(obj, null);
bool b1 = Crazy.operator_false(c1);
// it is true that it is false
Crazy c2 = b1 ? 
  c1 : // This is the branch we take
  Crazy.operator_implicit(GetString(out str));
if (c2) …

But wait a minute, is if(c2) legal? Yes it is, because Crazy implements operator true, which is then called by the if statement, and it returns true!

We have just demonstrated that it is possible to write the program fragment

if (obj != null && GetString(out str))
  Console.WriteLine(obj + str);

and have the consequence get executed even if GetString is never called, and therefore str would be read before it was written, and therefore this program must be illegal. And indeed, if you remove the definite assignment error and run it, you get the output

operator false
operator true

In the case where obj is dynamic, the compiler must assume the worst possible case, no matter how unlikely. It must assume that the dynamic value is one of these weird objects that overloads equality to return something crazy, and that the crazy thing can force the compiler to skip evaluating the right side but still result in a true value.

What is the moral of this story? There are several. The most practical moral is: do not compare any dynamic value to null! The correct way to write this code is

if ((object)obj != null && GetString(out str))

And now the problem goes away, because dynamic to object is always an identity conversion, and now the compiler can generate a null check normally, rather than going through all this rigamarole of having to see if the object overloads equality.

The moral for the programming language designer is: operator overloading is a horrible horrible feature that makes your life harder.

The moral for the compiler writer is: get good at coming up with weird and unlikely possible programs, because you’re going to have to do that a lot when you write test cases.

Thanks to Stack Overflow contributor “NH.” for the question which prompted this puzzle.

A dynamic definite assignment puzzle

I’ve often noted that “dynamic” in C# is just “object” with a funny hat on, but there are some subtleties to that. Here’s a little puzzle; see if you can figure it out. I’ll post the answer next time. Suppose we have this trivial little program:

class Program
  static void Main()
    object obj = GetObject();
    string str;
    if (obj != null && GetString(out str))
      System.Console.WriteLine(obj + str);
  static object GetObject() => “hello”
  static bool GetString(out string str)
    str = “goodbye”;
    return true;

There’s no problem here. C# knows that Main‘s str is definitely assigned on every code path that reads it, because we do not enter the consequence of the if statement unless GetString returned normally, and methods that take out parameters are required to write to the parameter before they return normally.

Now make a small change:

    dynamic obj = GetObject();

C# now gives error CS0165: Use of unassigned local variable ‘str’

Is the C# compiler wrong to give this error? Why or why not?

Next time on FAIC: the solution to the puzzle.

Casts and type parameters do not mix

Here’s a question I’m asked occasionally:

void M<T>(T t) where T : Animal
  // This gives a compile-time error:
  if (t is Dog)
  // But this does not:
  if (t is Dog)
    (t as Dog).Bark();

What’s going on here? Why is the cast operator rejected? The reason illustrates yet again that the cast operator is super weird.
Continue reading

Wizards and warriors, part four

Last time we saw that in order to decide what code to call based on the runtime type of one argument — single dispatch — we could use virtual dispatch. And we saw that we could use the inaptly-named Visitor Pattern to emulate double dispatch by doing a series of virtual and non-virtual dispatches. This works but it has some drawbacks. It’s heavyweight, the pattern is difficult to understand, and it doesn’t extend easily to true multiple dispatch.

I said last time that C# does not support double dispatch. That was a baldfaced lie! In fact C# supports multiple dispatch; you can dispatch a method call based on the runtime types of arbitrarily many arguments. Here, let’s dispatch based on the runtime types of two arguments:
Continue reading

What is “duck typing”?

Seriously, what is it? It’s not a rhetorical question. I realized this morning that I am totally confused about this.

First off, let me say what I thought “duck typing” was. I thought it was a form of typing.

So what is “typing”? We’ve discussed this before on this blog. (And you might want to check out this post on late binding and this post on strong typing.) To sum up:

Continue reading

Dynamic contagion, part two

This is part two of a two-part series on dynamic contagion. Part one is here.

Last time I discussed how the dynamic type tends to spread through a program like a virus: if an expression of dynamic type “touches” another expression then that other expression often also becomes of dynamic type. Today I want to describe one of the least well understood aspects of method type inference, which also uses a contagion model when dynamic gets involved. Continue reading

Dynamic contagion, part one

This is part one of a two-part series on dynamic contagion. Part two is here.

Suppose you’re an epidemiologist modeling the potential spread of a highly infectious disease. The straightforward way to model such a series of unfortunate events is to assume that the population can be divided into three sets: the definitely infected, the definitely healthy, and the possibly infected. If a member of the healthy population encounters a member of the definitely infected or possibly infected population, then they become a member of the possibly infected population. (Or, put another way, the possibly infected population is closed transitively over the exposure relation.) A member of the possibly infected population becomes classified as either definitely healthy or definitely infected when they undergo some sort of test. And an infected person can become a healthy person by being cured.

This sort of contagion model is fairly common in the design of computer systems. For example, suppose you have a web site that takes in strings from users, stores them in a database, and serves them up to other users. Like, say, this blog, which takes in comments from you, stores them in a database, and then serves them right back up to other users. That’s a Cross Site Scripting (XSS) attack waiting to happen right there. A common way to mitigate the XSS problem is to use data tainting, which uses the contagion model to identify strings that are possibly hostile. Whenever you do anything to a potentially-hostile string, like, say, concatenate it with a non-hostile string, the result is a possibly-hostile string. If the string is determined via some test to be benign, or can have its potentially hostile parts stripped out, then it becomes safe.

The “dynamic” feature in C# 4 and above has a lot in common with these sorts of contagion models. As I pointed out last time, when an argument of a call is dynamic then odds are pretty good that the compiler will classify the result of the call as dynamic as well; the taint spreads. In fact, when you use almost any operator on a dynamic expression, the result is of dynamic type, with a few exceptions. (“is” for example always returns a bool.)  You can “cure” an expression to prevent it spreading dynamicism by casting it to object, or to whatever other non-dynamic type you’d like; casting dynamic to object is an identity conversion.

The way that dynamic is contagious is an emergent phenomenon of the rules for working out the types of expressions in C#. There is, however, one place where we explicitly use a contagion model inside the compiler in order to correctly work out the type of an expression that involves dynamic types: it is one of the most arcane aspects of method type inference. Next time I’ll give you all the rundown on that.

This is part one of a two-part series on dynamic contagion. Part two is here.

A method group of one

I’m implementing the semantic analysis of dynamic expressions in Roslyn this week, so I’m fielding a lot of questions within the team on the design of the dynamic feature of C# 4. A question I get fairly frequently in this space is as follows:

public class Alpha
  public int Foo(string x) { ... }
  dynamic d = whatever;
  Alpha alpha = MakeAlpha();
  var result = alpha.Foo(d);

How is this analyzed? More specifically, what’s the type of local result?

If the receiver (that is, alpha) of the call were of type dynamic then there would be little we could do at compile time. We’d analyze the compile-time types of the arguments and emit a dynamic call site that caused the semantic analysis to be performed at runtime, using the runtime type of the dynamic expression. But that’s not the case here. We know at compile time what the type of the receiver is. One of the design principles of the C# dynamic feature is that if we have a type that is known at compile time, then at runtime the type analysis honours that. In other words, we only use the runtime type of the things that were actually dynamic; everything else we use the compile-time type. If MakeAlpha() returns a class derived from Alpha, and that derived class has more overloads of Foo, we don’t care.

Because we know that we’re going to be doing overload resolution on a method called Foo on an instance of type Alpha, we can do a “sanity check” at compile time to determine if we know that for sure, this is going to fail at runtime. So we do overload resolution, but instead of doing the full overload resolution algorithm (eliminate inapplicable candidates, determine the unique best applicable candidate, perform final validation of that candidate), we do a partial overload resolution algorithm. We get as far as eliminating the inapplicable candidates, and if that leaves one or more candidates then the call is bound dynamically. If it leaves zero candidates then we report an error at compile time, because we know that nothing is going to work at runtime.

Now, a seemingly reasonable question to ask at this point is: overload resolution in this case could determine that there is exactly one applicable candidate in the method group, and therefore we can determine statically that the type of result is int, so why do we instead say that the type of result is dynamic?

That appears to be a reasonable question, but think about it a bit more. If you and I and the compiler know that overload resolution is going to choose a particular method then why are we making a dynamic call in the first place? Why haven’t we cast d to string? This situation is rare, unlikely, and has an easy workaround by inserting casts appropriately (either casting the call expression to int or the argument to string). Situations that are rare, unlikely and easily worked around are poor candidates for compiler optimizations. You asked for a dynamic call, so you’re going to get a dynamic call.

That’s reason enough to not do the proposed feature, but let’s think about it a bit more deeply by exploring a variation on this scenario that I glossed over above. Eta Corporation produces:

public class Eta {}

and Zeta Corporation extends this code:

public class Zeta : Eta
  public int Foo(string x){ ... }
  dynamic d = whatever;
  Zeta zeta = new Zeta();
  var result = zeta.Foo(d);

Suppose we say that the type of result is int because the method group has only one member. Now suppose that in the next version, Eta Corporation supplies a new method:

public class Eta
  public string Foo(double x){...}

Zeta corporation recompiles their code, and hey presto, suddenly result is of type dynamic! Why should Eta Corporation’s change to the base class cause the semantic analysis of code that uses a derived class to change? This seems unexpected. C# has been carefully designed to avoid these sorts of “Brittle Base Class” failures; see my other articles on that subject for examples of how we do that.

We can make a bad situation even worse. Suppose Eta’s change is instead:

public class Eta
  protected string Foo(double x){...}

Now what happens? Should we say that the type of result is int when the code appears outside of class Zeta, because overload resolution produces a single applicable candidate, but dynamic when it appears inside, because overload resolution produces two such candidates? That would be quite bizarre indeed.

The proposal is simply too much cleverness in pursuit of too little value. We’ve been asked to perform a dynamic binding, and so we’re going to perform a dynamic binding; the result should in turn be of type dynamic. The benefits of being able to statically deduce types of dynamic expressions does not pay for the costs, so we don’t attempt to do so. If you want static analysis then don’t turn it off in the first place.

Next time on FAIC: The dynamic taint of method type inference.

Representation and identity

(Note: not to be confused with Inheritance and Representation.)

I get a fair number of questions about the C# cast operator. The most frequent question I get is:

short sss = 123;
object ooo = sss;            // Box the short.
int iii = (int) sss;         // Perfectly legal.
int jjj = (int) (short) ooo; // Perfectly legal
int kkk = (int) ooo;         // Invalid cast exception?! Why?

Why? Because a boxed T can only be unboxed to T (or Nullable<T>.) Once it is unboxed, it’s just a value that can be cast as usual, so the double cast works just fine.
Continue reading