A dynamic definite assignment puzzle, part 2

Sorry for that unexpected interlude into customer service horrors; thanks for your patience and let’s get right back to coding horrors!

So as not to bury the lede: the C# compiler is correct to flag the program from the last episode as having a definite assignment problem. Thanks to everyone who participated in the discussion in the comments.

Particular mentions go out to Jeffrey L. Whitledge who sketched the solution mere moments after I posted the problem, C# architect Neal Gafter who posted a concise demonstration, and former C# compiler dev Chris Burrows who slyly pointed out on Twitter that he “solved the puzzle”. Indeed, he solved it nine years ago when he designed and implemented this logic in the compiler.

The reasoning the compiler has to go through here is rather arcane, so let’s dig into it.

First off, let’s remind ourselves of the fundamental rule of dynamic in C#. We identify all the places in the program where the compiler deduces that the type of a thing is dynamic. Imagine that you then replaced that type judgment with the actual runtime type of the thing. The behaviour of the program with dynamic should be the same as the behaviour of the imaginary program where the compiler knew the runtime type.

This means that the compiler has to assume that the control flow of a program containing dynamic could be any control flow possible for any runtime type, including types that are legal but extremely weird. So let’s make an extremely weird type or two and see what happens.

I’ve already discussed in this blog how operator overloading works for the && operator. Briefly, the rules in C# are:

  • if left is false then left && right does not evaluate right, and has the same value as left
  • otherwise, left && right evaluates both operands and has the same value as left & right

Obviously those rules work for bool; to overload &&, we therefore need a way to evaluate whether left is false or not, and a way to determine the value of left & right. The “operator true” and “operator false” operators tell you whether a given value is logically true or logically false, and both must be provided if you provide either. Let’s look at an example, but we’ll make it a crazy example!

class Crazy
{
  public static bool operator true(Crazy c)
  {
    Console.WriteLine(“operator true”);
    return true;  // Um…
  }
  public static bool operator false(Crazy c)
  {
    Console.WriteLine(“operator false”);
    return true; // Oh dear…
  }
  public static Crazy operator &(Crazy a, Crazy b)
  {
    Console.WriteLine(“operator &”);
    return a;
  }
}

This is obviously weird and wrong; this program says that all instances of Crazy are logically true, and also all of them are logically false (because “is this false?” returns true) and also if we & two of them together, we get the first unconditionally. But, whatever, this is a legal program. If we have

Crazy c = GetCrazy() && GetAnother();

The code generated will be the moral equivalent of:

Crazy c1 = GetCrazy();
Crazy
 c = Crazy.operator_false(c1) ?
  c1 :
  Crazy.operator_&(c1, GetAnother());

But of course at runtime for this particular program we will always say that c1 is false, and always just take the first operand without calling GetAnother().

Already we’ve got a very strange program, but maybe it is not entirely clear how this produces a code path that has a definite assignment error. To get that, we need a second super-weird class. This one overloads equality — C# requires you to overload equality and inequality in pairs as well — and instead of returning a Boolean, it returns, you guessed it, something crazy:

class Weird
{
  public static Crazy operator ==(Weird p1, Weird p2)
  {
    Console.WriteLine(“==”);
    return new Crazy();
  }
  public static Crazy operator !=(Weird p1, Weird p2)
  {
    Console.WriteLine(“!=”);
    return new Crazy();
  }
  public override bool Equals(object obj) => true;
  public override int GetHashCode() => 0;
}

Maybe now you see where this is going, but we’re not quite there yet. Suppose we have

Weird obj = new Weird();
string str;
if (obj != null && GetString(out str))

What’s going to happen? Weird overrides !=, so this will call Weird.operator_!=, which returns Crazy, and now we have Crazy && bool, which is not legal, so the program will not compile. But what if we make Crazy convertible from bool? Then it will work! So add this to Crazy:

  public static implicit operator Crazy(bool b)
  {
    Console.WriteLine(“implicit conversion from bool “ + b);
    return new Crazy();
  }

Now what does if (obj != null && GetString(out str)) … mean?  Let’s turn it into method calls. This thing all together is:

Crazy c1 = Weird.operator_!=(obj, null);
bool b1 = Crazy.operator_false(c1);
// it is true that it is false
Crazy c2 = b1 ? 
  c1 : // This is the branch we take
  Crazy.operator_implicit(GetString(out str));
if (c2) …

But wait a minute, is if(c2) legal? Yes it is, because Crazy implements operator true, which is then called by the if statement, and it returns true!

We have just demonstrated that it is possible to write the program fragment

if (obj != null && GetString(out str))
  Console.WriteLine(obj + str);

and have the consequence get executed even if GetString is never called, and therefore str would be read before it was written, and therefore this program must be illegal. And indeed, if you remove the definite assignment error and run it, you get the output

!=
operator false
operator true

In the case where obj is dynamic, the compiler must assume the worst possible case, no matter how unlikely. It must assume that the dynamic value is one of these weird objects that overloads equality to return something crazy, and that the crazy thing can force the compiler to skip evaluating the right side but still result in a true value.

What is the moral of this story? There are several. The most practical moral is: do not compare any dynamic value to null! The correct way to write this code is

if ((object)obj != null && GetString(out str))

And now the problem goes away, because dynamic to object is always an identity conversion, and now the compiler can generate a null check normally, rather than going through all this rigamarole of having to see if the object overloads equality.

The moral for the programming language designer is: operator overloading is a horrible horrible feature that makes your life harder.

The moral for the compiler writer is: get good at coming up with weird and unlikely possible programs, because you’re going to have to do that a lot when you write test cases.

Thanks to Stack Overflow contributor “NH.” for the question which prompted this puzzle.

Advertisements

A dynamic definite assignment puzzle

I’ve often noted that “dynamic” in C# is just “object” with a funny hat on, but there are some subtleties to that. Here’s a little puzzle; see if you can figure it out. I’ll post the answer next time. Suppose we have this trivial little program:

class Program
{
  static void Main()
  {
    object obj = GetObject();
    string str;
    if (obj != null && GetString(out str))
      System.Console.WriteLine(obj + str);
  }
  static object GetObject() => “hello”
  static bool GetString(out string str)
  {
    str = “goodbye”;
    return true;
  }
}

There’s no problem here. C# knows that Main‘s str is definitely assigned on every code path that reads it, because we do not enter the consequence of the if statement unless GetString returned normally, and methods that take out parameters are required to write to the parameter before they return normally.

Now make a small change:

    dynamic obj = GetObject();

C# now gives error CS0165: Use of unassigned local variable ‘str’

Is the C# compiler wrong to give this error? Why or why not?

Next time on FAIC: the solution to the puzzle.

Excessive explanation, part nine

Last time we defined the grammar of our language “Exp”, which consists of identifiers, lambdas, function application and let expressions. This is a programming language. We also need a different language to formally describe types and “type schemes”. The paper says:

Note that types are absent from the language Exp. Assuming a set of
type variables α and of primitive types ι, the syntax of types τ and of
type-schemes σ is given by

τ ::= α | ι | τ → τ
σ ::= τ | ∀α σ

The high order bit here is that the language Exp is just expressions; we have no ability to declare new types in this language. So we assume for the purposes of type inference that there is an existing set of primitives — int, string, bool, whatever — that we can use as the building blocks. What those primitives are does not matter at all because the type inference algorithm never concerns itself with the difference between int and bool. The primitive types are meaningless to the type inference algorithm; all that matters is that we can tell them apart.

The first line in the grammar gives the syntax for types; a type tau can be one of three things.

  • A type can be a generic “type variable” — the T in List<T> in C#, for example — that we will notate alpha, beta, and so on.
  • A type can be one of a set of “primitive” types: int, string, whatever. Doesn’t matter.
  • The only other kind of type is “function which takes a type and returns a type”. That is, a function of one argument, with an argument type and a return type, is a type.

Just as they did with the Exp language, the authors have omitted parentheses from the language of types. Unlike with the Exp language, they neglected to call out that they were doing so.

The second line in the grammar gives the syntax for type schemes.

What’s a type scheme? Recall earlier that we said that a type scheme is a bit like a generic type. More formally, a type scheme is either just a plain type (including possibly a type variable), or “for any type alpha there exists a type such that…”

A type-scheme ∀α1...∀αn τ (which we may write ∀α1...αn τ) has generic
type variables α1...αn.

I’ve never liked “type variables” — I think variables should, you know, vary. The C# spec calls these type parameters, which is much more clear; a parameter is given a value by substituting an argument for it. But that’s the next episode!

Anyways, this sentence is just pointing out that for notational convenience we’ll omit the universal quantifiers in the cases where we have a large number of type variables in a row. But we will always have the quantifiers on the left hand side of the schema. A quantifier never appears in a type, only in a type schema.

Something important to realize here is that types and type schemes are just another syntax for a language, like “Exp”, our expression language. A “type” in this conception is simply a string of symbols that matches the grammar of our type language.

An interesting point to call out here: note that this type system has no generic “classes”. There is no “list of T” in this system. We’ve got primitive types, we’ve got function types, and we’ve got generic functions, and that’s it.

Again, this is because we don’t need anything more complicated than generic functions in order to make type inference interesting. If we can solve the problem of inference in a world with generic functions, we can very easily extend the inference to generic types. So let’s keep it simple and not introduce generic types into our core language.

A monotype μ is a type containing no type variables.

This just introduces a new jargon term, to distinguish generic types from non-generic types.

And that’s it for section two; that was a lot shorter than the introduction.

Next time: as promised, type substitution.

Excessive explanation, part two

We continue with my excessively detailed explanation of the seminal paper on the ML type inference system…

1 Introduction

This paper is concerned with the polymorphic type discipline of ML, 
which is a general purpose functional programming language, although 
it was first introduced as a metalanguage (whence its name) for 
constructing proofs in the LCF proof system. [4]

These bracketed numbers are not footnotes; they are references to other papers. See the original paper’s list of references if you want to follow up on these references.


By “type discipline” we basically mean the same thing as “type system”. That is, there is some way of determining the “type” of the expressions in a programming language, and detecting when the programs are erroneous due to violations of the typing rules.

By “polymorphic” here we mean what a C# programmer would mean by “generic”. There are many kinds of polymorphism in computer programming. “Subtype polymorphism” is what object-oriented programmers typically think of when someone says “polymorphism”:

void M(Animal a) { ... }
...
Giraffe g = new Giraffe();
M(g); // No problem.

That’s not the kind of polymorphism we’re talking about here. Rather, we’re talking about:

void N<T>(List<T> list) { ... }
...
List<int> g = ...;
N(g); // No problem.

This is often called “parametric polymorphism”, because the method takes a “type parameter”.

ML has parametric polymorphism, not subtype polymorphism.


A metalanguage is a language used to implement or describe another language. In this case, ML was first created to be the metalanguage for LCF, “Logic for Computable Functions”. The purpose of LCF was to automatically find proofs for mathematical theorems; you could write little programs in ML that described to the LCF system various strategies for trying to get a proof for a particular theorem from a set of premises. But as the paper notes, ML and its descendants are now general-purpose programming languages in their own right, not just implementation details of another language.


The type discipline was studied in [5] where it was shown to be 
semantically sound, in a sense made precise below, but where one 
important question was left open: does the type-checking algorithm — 
or more precisely the type assignment algorithm (since types 
are assigned by the compiler, and need not be mentioned by the 
programmer) — find the most general type possible for every expression 
and declaration?

As the paper notes, we’ll more formally define “sound” later. But the basic idea of soundness comes from logic. A logical deduction is valid if every conclusion follows logically from a premise. But an argument can be valid and come to a false conclusion. For example “All men are immortal; Socrates is a man; therefore Socrates is immortal” is valid, but not sound. The conclusion follows logically, but it follows logically from an incorrect premise. And of course an invalid argument can still reach a true conclusion that does not follow from the premises. A valid deduction with all true premises is sound.

Type systems are essentially systems of logical deduction. We would like to know that if a deduction is made about the type of an expression on the basis of some premises and logical rules, that we can rely on the soundness of those deductions.


The paper draws a bit of a hair-splitting distinction between a type checking algorithm and a type assignment algorithm. A type checking algorithm verifies that a program does not violate any of the rules of the type system, but does not say whether the types were added by the programmer as annotations, or whether the compiler deduced them. In ML, all types are deduced by the compiler using a type assignment algorithm. The question is whether the type assignment algorithm that the authors have in mind finds the most general type for expressions and function declarations.

By “most general” we mean that we want to avoid situations where, say, the compiler deduces “oh, this is a method that takes a list of integers and returns a list of integers”, when it should be deducing “this is a method that takes a list of T and returns a list of T for any T”. The latter is more general.

Why is this important? Well, suppose we deduce that a method takes a list of int when in fact it would be just as correct to say that it takes a list of T. If we deduce the former then a program which passes a list of strings is an error; if we deduce the latter then it is legal. We would like to make deductions that are not just sound, but also that allow the greatest possible number of correct programs to get the stamp of approval from the type system.


Here we answer the question in the affirmative, for the purely 
applicative part of ML. It follows immediately that it is decidable 
whether a program is well-typed, in contrast with the elegant and 
slightly more permissive type discipline of Coppo. [1]

This is a complicated one; we’ll deal with this paragraph in the next episode!

Functional programming for beginners

Happy New Year everyone; I hope you had a pleasant and relaxing festive holiday season. I sure did.

I’m starting the new year off by giving a short — an hour long or so — talk on how you can use ideas from functional programming languages in C#. It will broadcast this coming Wednesday, the 13th of January, at 10AM Pacific, 1PM Eastern.

The talk will be aimed straight at beginners who have some experience with OOP style in C# but no experience with functional languages. I’ll cover ways to avoid mutating variables and data structures, passing functions as data, how LINQ is functional, and some speculation on future features for C# 7.

There may also be funny pictures downloaded from the internet.

Probably it will be too basic for the majority of readers of this blog, but perhaps you have a friend or colleague interested in this topic.

Thanks once again to the nice people at InformIT who are sponsoring this talk. You can get all the details at their page:

http://www.informit.com/promotions/webcast-a-beginners-guide-to-functional-programming-141067

In related news, I am putting together what will be a very, very long indeed series of blog articles on functional programming, but not in C#. Comme c’est bizarre!

Foreshadowing: your sign of a quality blog. I’ll start posting… soon.

The dedoublifier, part four

I said last time that binary-searching the rationals (WOLOG between zero and one) for a particular fraction that is very close to a given double does not really work, because we end up with only fractions that have powers of two in the denominators. We already know what fraction with a power of two in the denominator is closest to our given double; itself!

But there is a way to binary-search the rationals between zero and one, as it turns out. Before we get into it, a quick refresher on how binary search works:
Continue reading