Does not compute

One of the most basic ways to think about a computer program is that it is a device which takes in integers as inputs and spits out integers as outputs. The C# compiler, for example, takes in source code strings, and those source code strings are essentially nothing more than enormous binary numbers. The output of the compiler is either diagnostic text, or strings of IL and metadata, which are also just enormous binary numbers.  Because the compiler is not perfect, in some rare cases it terminates abnormally with an internal error message. But those fatal error messages are also just big binary numbers. So let’s take this as our basic model of a computer program: a computer program is a device that either (1) runs forever without producing output, or (2) computes a function that maps one integer to another.
Continue reading

Should C# warn on null dereference?

As you probably know, the C# compiler does flow analysis on constants for the purposes of finding unreachable code. In this method the statement with the calls is known to be unreachable, and the compiler warns about it.

const object x = null; 
void Foo() 
  if (x != null)

Now suppose we removed the if statement and just had the call:

const object x = null; 
void Foo() 

The compiler does not warn you that you’re dereferencing null! The question is, as usual, why not?
Continue reading

Never say never, part two

This is part two of a two-part series about determining whether the endpoint of a method is never reachable. Part one is here. A follow-up article is here.

Whether we have a “never” return type or not, we need to be able to determine when the end point of a method is unreachable for error reporting in methods that have non-void return type. The compiler is pretty clever about working that out; it can handle situations like

int M()
    while(true) N();
  catch(Exception ex)
    throw new WrappingException(ex);

The compiler knows that N either throws or it doesn’t, and that if it doesn’t, then the try block never exits, and if it does, then either the construction of the exception throws, or the construction succeeds and the catch throws the new exception. No matter what, the end point of M is never reached.

However, the compiler is not infinitely clever. It is easy to fool it:
Continue reading

Covariance and Contravariance, Part 11: To infinity, but not beyond

As I’ve discussed at length in this space, we are considering adding covariance and contravariance on delegate and interface types parameterized with reference types to a hypothetical future version of C#. (See my earlier articles in this series for an explanation of the proposed feature.)

Variance is highly useful in mainstream cases, but exposes a really irksome problem in certain bizarre corner cases. Here’s just one example.

Consider a “normal” interface contravariant in its sole generic type parameter, and a “crazy” invariant interface which extends the normal interface in a strange way:

public interface IN<in U> {}
public interface IC<X> : IN<IN<IC<IC<X>>>> {}

This is a bit weird, but certainly legal.

Before I go on, I want to digress and talk a bit about why this is legal. Most people when they first see such a beast immediately say “but an interface cannot inherit from itself, that’s an illegal circularity in the inheritance chain!”

First off, no, that is not correct. Nowhere does the C# specification make this kind of inheritance illegal, and in fact, a weak form of it must be legal. You want to be able to say:

interface INumber<X> : IComparable<INumber<X>> { ... }

that is, you want to be able to express that one of the guarantees of the INumber<X> contract is that you can always compare one number with another one. Therefore, it must be legal to use a type’s name in a type argument of a generic base type.

However, all is not rosy. This particularly gross kind of inheritance that I give as an example is in fact illegal in the CLR, even though it is not illegal in C#. This means that it is possible to have the C# compiler generate an interface type which then cannot be loaded by the CLR. This unfortunate mismatch is troubling, and I hope in a future version of C# to make the type definition rules of C# as strict or stricter than those of the CLR. Until then, if it hurts when you do that, don’t do it.

Update from 2020: I never did. I left Microsoft in 2012 and I do not know if anyone else ever did either.

Second, unfortunately, the C# compiler presently has numerous bugs in its cycle detector such that sometimes things which kinda look like cycles but are in fact not cycles are flagged as cycle errors. This just makes it all the more difficult for people to understand what is a legal cycle and what isn’t. For example, the compiler today will incorrectly report that this is an illegal base class cycle, even though it clearly is not:

public class November<T> {}
public class Romeo : November<Romeo.Sierra.Tango> {
   public class Sierra {
       public class Tango {}

I have devised a new (and I believe correct!) cycle detection algorithm implementation, but unfortunately it will not make it into the service release of the C# 3 compiler. It will have to wait for a hypothetical future release. I hope to address the problem of bringing the legal type checker into line with the CLR at the same time.

Update from 2020: The implementation I wrote for the C# 3 service release was judged too large and risky a change. I seem to recall that I eventually added it to C# 4 but I am hazy on the details. Certainly none of these problems are in Roslyn, which has a completely different algorithm for analyzing base types.

Anyway, back to the subject at hand: crazy variance. We have the interfaces defined as above, and then give the compiler a little puzzle to solve:

IC<double> bar = whatever;
IN<IC<string>> foo = bar;  // Is this assignment legal?

I am about to get into a morass of impossible-to-read generic names, so to make it easier on all of us, I am going to from now on abbreviate IN<IC<string>> as NCSIC<double> will be abbreviated as CD. You get the idea I’m sure.

Similarly, I will notate “is convertible to by implicit reference conversion” by a right-pointing arrow. So the question at hand is true or false: CD→NCS ?

Well, let’s see. Clearly CD does not go to NCS directly. But (the compiler reasons) maybe CD’s base type does.

CD’s base type is NNCCD. Does NNCCD→NCS? Well, N is contravariant in its parameter so therefore this boils down to the question, does CS→NCCD ?

Clearly not directly. But perhaps CS has a base type which goes to NCCD. The base type of CS is NNCCS. So now we have the question does NNCCS→NCCD ?

Well, N is contravariant in its parameter, so this boils down to the question does CCD→NCCS ?

Let’s pause and reflect a moment here.

The compiler has “reduced” the problem of determining the truth of CD→NCS to the problem of determining the truth of CCD→NCCS! If we keep on “reducing” like this then we’ll get to CCCD→NCCCS, CCCCD→NCCCCS, and so on.

I have a prototype C# compiler which implements variance – if you try this, it says “fatal error, an expression is too complex to compile”.

I considered implementing an algorithm that is smarter about determining convertibility; the paper I reference below has such an algorithm. (Fortunately, the C# type system is weak enough that determining convertibility of complex types is NOT equivalent to the halting problem; we can find these bogus situations both in principle and in practice. Interestingly, there are type systems in which this problem is equivalent to the halting problem, and type systems for which the computability of convertibility is still an open question.) However, given that we have many other higher priorities, it’s easier to just let the compiler run out of stack space and have a fatal error. These are not realistic scenarios from which we must sensibly recover.

This is just a taste of some of the ways that the type system gets weird. To get a far more in-depth treatment of this subject, you should read this excellent Microsoft Research paper.

Update from 2020:

Though I implemented a prototype of Andrew Kennedy’s circular reference detector, I did not ever get it into any version of C#. We just kept saying “expression too complex” when the compiler ran out of stack. I seem to recall that it was eventually added to Roslyn but I haven’t checked the sources to be sure.

The question that Andrew raises in the research paper — are languages like C# 4 and Java that have nominal subtyping and contravariant conversions decidable? — has been answered; they are not. I am very pleased to have played an extremely minor role in that. It went down like this.

A “decision problem” is a problem with a “yes or no” answer. “Is this string a substring of that string?” or “Is there a cycle on this graph that visits every node exactly once?”  A problem is said to be “decidable” if it is possible to write a computer program that always answers the question in finite time, in theory. The problems I just gave are decidable. “Does this program eventually halt when given this input or does it infinite loop?” is famously undecidable. There are many problems whose decidability we do not yet know, but the standard technique is to show a way to transform one problem into another whose decidability is known.

This 2013 post asks for a collection of decision problems not known to be decidable or undecidable, and I answered that the question “is this type convertible to that type?” in a language with nominal subtyping and contravariant conversions was not known to be decidable or undecidable. A reader noticed that post, solved the problem, and wrote a short paper describing the proof. It turns out you can make the compiler simulate an arbitrary Turing machine encoded it in the type system, and therefore, convertibility is the same class of problems as the halting problem mentioned above: it is undecidable.

This post generated a number of reader responses back in 2008; “expression too complex is an understatement” summed up a lot of them.

Some readers did experiments and discovered that small variations on my examples could cause Visual Studio to crash; at the time, the IDE used a different type analyzer than the compiler, and there were a lot of weird bugs like that. Eliminating the duplicated analysis was a major focus for our work in C# 3. We had to add so many new analysis algorithms to make LINQ work, and we did not want to replicate them in the IDE. So we not only had a lot of feature work to do, but also a lot of architecture work to de-silo the compiler/IDE divide. Fun times.

Many people also took this opportunity to request return type covariance, which after only twenty years of it being requested, may actually make it into C# 9.