Covariance and Contravariance, Part 11: To infinity, but not beyond

As I’ve discussed at length in this space, we are considering adding covariance and contravariance on delegate and interface types parameterized with reference types to a hypothetical future version of C#. (See my earlier articles in this series for an explanation of the proposed feature.)

Variance is highly useful in mainstream cases, but exposes a really irksome problem in certain bizarre corner cases. Here’s just one example.

Consider a “normal” interface contravariant in its sole generic type parameter, and a “crazy” invariant interface which extends the normal interface in a strange way:

public interface IN<in U> {}
public interface IC<X> : IN<IN<IC<IC<X>>>> {}

This is a bit weird, but certainly legal.

Before I go on, I want to digress and talk a bit about why this is legal. Most people when they first see such a beast immediately say “but an interface cannot inherit from itself, that’s an illegal circularity in the inheritance chain!”

First off, no, that is not correct. Nowhere does the C# specification make this kind of inheritance illegal, and in fact, a weak form of it must be legal. You want to be able to say:

interface INumber<X> : IComparable<INumber<X>> { ... }

that is, you want to be able to express that one of the guarantees of the INumber<X> contract is that you can always compare one number with another one. Therefore, it must be legal to use a type’s name in a type argument of a generic base type.

However, all is not rosy. This particularly gross kind of inheritance that I give as an example is in fact illegal in the CLR, even though it is not illegal in C#. This means that it is possible to have the C# compiler generate an interface type which then cannot be loaded by the CLR. This unfortunate mismatch is troubling, and I hope in a future version of C# to make the type definition rules of C# as strict or stricter than those of the CLR. Until then, if it hurts when you do that, don’t do it.


Update from 2020: I never did. I left Microsoft in 2012 and I do not know if anyone else ever did either.


Second, unfortunately, the C# compiler presently has numerous bugs in its cycle detector such that sometimes things which kinda look like cycles but are in fact not cycles are flagged as cycle errors. This just makes it all the more difficult for people to understand what is a legal cycle and what isn’t. For example, the compiler today will incorrectly report that this is an illegal base class cycle, even though it clearly is not:

public class November<T> {}
public class Romeo : November<Romeo.Sierra.Tango> {
   public class Sierra {
       public class Tango {}
    }
}

I have devised a new (and I believe correct!) cycle detection algorithm implementation, but unfortunately it will not make it into the service release of the C# 3 compiler. It will have to wait for a hypothetical future release. I hope to address the problem of bringing the legal type checker into line with the CLR at the same time.


Update from 2020: The implementation I wrote for the C# 3 service release was judged too large and risky a change. I seem to recall that I eventually added it to C# 4 but I am hazy on the details. Certainly none of these problems are in Roslyn, which has a completely different algorithm for analyzing base types.


Anyway, back to the subject at hand: crazy variance. We have the interfaces defined as above, and then give the compiler a little puzzle to solve:

IC<double> bar = whatever;
IN<IC<string>> foo = bar;  // Is this assignment legal?

I am about to get into a morass of impossible-to-read generic names, so to make it easier on all of us, I am going to from now on abbreviate IN<IC<string>> as NCSIC<double> will be abbreviated as CD. You get the idea I’m sure.

Similarly, I will notate “is convertible to by implicit reference conversion” by a right-pointing arrow. So the question at hand is true or false: CD→NCS ?

Well, let’s see. Clearly CD does not go to NCS directly. But (the compiler reasons) maybe CD’s base type does.

CD’s base type is NNCCD. Does NNCCD→NCS? Well, N is contravariant in its parameter so therefore this boils down to the question, does CS→NCCD ?

Clearly not directly. But perhaps CS has a base type which goes to NCCD. The base type of CS is NNCCS. So now we have the question does NNCCS→NCCD ?

Well, N is contravariant in its parameter, so this boils down to the question does CCD→NCCS ?

Let’s pause and reflect a moment here.

The compiler has “reduced” the problem of determining the truth of CD→NCS to the problem of determining the truth of CCD→NCCS! If we keep on “reducing” like this then we’ll get to CCCD→NCCCS, CCCCD→NCCCCS, and so on.

I have a prototype C# compiler which implements variance – if you try this, it says “fatal error, an expression is too complex to compile”.

I considered implementing an algorithm that is smarter about determining convertibility; the paper I reference below has such an algorithm. (Fortunately, the C# type system is weak enough that determining convertibility of complex types is NOT equivalent to the halting problem; we can find these bogus situations both in principle and in practice. Interestingly, there are type systems in which this problem is equivalent to the halting problem, and type systems for which the computability of convertibility is still an open question.) However, given that we have many other higher priorities, it’s easier to just let the compiler run out of stack space and have a fatal error. These are not realistic scenarios from which we must sensibly recover.

This is just a taste of some of the ways that the type system gets weird. To get a far more in-depth treatment of this subject, you should read this excellent Microsoft Research paper.


Update from 2020:

Though I implemented a prototype of Andrew Kennedy’s circular reference detector, I did not ever get it into any version of C#. We just kept saying “expression too complex” when the compiler ran out of stack. I seem to recall that it was eventually added to Roslyn but I haven’t checked the sources to be sure.

The question that Andrew raises in the research paper — are languages like C# 4 and Java that have nominal subtyping and contravariant conversions decidable? — has been answered; they are not. I am very pleased to have played an extremely minor role in that. It went down like this.

A “decision problem” is a problem with a “yes or no” answer. “Is this string a substring of that string?” or “Is there a cycle on this graph that visits every node exactly once?”  A problem is said to be “decidable” if it is possible to write a computer program that always answers the question in finite time, in theory. The problems I just gave are decidable. “Does this program eventually halt when given this input or does it infinite loop?” is famously undecidable. There are many problems whose decidability we do not yet know, but the standard technique is to show a way to transform one problem into another whose decidability is known.

This 2013 post asks for a collection of decision problems not known to be decidable or undecidable, and I answered that the question “is this type convertible to that type?” in a language with nominal subtyping and contravariant conversions was not known to be decidable or undecidable. A reader noticed that post, solved the problem, and wrote a short paper describing the proof. It turns out you can make the compiler simulate an arbitrary Turing machine encoded it in the type system, and therefore, convertibility is the same class of problems as the halting problem mentioned above: it is undecidable.


This post generated a number of reader responses back in 2008; “expression too complex is an understatement” summed up a lot of them.

Some readers did experiments and discovered that small variations on my examples could cause Visual Studio to crash; at the time, the IDE used a different type analyzer than the compiler, and there were a lot of weird bugs like that. Eliminating the duplicated analysis was a major focus for our work in C# 3. We had to add so many new analysis algorithms to make LINQ work, and we did not want to replicate them in the IDE. So we not only had a lot of feature work to do, but also a lot of architecture work to de-silo the compiler/IDE divide. Fun times.

Many people also took this opportunity to request return type covariance, which after only twenty years of it being requested, may actually make it into C# 9.

 

4 thoughts on “Covariance and Contravariance, Part 11: To infinity, but not beyond

  1. Interesting stuff, thanks! And especially for caring to make the 2020 additions!

    I know you’re reposting stuff that mysteriously disappeared from Microsoft’s site. This looks like the 11th and the last part, but it seems to have used to be the 12th, as evidenced by links here and there (e.g. https://stackoverflow.com/questions/14177225/why-does-field-declaration-with-duplicated-nested-type-in-generic-class-results#comment19652413_14177225 – your own comment). Is there one you decided to drop and shorten the republished series?

  2. Pingback: Why does this generic constraint compile when it seems to have a circular reference – Make Me Engineer

  3. Pingback: Is C# type system sound and decidable? – Row Coding

Leave a comment