A contravariance conundrum

Suppose we have my usual hierarchy of types, Animal, Giraffe, etc., with the obvious type relationships. An IEqualityComparer<T> is contravariant in its type parameter; if we have a device which can compare two Animals for equality then it can compare two Giraffes for equality as well. So why does this code fail to compile?

IEqualityComparer<Animal> animalComparer = whatever;
IEnumerable<Giraffe> giraffes = whatever;
IEnumerable<Giraffe> distinct = giraffes.Distinct(animalComparer);

This illustrates a subtle and slightly unfortunate design choice in the method type inference algorithm, which of course was designed long before covariance and contravariance were added to the language.
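To make the puzzle concrete, here is a minimal sketch. The class bodies, the NameComparer type and the StaticTypeOf helper are hypothetical stand-ins for the “whatever” placeholders above. Inference sees Giraffe from the first argument and Animal from the second and settles on Animal, so the call yields an IEnumerable<Animal>, which cannot be assigned to IEnumerable<Giraffe>. One workaround, assuming a sequence of giraffes is what you want, is to state the type argument explicitly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Animal { public string Name = ""; }
class Giraffe : Animal { }

// Hypothetical comparer: two animals are equal when their names match.
class NameComparer : IEqualityComparer<Animal>
{
    public bool Equals(Animal x, Animal y) => x?.Name == y?.Name;
    public int GetHashCode(Animal a) => a.Name.GetHashCode();
}

static class DistinctDemo
{
    // Reports the compile-time type of an expression.
    public static Type StaticTypeOf<T>(T value) => typeof(T);

    public static List<Giraffe> Giraffes() => new List<Giraffe>
    {
        new Giraffe { Name = "Gerald" },
        new Giraffe { Name = "Gerald" },
        new Giraffe { Name = "Geoffrey" },
    };

    public static void Main()
    {
        IEqualityComparer<Animal> animalComparer = new NameComparer();
        IEnumerable<Giraffe> giraffes = Giraffes();

        // Inference fixes the type argument to Animal here.
        var inferred = giraffes.Distinct(animalComparer);
        Console.WriteLine(StaticTypeOf(inferred));

        // With an explicit type argument, the comparer is converted
        // to IEqualityComparer<Giraffe> by contravariance instead.
        IEnumerable<Giraffe> distinct = giraffes.Distinct<Giraffe>(animalComparer);
        Console.WriteLine(distinct.Count());
    }
}
```

The explicit type argument is noisier than one would like, but it keeps the result typed as a sequence of giraffes without any casts.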

How do we ensure that method type inference terminates?

Here’s a question I got from a coworker recently:

It is obviously important that the C# compiler not go into infinite loops. How do we ensure that the method type inference algorithm terminates?

The answer is quite straightforward actually, but if you are not familiar with method type inference then this article is going to make no sense. You might want to watch this video if you need a refresher.

Curiouser and curiouser

Here’s a pattern you see all the time in C#:

class Frob : IComparable<Frob>

At first glance you might ask yourself why this is not a “circular” definition; after all, you’re not allowed to say class Frob : Frob (*). However, upon deeper reflection that makes perfect sense; a Frob is something that can be compared to another Frob. There’s not actually a real circularity there.

This pattern can be genericized further:

class SortedList<T> where T : IComparable<T>

Again, it might seem a bit circular to say that T is constrained to something that is in terms of T, but actually this is just the same as before. T is constrained to be something that can be compared to T. Frob is a legal type argument for a SortedList because one Frob can be compared to another Frob.
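A minimal sketch of how the constraint gets used, assuming a hypothetical Frob that orders itself by a single Size field and a toy SortedList<T> that keeps its items in order on insertion. Neither body comes from real library code; the point is only that the constraint is what allows the list to call CompareTo on one T and pass it another T.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Frob: ordered by a single integer.
class Frob : IComparable<Frob>
{
    public int Size;
    public int CompareTo(Frob other) => Size.CompareTo(other.Size);
}

// Toy list, not the framework's SortedList<TKey, TValue>.
class SortedList<T> where T : IComparable<T>
{
    private readonly List<T> items = new List<T>();

    public void Add(T item)
    {
        // Walk past every item that sorts at or before the new one.
        int i = 0;
        while (i < items.Count && items[i].CompareTo(item) <= 0) i++;
        items.Insert(i, item);
    }

    public T this[int index] => items[index];
    public int Count => items.Count;
}

static class SortedListDemo
{
    public static void Main()
    {
        var list = new SortedList<Frob>();
        list.Add(new Frob { Size = 3 });
        list.Add(new Frob { Size = 1 });
        Console.WriteLine(list[0].Size);
    }
}
```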

But this really hurts my brain:

class Blah<T> where T : Blah<T>

That appears to be circular in (at least) two ways. Is this really legal?

Yes it is legal, and it does have some legitimate uses. I see this pattern rather a lot(**). However, I personally don’t like it and I discourage its use.

This is a C# variation on what’s called the Curiously Recurring Template Pattern in C++, and I will leave it to my betters to explain its uses in that language. Essentially the pattern in C# is an attempt to enforce the usage of the CRTP.

So, why would you want to do that, and why do I object to it?

One reason why people want to do this is to enforce a particular constraint in a type hierarchy. Suppose we have

abstract class Animal
{
  public virtual void MakeFriends(Animal animal) { }
}

But that means that a Cat can make friends with a Dog, and that would be a crisis of Biblical proportions! (***) What we want to say is

abstract class Animal
{
  public virtual void MakeFriends(THISTYPE animal) { } // THISTYPE is wishful thinking, not real C#
}

so that when Cat overrides MakeFriends, it can only override it with Cat.

Now, that immediately presents a problem in that we’ve just violated the Liskov Substitution Principle. We can no longer call a method on a variable of the abstract base type and have any confidence that type safety is maintained. Variance on formal parameter types has to be contravariance, not covariance, for it to be typesafe. And moreover, we simply don’t have that feature in the CLR type system.

But you can get close with the curious pattern:

abstract class Animal<T> where T : Animal<T>
{
  public virtual void MakeFriends(T animal) { }
}
class Cat : Animal<Cat>
{
  public override void MakeFriends(Cat cat) { }
}

and hey, we haven’t violated the LSP and we have guaranteed that a Cat can only make friends with a Cat. Beautiful.

Wait a minute… did we really guarantee that?

class EvilDog : Animal<Cat>
{
  public override void MakeFriends(Cat cat) { }
}

We have not guaranteed that a Cat can only make friends with a Cat; an EvilDog can make friends with a Cat too. The constraint only enforces that the type argument to Animal be good; how you use the resulting valid type is entirely up to you. You can use it for a base type of something else if you wish.

So that’s one good reason to avoid this pattern: because it doesn’t actually enforce the constraint you think it does. Everyone has to play along and agree that they’ll use the curiously recurring pattern the way it was intended to be used, rather than the evil dog way that it can be used.
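Here is the whole thing as a compilable sketch, so you can see the compiler raise no objection. As an assumption for illustration, MakeFriends is abstract and returns a string, which makes it visible which override ran; the types are otherwise the ones above.

```csharp
using System;

abstract class Animal<T> where T : Animal<T>
{
    public abstract string MakeFriends(T animal);
}

class Cat : Animal<Cat>
{
    public override string MakeFriends(Cat cat) => "Cat befriends Cat";
}

// Cat satisfies the constraint T : Animal<T>, so this compiles without complaint.
class EvilDog : Animal<Cat>
{
    public override string MakeFriends(Cat cat) => "EvilDog befriends Cat";
}

static class EvilDogDemo
{
    public static void Main()
    {
        Animal<Cat> friendly = new Cat();
        Animal<Cat> evil = new EvilDog();   // Nothing stops this assignment.
        Console.WriteLine(friendly.MakeFriends(new Cat()));
        Console.WriteLine(evil.MakeFriends(new Cat()));
    }
}
```

The constraint is checked against the type argument Cat both times; nothing ever checks that the class being declared is the same type as its own type argument.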

The second reason to avoid this is simply because it bakes the noodle of anyone who reads the code. When I see List<Giraffe> I have a very clear idea of what the relationship is between the List<> part (it means that there are going to be operations that add and remove things) and the Giraffe part (those operations are going to be on giraffes). When I see FuturesContract<T> where T : LegalPolicy I understand that this type is intended to model a legal contract about a transaction in the future which has some particular controlling legal policy. But when I read Blah<T> where T : Blah<T> I have no intuitive idea of what the intended relationship is between Blah<T> and any particular T. It seems like an abuse of a mechanism rather than the modeling of a concept from the program’s “business domain”.

All that said, in practice there are times when using this pattern really does pragmatically solve problems in ways that are hard to model otherwise in C#; it allows you to do a bit of an end-run around the fact that we don’t have covariant return types on virtual methods, and other shortcomings of the type system. That it does so in a manner that does not, strictly speaking, enforce every constraint you might like is unfortunate, but in realistic code, usually not a problem that prevents shipping the product.
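As a sketch of that end-run, assume a hypothetical Shape<T> base type with a Clone method. Because Clone is declared to return T, each derived type gets a strongly typed Clone and callers need no cast. The names here are made up for illustration.

```csharp
using System;

// Hypothetical base type: Clone returns T rather than the base type.
abstract class Shape<T> where T : Shape<T>
{
    public abstract T Clone();
}

class Circle : Shape<Circle>
{
    public double Radius;

    // In Shape<Circle> the abstract method's return type is Circle,
    // so this is an ordinary override, not a covariant one.
    public override Circle Clone() => new Circle { Radius = Radius };
}

static class CloneDemo
{
    public static void Main()
    {
        var original = new Circle { Radius = 2.0 };
        Circle copy = original.Clone();   // No cast needed.
        Console.WriteLine(copy.Radius);
    }
}
```

The price is the one already described: there is no common non-generic Shape to hold a mixed collection unless you add one, and nothing stops someone from writing class Square : Shape<Circle>.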

My advice is to think very hard before you implement this sort of curious pattern in C#; do the benefits to the customer really outweigh the costs associated with the mental burden you’re placing on the code maintainers?

(*) Due to an unintentional omission, some past editions of the C# specification actually did not say that this was illegal! However, the compiler has always enforced it. In fact, the compiler has over-enforced it, sometimes accidentally catching non-cycles and marking them as cycles.

(**) Most frequently in emails asking “is this really legal?”

(***) Mass hysteria!

Representation and identity

(Note: not to be confused with Inheritance and Representation.)

I get a fair number of questions about the C# cast operator. The most frequent question I get is:

short sss = 123;
object ooo = sss;            // Box the short.
int iii = (int) sss;         // Perfectly legal.
int jjj = (int) (short) ooo; // Perfectly legal
int kkk = (int) ooo;         // Invalid cast exception?! Why?

Why? Because a boxed T can only be unboxed to T (or Nullable<T>.) Once it is unboxed, it’s just a value that can be cast as usual, so the double cast works just fine.
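The rule is easy to check for yourself. This is a small sketch; the DirectUnboxThrows helper is a hypothetical name that exists only to catch the exception.

```csharp
using System;

static class UnboxDemo
{
    // Returns true when unboxing the argument directly to int throws.
    public static bool DirectUnboxThrows(object boxed)
    {
        try
        {
            int unused = (int)boxed;
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        short sss = 123;
        object ooo = sss;            // Box the short.
        int jjj = (int)(short)ooo;   // Unbox to short, then convert to int.
        short? nnn = (short?)ooo;    // Unboxing to Nullable<short> is also allowed.
        Console.WriteLine(jjj);
        Console.WriteLine(nnn.HasValue);
        Console.WriteLine(DirectUnboxThrows(ooo));
    }
}
```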

Covariance and contravariance in C#, part 1

I have been wanting for a long time to do a series of articles about covariance and contravariance (which I will shorten to “variance” for the rest of this series.)

I’ll start by defining some terms, then describe what variance features C# 2.0 and 3.0 already support today, and then discuss some ideas we are thinking about for hypothetical nonexistent future versions of C#.

As always, keep in mind that we have not even shipped C# 3.0 yet. Any of my musings on possible future additions to the language should be treated as playful hypotheses, rather than announcements of a commitment to ship any product with any feature whatsoever.

Today: what do we mean by “covariance” and “contravariance”?

The first thing to understand is that for any two types T and U, exactly one of the following statements is true:

  • T is bigger than U.
  • T is smaller than U.
  • T is equal to U.
  • T is not related to U.

For example, consider a type hierarchy consisting of Animal, Mammal, Reptile, Giraffe, Tiger, Snake and Turtle, with the obvious relationships. (Mammal is a subclass of Animal, and so on.)

Mammal is a bigger type than Giraffe and smaller than Animal, and obviously equal to Mammal. But Mammal is neither bigger than, smaller than, nor equal to Reptile; it’s just different.

Why is this relevant? Suppose you have a variable, that is, a storage location. Storage locations in C# all have a type associated with them. At runtime you can store an object which is an instance of an equal or smaller type in that storage location. That is, a variable of type Mammal can have a reference to an instance of Giraffe stored in it, but not a Turtle.

This idea of storing an object in a typed location is a specific example of a more general principle called the “substitution principle”. That is, in many contexts you can substitute an instance of a “smaller” type for a “larger” type.
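In code, using part of the hierarchy described above (the empty class bodies are placeholders), the storage rule looks like this sketch:

```csharp
using System;

class Animal { }
class Mammal : Animal { }
class Reptile : Animal { }
class Giraffe : Mammal { }
class Turtle : Reptile { }

static class SubstitutionDemo
{
    public static void Main()
    {
        Mammal mammal = new Giraffe();     // Giraffe is smaller than Mammal: allowed.
        Animal animal = mammal;            // Mammal is smaller than Animal: allowed.
        // Mammal wrong = new Turtle();    // Does not compile: Turtle is unrelated to Mammal.

        Console.WriteLine(animal is Mammal);
        Console.WriteLine(typeof(Mammal).IsAssignableFrom(typeof(Turtle)));
    }
}
```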

Now we can talk about variance. Consider an “operation” which manipulates types. If applying the operation to any T and U always results in two types T’ and U’ with the same relationship as T and U, then the operation is said to be “covariant”. If the operation reverses bigness and smallness on its results but keeps equality and unrelatedness the same then the operation is said to be “contravariant”.
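For a concrete picture, here is a sketch that relies on the generic variance that arrived in C# 4.0, so it would not have compiled when this article was written. The “operation” is “make an IEnumerable of it” in the covariant case and “make an Action of it” in the contravariant case. The Feed method and its counter are hypothetical.

```csharp
using System;
using System.Collections.Generic;

class Animal { }
class Mammal : Animal { }
class Giraffe : Mammal { }

static class VarianceDemo
{
    public static int Fed;
    public static void Feed(Mammal m) => Fed++;   // Hypothetical operation on any mammal.

    public static void Main()
    {
        // Covariant: Giraffe is smaller than Mammal, and
        // IEnumerable<Giraffe> is likewise smaller than IEnumerable<Mammal>.
        IEnumerable<Giraffe> giraffes = new List<Giraffe> { new Giraffe() };
        IEnumerable<Mammal> mammals = giraffes;

        // Contravariant: the direction reverses, so
        // Action<Mammal> is smaller than Action<Giraffe>.
        Action<Mammal> feedMammal = Feed;
        Action<Giraffe> feedGiraffe = feedMammal;

        foreach (Mammal m in mammals) feedGiraffe((Giraffe)m);
        Console.WriteLine(Fed);
    }
}
```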

That’s totally highfalutin and probably not very clear. Next time we’ll look at how C# 3 implements variance at present.

Commentary from 2020

At the time I wrote this series we were done with C# 3.0 but it had not shipped yet, and we were therefore well underway with the design and implementation of C# 4.0. The purpose of this series was to introduce the concept of generic variance, which we were planning to add in C# 4.0, and get early feedback on whether it would be useful and whether it would be understood.

Because we were gathering information to make a decision about how to design the feature and whether it was worth the effort at all, I was very concerned that my words not be construed as a promise to deliver any particular feature on any particular schedule. However, one of the nice things about my early days of blogging when I was a Microsoft employee was that the company pretty much gave us the guidance of “err on the side of whatever helps users most” and let the lawyers worry about the legal stuff.

There were a lot of confusing comments left on the original article. I realized later that I was insufficiently clear, even though I had boldfaced the key line to call attention to it. Covariance is about preserving a relationship across a mapping, and the thing I wanted to be very clear was: the relationship that is preserved is “a value of this type is assignable to a variable of that type”. This relationship is called assignment compatibility. In retrospect I also should have been more clear that I was specifically talking about assignment compatibility of reference types, and that the conversion from one type to another had to be a value-preserving reference conversion.
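The reference-type restriction is visible in the variance that eventually shipped. In this small sketch, with a hypothetical helper name, a list of strings is a sequence of objects, but a list of ints is not, because int to object is a boxing conversion rather than a reference conversion.

```csharp
using System;
using System.Collections.Generic;

static class ReferenceOnlyDemo
{
    public static bool IsSequenceOfObjects(object candidate) =>
        candidate is IEnumerable<object>;

    public static void Main()
    {
        // string -> object is a reference conversion, so covariance applies.
        Console.WriteLine(IsSequenceOfObjects(new List<string> { "giraffe" }));

        // int -> object is a boxing conversion, so covariance does not apply.
        Console.WriteLine(IsSequenceOfObjects(new List<int> { 1 }));
    }
}
```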

The most common confusion I’ve seen in the decade that followed, and one I still see a lot, is the belief that covariance means “you can use a giraffe where a mammal is required”. It does not; that is assignment compatibility. Covariance is “because you can use a giraffe where a mammal is required, you are also allowed to use a sequence of giraffes where a sequence of mammals is required”. It is the preservation of assignment compatibility when the types are transformed from “giraffe” and “mammal” to “sequence of giraffes” and “sequence of mammals” that makes the transformation covariant.

As was true of most of this series, there were comments asking for return type covariance. It looks like that may finally be added to C# 9.