A contravariance conundrum

Suppose we have my usual hierarchy of types, Animal, Giraffe, etc, with the obvious type relationships. An IEqualityComparer<T> is contravariant in its type parameter; if we have a device which can compare two Animals for equality then it can compare two Giraffes for equality as well. So why does this code fail to compile?

IEqualityComparer<Animal> animalComparer = whatever;
IEnumerable<Giraffe> giraffes = whatever;
IEnumerable<Giraffe> distinct = giraffes.Distinct(animalComparer);

This illustrates a subtle and slightly unfortunate design choice in the method type inference algorithm, which of course was designed long before covariance and contravariance were added to the language.

Continue reading

About these ads

How do we ensure that method type inference terminates?

Here’s a question I got from a coworker recently:

It is obviously important that the C# compiler not go into infinite loops. How do we ensure that the method type inference algorithm terminates?

The answer is quite straightforward actually, but if you are not familiar with method type inference then this article is going to make no sense. You might want to watch this video if you need a refresher.

Method type inference since C# 3.0 basically works like this: we create a set of bounds on each method type parameter. We then “fix” each type parameter to a member of its bounds set. That might enable new bounds to be computed, so this algorithm is a loop. Once every type parameter is fixed, method type inference has succeeded. If any type parameter cannot be fixed for some reason then type inference fails.

We ensure termination like this: if we manage to make it through the body of the loop without fixing at least one type parameter then type inference fails. Therefore, the type inference loop can run at most n times if the method has n type parameters. If we make it to n times through the loop then type inference must have fixed a type parameter on every iteration, and type inference has succeeded. If we failed before n times through the loop then obviously type inference did not run forever.

That’s a bit highfalutin; let me flesh that out a bit. A “bound” is nothing more than a type, and a bound can be “upper”, “lower” or “exact”. For example, suppose we have a type parameter T with three bounds: a lower bound of Giraffe, an exact bound of Mammal, and an upper bound of Animal. Let’s say that Animal is a “larger” type than Mammal (because all mammals are animals but not all animals are mammals, thus Animal must be the larger type), and Giraffe is a “smaller” type than Mammal. Given this set of bounds we know that T must be inferred to be first, either Giraffe or a type larger than Giraffe, because Giraffe is a “lower bound”; you can’t infer a type smaller than Giraffe. Second, we know that T must be Mammal, exactly. And third, we know that T must be either Animal or a type smaller than Animal, because Animal is an upper bound. We cannot infer a type larger than Animal. The C# compiler deduces that Mammal is the only type in the set that meets all three requirements, and so T would be fixed to Mammal. If there are multiple types in the set that meet all the requirements (which of course cannot happen if there are any exact bounds!) then we pick the largest such type.[1. Note that this algorithm is consistent with other type inference features in C# in two ways. First, when asked to infer a best type from a set of types, we always choose a type from the set. We never say "well we have Dog and Cat in the set so let's infer Mammal" unless Mammal is itself in the set. Second, when faced with multiple possible "best" types, we pick the largest. There is an argument to be made for picking the smallest, but picking the largest seems to match more people's intuitions of what the right choice is.]

The interesting part of method type inference is how we deal with lambdas. Suppose we have a method Select<A, R>(I<A>, Func<A, R>) where the second argument is c=>c.Name. We say that A is an “input” type parameter and R is an “output” type parameter. (It is of course possible for a type parameter to be both an input and output type parameter!) Furthermore, we say that R “depends on” A, because the type of A could possibly determine the type of R. (Of course the “depends” relationship can be cyclic.)

The type inference algorithm, at a high level, goes like this:

  • Add bounds to type parameters based on all non-lambda arguments, and all lambda arguments where the delegate type has no type parameters in its inputs.
  • Loop
    • Is every type parameter fixed?
      • Type inference has succeeded. Terminate the algorithm.
    • Is there any lambda argument converted to a delegate type where the inputs of the delegate type are all known and the output type involves an unfixed type parameter?
      • Deduce the return type of all such lambdas and make inferences that add bounds to the corresponding delegate’s output types.
    • Is there any unfixed, bounded type parameter that does not appear in an output type of a delegate that has unfixed input types?
      • Fix all such type parameters and go back to the top of the loop.
    • Is there any unfixed, bounded type parameter such that an unfixed type parameter depends on it, directly or indirectly?
      • Fix all such type parameters and go back to the top of the loop.
  • If we make it here then we failed to make progress; we have just as many fixed type parameters as we started with. Type inference fails. Terminate the algorithm.

So, for example, if we had Select(customers, c=>c.Name); where customers implements I<Customer> then we start by inferring that A has a lower bound of Customer.[2. Assuming that the type I<T> is covariant in T. If it were contravariant then we would deduce an upper bound, and if it were invariant then we would deduce an exact bound. See my series on variance if that is not clear.] We have no lambda arguments that correspond to formal parameters where the delegate type has no type parameters in its inputs, so we enter the loop.

Is every type parameter fixed? No.

Is there any lambda argument converted to a delegate type where the inputs are known and the output involves an unfixed type parameter? No. There is a lambda argument converted to a delegate type, and the output involves unfixed type parameter R, but the input type is A and A is not fixed. So we have no inferences to make.

Is there an unfixed type parameter that has bounds and does not appear in an output type of a delegate that has unfixed input types? Yes. A has bounds and does not appear as an output type, period.

Therefore we fix A. It has only one bound, Customer, so we fix it to Customer. We have made progress, so we go back to the top of the loop.

Is every type parameter fixed? No.

Is there any lambda argument converted to a delegate type where the inputs are known and the output involves an unfixed type parameter? Yes! Now we make an inference. A is fixed to Customer, so we add the type of Customer.Name, say, string, as a lower bound to R. Now we must fix something.

Is there an unfixed type parameter that has bounds and does not appear in an output type of a delegate that has unfixed input types? Yes. R is unfixed, it has bounds, and it appears as an output type of a delegate that has fixed input types, so it is a candidate for fixing. We fix R to its only bound, string, and start the loop again.

Is every type parameter fixed? Yes. We’re done.

This technique of preventing infinite loops by requiring that each loop iteration make progress is quite useful, and clearly in this case it guarantees that the algorithm executes the loop no more times than there are type parameters to fix.

You might wonder if it is therefore the case that method type inference is O(n) in the number of type parameters. It turns out that it is not, for several reasons. First, as a practical matter it only makes sense to determine the asymptotic order of an algorithm if the size of the problem is likely to become large. I’ve never seen a method with more than five type parameters in the wild (aside from the tuple constructors, which are straightforward), and even that is pretty unusual. Most generic methods have one or two type parameters. Second, doing the analysis of the lambdas is the expensive part, and it only really makes sense to analyze the behaviour of the most expensive part. We already know that analyzing lambdas is, worst case, an NP-HARD problem so whether or not method type inference is O(some polynomial) is possibly not that relevant. Third, you’ll notice that in my sketch of the algorithm we have to answer questions like “is there any unfixed type parameter that has an unfixed type parameter that depends on it?” This requires solving a graph-traversal problem, whose asymptotic cost we have not analyzed! I won’t take you through the boring analysis, but suffice to say there could be O(n2) dependency relationships that each cost O(n) to analyze, and we could go through the loop n times, for an extremely unlikely worst case of O(n4). The implementation of this algorithm is actually O(n2) in the common case; because n is likely to be small, as I said, we have not put the effort into more sophisticated algorithms that can solve these graph problems even faster in the asymptotic case.


Next time on FAIC: There are functions on integers that grow faster than any function you can write a program to produce. Demonstrating this surprising fact involves beavers. We’ll see why!

Representation and identity

(Note: not to be confused with Inheritance and Representation.)

I get a fair number of questions about the C# cast operator. The most frequent question I get is:

short sss = 123;
object ooo = sss;            // Box the short.
int iii = (int) sss;         // Perfectly legal.
int jjj = (int) (short) ooo; // Perfectly legal
int kkk = (int) ooo;         // Invalid cast exception?! Why?

Why? Because a boxed T can only be unboxed to T.[1. Or Nullable<T>.] Once it is unboxed, it’s just a value that can be cast as usual, so the double cast works just fine.
Continue reading