Excessive explanation, part twenty-four

We have four kinds of things in Exp: identifiers, function calls, lambdas and lets, and we have four rules in type inference. Finally, the last recursive rule of the type inference algorithm:

(iv) 
If e is let x = e1 in e2 then let W(A, e1) = (S1, τ1) and
             ____  
W(S1 Ax ∪ {x:S1 A(τ1), e2) = (S2, τ2); then S = S2 S1 and τ = τ2.

Again let’s look at an example. Let’s say we want to infer a type for

let double_it = λx.(plus x x) in double_it foo

where we have plus as before, and foo:int in our set of assumptions. We begin by recursing on λx.(plus x x) with the current set of assumptions. Note that we do not want to remove double_it from our set of assumptions if it is there because the expression might use double_it, meaning some other definition. If it does, then it uses the definition already in the assumptions; the “new” double_it is only in scope after the in.

Type inference tells us that the expression is of type int → int, so we add an assumption double_it:int → int to our set of assumptions, and compute the type of double_it foo under the new set of assumptions, which of course says that it is int. Since the type of a let expression is the type of the second expression, we’re done.

The reason we need the closure is: suppose type inference tells us that the first expression is of type β→β, and β is free in A. (Remember, we always get a type, not a type scheme.) For the purposes of inferring the type of the right side of the in, we want the expression to have type scheme ∀β β→β.

NOTE: When any of the conditions above is not met W fails.

Though this is true, I think it would have been better to call out the two places in the algorithm where the failure can occur. They are in rule (i), which requires that an identifier have a type in the set of assumptions, and in rule (ii) which requires that type unification succeeds. Rules (iii) and (iv) never fail of their own accord; they only fail when the recursive application of W fails.

Next time: more sketch proofs!

Excessive explanation, part twenty-three

Image

Last time we gave the base case of our recursive algorithm. Today, two of the three non-base cases.

But before we get to them, another annoyance.

The first time I read this algorithm carefully I got to the second step of the inference algorithm and said to myself wait, that cannot possibly be right. I stared at that thing for probably fifteen minutes trying to make sense of it before realizing that there is a footnote at the bottom of the page that says

There are obvious typographic errors in parts (ii) and (iv)
which are in the original publication. I have left the correction
of these as an easy exercise for the reader.

Now, I am hardly one to talk as I leave stuff as exercises for the reader all the time on Stack Overflow. But really? There’s a typo in the paper, you know about it, and you leave the typo in the paper as a challenge? VEXING.

I will remove the typos in my presentation of the algorithm. Anyone who wants the frankly rather dubious challenge of attempting to fix the typos themselves can go read the original paper.

Onwards!

(ii) 
If e is e1 e2 then let W(A, e1) = (S1, τ1) and W(S1 A, e2) = (S2, τ2) 
and U(S2 τ1, τ2 → β) = V where β is new; then S = V S2 S1 and τ = V β.

Let’s unpack this, starting with just the types.

We have a function application e1 e2. We recursively determine that e1 is some function type τ1 and e2 is its argument of type τ2.

Therefore τ1 had better be a function type that takes a τ2 and returns something of some type; call it β. Whatever β is must be the type of the function invocation expression. To find it, we unify τ1 with τ2 → β and we get a substitution that tells us what type to substitute for β. That’s our resulting type.

Let’s take a look at a trivial example. Suppose our assumptions are x:int and i:∀α α→α, and our expression to be inferred is i x. We recursively apply type inference to determine that τ1 is β → β — remember, we need a new type variable for the type τ1 — and τ2 is of course int. Substitutions S1 and S2 are empty substitutions. We unify β → β with int → γ — again we need a new type variable used nowhere else — and get the subtitution V = [β/int, γ/int]. Our result type is V γ which is obviously int.

We can deduce that passing an int to an identity function returns int; awesome.

(iii) 
If e is λx.e1 then let β be a new type variable and
W(Ax ∪ {x:β}, e1) = (S1, τ1); then S = S1 and τ = S1 β → τ1. 

Let’s think of an example. Suppose we have an assumption plus:int → int → int and the expression λx.(plus x x). So we make a new assumption x:β, and run algorithm W on plus x x. That tells us that the expression is of type int, and also returns a substitution [int/β].

The type of the lambda is therefore [int/β](β → int), which obviously is int → int.

Next time: inferring the type of a “let” expression.

Excessive explanation, part twenty-two

All right, at long last, the type inference algorithm itself!

Algorithm W.

W(A, e) = (S, τ) where

This is just a reminder that what we’re doing here is taking a set of assumptions that give types to identifiers, and an expression that might contain some of those identifiers, and from that we deduce a substitution on the identifier types, and a type for the expression.

This reminder however does not remind us that the algorithm can also tell us that there is no type that can be inferred for the expression given those assumptions; properly we ought to be saying that we produce a substitution and a type, or an error.

(i) 
If e is x and there is an assumption x:∀α1, ... ,αn τ' in A then 
S = Id and τ = [βii]τ' where the βi are new. 

Of course this is the identity (empty) substitution, not the set 
Id of identifiers.

Just a small rant here. First off, we already have a notation for substitutions which is a list of τ/α pairs in square brackets. The notation [ ] is a perfectly concise way to describe the identity type substitution. So why did the authors chose to re-use “Id” to mean “the empty substitution” when it already means “the set of identifiers”, thereby prompting them to insert a note to the effect that this notation is confusing? In code reviews I tell people to never comment confusing code; fix it so it is not confusing! A two-character fix here would ensure that the clarifying note is unneeded.

I also cannot fathom why we’ve gone from notating a type scheme with multiple bindings as ∀α1…αn at the beginning of the paper to ∀α1, … ,αn at the end. Before there were no commas; now there are commas. Small things like this are an impediment to understanding. Attention to detail is important in a paper, and in code.

Rant over. More rants next time!

As you might have guessed, this is going to be a recursive algorithm. Therefore we start with the simplest possible base case. If the expression we’re trying to deduce is an identifier whose type scheme is given in the assumptions, then we just use that type.

There are a few subtleties to consider carefully though.

If we have expression x and an assumption like x:int then it is easy; x is deduced to be int. Similarly if we have a free type variable; if we have x:α then we simply deduce that x is, of course, α.

The tricky case is where we have an assumption that x is quantified. Maybe we have assumption x:∀α1α2 α1→α2. We do not want to deduce that the type of the expression is the type α1→α2 because α1 or α2 might appear free elsewhere, and we don’t want to confusingly associate the type of x with those free variables. We need to make new free type variables, so we deduce that the type of x is β1→β2 where these are brand new not previously used type variables.

When I joined the C# language design team the major design work for LINQ was done but there were still a few details left. The existing generic method type inference algorithm was insufficient to deal with a world that included lambdas, so we needed to fix it up. Complicating matters was the fact that the existing code that did type inference was difficult to follow and did not bear a very close resemblance to the specification in its action. (Its results were fine, but it was hard to look at the code and see how the specification had been translated into code.)

Given that we were making major changes to the algorithm, I decided to scrap the existing implementation and start fresh with an implementation that was (1) very clearly implementing exactly what was in the specification, so that as we tweaked the spec during the design it would be clear what needed to change in the implementation, and (2) written in a style similar to how I would want to write the same algorithm in C#; I was anticipating the future existence of Roslyn and I wanted to make that transition smooth.

And I really did start from scratch, including rewriting the type unification and substitution algorithms from scratch as well.

The first major mistake that got caught after I checked it in was:

void M<T, U>(T t, U u, bool flag) 
{
  if (flag) 
    M(flag, t, false);
}

What types should be inferred on the recursive call? Plainly this should be the same as M<bool, T>(flag, t, false);. My first implementation deduced that this was M<bool, bool>. You can probably guess why. The type inferencer deduced that T is bool and U is T, and the substituter said OK, if T is bool and U is T then U is also bool!

The T that is being substituted in for U is the T of the outer invocation of M. The T of the inner invocation of M is a different T entirely, but I used the same object to represent both! When you are implementing these algorithms you have to be super careful about not granting referential identity to things that look the same but are in fact just placeholders for different activations.

Fortunately we had a massive set of test cases which found my bug in short order.

Next time we’ll look at the non-base cases, and some irritating typos.

Excessive explanation, part twenty-one

All right; we have a deductive system whereby the types of expressions can be justified with derivations. What we want is the ability to take an expression and the typing assumptions of an environment and deduce the type scheme of that expression:

6 The type assignment algorithm W

The type inference system itself does not provide an easy 
method for finding, given A and e, a type-scheme σ such that 
A ⊢ e:σ 

Bummer!

We now present an algorithm W for this purpose. In 
fact, W goes a step further. Given A and e, if W succeeds it
finds a substitution S and a type τ, which are most general 
in a sense to be made precise below, such that

                        S A ⊢ e:τ.

Note that this is a bit subtle. Our inputs are a set of assumptions A — what are the type schemes of all the relevant identifiers? — and an expression e, whose type we wish to know. Our outputs are a substitution — what types should some of the type variables have? — and a type. Not, interestingly, a type scheme; a type.

To define W we require the unification algorithm of Robinson [6].

Proposition 3 (Robinson). There is an algorithm U which, given 
a pair of types, either returns a substitution V or fails; further

(i) If U(τ,τ') returns V, then V unifies τ and τ', i.e. V τ = V τ'.

Note that there was a typo in the PDF of the paper here; it said V τ = τ'. I’ve corrected it here.


(ii) If S unifies τ and τ' then U(τ,τ') returns some V and 
there is another substitution R such that S = RV .

Moreover, V involves only variables in τ and τ'.

What does this mean?

The first claim is the important one. Suppose we have two types. Let’s say β→β and int→int. The unification algorithm would find a substitution on the two types that makes them equal; obviously the substitution is [int/β]. That’s an easy one; you can I’m sure see how these could get quite complicated.

A slightly more complicated example illustrates that both types might need substitution: Consider unifying int → γ with β → β. The substitution V = [int/β, int/γ] does make them equal to each other under substitution.

The unification algorithm can return “there is no such substitution” of course. There is no substitution that can turn β→β into int→(int→int), for example.

I’m not going to go through the unification algorithm in detail; perhaps you can deduce for yourself how it goes. I will however say that unification of types in the ML type system is simple compared to unification in a type system like C#’s. In C# we must solve problems like “I have an IEnumerable<Action<Tiger, T>> and I need a unification with List<blah blah blah... and so all the mechanisms of covariant and contravariant reference conversions come into play. In the C# type inference algorithm we not only need to find substitutions, we need to put bounds on the substitutions, like “substitute Mammal-or-any-subtype for T”.

The second claim notes that the substitution that unifies two types may not be unique, but that this fact doesn’t matter. That is, if V is a substitution that unifies two types found by the algorithm, and S is a different substitution that also unifies the types, then there exists a way to transform V into S.

We also need to define the closure of a type τ with respect to
assumptions A;

     _
     A(τ) = ∀α1, ... ,αn τ

where α1, ... ,αn  are the type variables occurring free τ in
but not in A.

I am not super thrilled with this notation.

Basically we are saying here that given a set of assumptions A and a type τ that might contain free type variables, we can identify the free type variables not already free in A, and quantify them to make an equivalent type scheme.

Now, we’ve already said that there is a function U that takes two types and produces a substitution, and we’re going to say that there is a function W that takes a set of assumptions and an expression and produces a substitution and a type. So why on earth do the authors not simply say “there is a function B that takes a set of assumptions and a type and produces a type scheme” ???

I genuinely have no idea why authors of these sorts of papers do not adopt a consistent approach to functions and their notation. Rather than taking a straightforward approach and notating it B A τ or B(A,τ), they notate it as A-with-a-bar-on-top(τ), for no good reason I am aware of.

I started this series because I am asked fairly frequently about how to read the notation of these papers, and boy, do they ever not make it easy. Even a little thing like whether function applications are notated U(τ,τ') as the unification function is notated, or S τ as substitutions are notated, is frequently inconsistent. And now we have yet another kind of function, this time notated as, of all things, a bar over what ought to be one of the arguments.

Sigh. I’ll stick with the notation as it is describe in the paper.

Next time: at long last, the base case of algorithm W!

Happy anniversary Visual Studio!

Oh my goodness, I cannot believe it has been twenty years since I got this t-shirt:

Happy anniversary Visual Studio, and congratulations to the whole team on VS 2017, and a particular shout-out to my Roslyn friends. I can’t wait to take the new C# and VB out for a spin. Nicely done everyone.

Even readers who were following Microsoft dev tools in 1997 would be forgiven for not knowing what the “Da Vinci” is on my shirt. It is a little-known fact that I was on the Da Vinci team for about five minutes. I’ll tell a funny story about that, but some other time.


Sorry for radio silence this past week; it has been crazy busy at work. I’ll finish off my excessive explanation series soon!