Summing up the story so far: we have a formal system that allows us to start with the types of identifiers in an environment, and provide a series of deductive steps that ends the type of an expression. This is called a “derivation”.

Last time we discussed the “proof” that a “compile-time” series of deductions about the type of an expression is also a guarantee that the expression will always have a value compatible with that type at “run time”. I say “proof” because of course it was a proof by assertion that there exists an inductive proof.

The goal of this paper is to provide an algorithm that takes an expression and a set of typing assumptions for an environment, and produces a derivation that concludes with the type of the expression.

There are two more sketch proofs in this section of the paper, which we’ll briefly review.

We will also require later the following two properties of the inference system. Proposition 2. If S is a substitution and A ⊢ e:σ then S A ⊢ e:S σ. Moreover if there is a derivation of A ⊢ e:σ : of height n then there is also a derivation of S A ⊢ e:S σ of height less [than] or equal to n. Proof. By induction on n.

An example might help here. Suppose we have a derivation for

{x:α} ⊢ λy.x:∀β β→α

Now we have a substitution S that means “change all the α to int”. The we can apply the substitution to both sides and still have a true statement:

{x:int} ⊢ λy.x:∀β β→int

Now, remember what that turnstile means. It means that there is actually a derivation: a finite sequence of applications of our six rules, that starts with a bunch of assumptions and ends with “those assumptions entail that this expression has this type”. So if “apply the substitution to both sides” is in fact legal, then there must be a derivation of the substituted type from the substituted assumptions.

The proposition here claims that not only is there such a derivation, but moreover that there is a derivation of equal or shorter length! Substitution apparently can make a derivation shorter, but you never need to make it longer.

The proof-by-assertion here again says that you can prove this by induction, this time by “ordinary” induction on the integer n, the length of the derivation. Let’s sketch that out.

The base case is vacuously true: there are no derivations of length zero, and everything is true of all members of the empty set.

Now, suppose that you have a derivation of `A ⊢ e':σ'`

that is k steps long. Suppose that there is a derivation of `S A ⊢ e':S σ'`

that is k steps long or shorter. And suppose by adding just one more step to our original derivation, we can derive `A ⊢ e:σ`

. Now all we must show is that we can add zero or one additional steps to the derivation of `S A ⊢ e':S σ'`

in order to deduce `S A ⊢ e:S σ`

. There are only six possible steps that could be added, so all we have to show is that each of those possible steps still works under a substitution, or can be omitted entirely.

I’m not going to do that here as it is tedious; noodle around with it for a bit and convince yourself that it’s true.

One more theorem, and then we’re done with this section:

Lemma 1. If σ > σ' and Ax ∪ {x:σ'} ⊢ e:σ_{0}then also Ax ∪ {x:σ} ⊢ e:σ_{0}

Incidentally, why have we had two propositions and a lemma?

Propositions, lemmas and theorems are all the same thing: a claim with a formal justification that proves the claim. We use different words to communicate to the reader how important each is.

Propositions are the least interesting; they are typically just there to prove some technical point that is necessary later on in a larger theorem. The proofs are often just hand-waved away as too tedious to mention, as they have been here.

Lemmas are more interesting; they are also in the service of some larger result, but are interesting enough or non-trivial enough that the proof might need to be sketched out in more detail.

Theorems are the major interesting results of the paper, and are all useful in their own right, not just as stepping stones to some larger result.

These lines are blurry of course, since the difference is entirely in emphasizing to the reader what we think is important.

What is this lemma saying? Basically, if we can derive the type of an expression given a small type bound on an identifier, then we can come up with a derivation of the same type for the same expression even if we have a larger bound. For example, suppose we have a derivation for

{y:int, x:int→int} ⊢ (x y):int

Then we can also find a derivation that gives the same result even with a larger type for x:

{y:int, x:∀β β→β} ⊢ (x y):int

Again, remember what the turnstile is saying: that we have a finite sequence of applications of our six rules that gets us from the first set of assumptions to the first type. For this lemma to be true, we must also have a finite sequence of rules that gets us from our new set of assumptions to that same type. The proof sketches out how to create such a sequence:

Proof. We construct a derivation of Ax ∪ {x:σ'} ⊢ e:σ_{0}from that of Ax ∪ {x:σ'} ⊢ e:σ_{0}by substituting each use of TAUT for x:σ' with x:σ, followed by an INST step to derive x:σ'. Note that GEN steps remain valid since if α occurs free in σ then it also occurs free in σ'.

So at most the new derivation gets longer by a finite number of steps, less than or equal to the number of TAUT steps in the original derivation.

And that’s it for section five! Next time we’ll start looking at the algorithm that actually finds the type of an expression. We’ll start by discussing type unification.

Enjoying this series! We’re rapidly approaching page six with the curious footnotes. I think I know the “obvious typos” in the exposition of algorithm W, but I’ll wait for your comments to be sure.