(iv) If e is let x = e_{1}in e_{2}then let W(A, e_{1}) = (S_{1}, τ_{1}) and ____ W(S_{1}A_{x}∪ {x:S_{1}A(τ_{1}), e_{2}) = (S_{2}, τ_{2}); then S = S_{2}S_{1}and τ = τ_{2}.

Again let’s look at an example. Let’s say we want to infer a type for

let double_it = λx.(plus x x) in double_it foo

where we have `plus`

as before, and `foo:int`

in our set of assumptions. We begin by recursing on `λx.(plus x x)`

with the current set of assumptions. Note that we do *not* want to remove `double_it`

from our set of assumptions if it is there because the expression might use `double_it`

, meaning some *other* definition. If it does, then it uses the definition already in the assumptions; the “new” `double_it`

is only in scope *after* the `in`

.

Type inference tells us that the expression is of type `int → int`

, so we add an assumption `double_it:int → int`

to our set of assumptions, and compute the type of `double_it foo`

under the new set of assumptions, which of course says that it is `int`

. Since the type of a `let`

expression is the type of the second expression, we’re done.

The reason we need the closure is: suppose type inference tells us that the first expression is of type `β→β`

, and `β`

is free in `A`

. (Remember, we always get a type, not a type scheme.) For the purposes of inferring the type of the right side of the `in`

, we want the expression to have type scheme `∀β β→β`

.

NOTE: When any of the conditions above is not met W fails.

Though this is true, I think it would have been better to call out the two places in the algorithm where the failure can occur. They are in rule (i), which requires that an identifier have a type in the set of assumptions, and in rule (ii) which requires that type unification succeeds. Rules (iii) and (iv) never fail of their own accord; they only fail when the recursive application of W fails.

Next time: more sketch proofs!

]]>But before we get to them, another annoyance.

The first time I read this algorithm carefully I got to the second step of the inference algorithm and said to myself *wait, that cannot possibly be right*. I stared at that thing for probably fifteen minutes trying to make sense of it before realizing that there is a footnote at the bottom of the page that says

There are obvious typographic errors in parts (ii) and (iv) which are in the original publication. I have left the correction of these as an easy exercise for the reader.

Now, I am hardly one to talk as I leave stuff as exercises for the reader all the time on Stack Overflow. But really? There’s a typo in the paper, you know about it, and you leave the typo in the paper as a challenge? VEXING.

I will remove the typos in my presentation of the algorithm. Anyone who wants the frankly rather dubious challenge of attempting to fix the typos themselves can go read the original paper.

Onwards!

(ii) If e is e_{1}e_{2}then let W(A, e_{1}) = (S_{1}, τ_{1}) and W(S_{1}A, e_{2}) = (S_{2}, τ_{2}) and U(S_{2}τ_{1}, τ_{2}→ β) = V where β is new; then S = V S2 S1 and τ = V β.

Let’s unpack this, starting with just the types.

We have a function application `e`

. We recursively determine that _{1} e_{2}`e`

is some function type _{1}`τ`

and _{1}`e`

is its argument of type _{2}`τ`

. _{2}

Therefore `τ`

had better be a function type that takes a _{1}`τ`

and returns something of some type; call it β. Whatever β is must be the type of the function invocation expression. To find it, we unify _{2}`τ`

with _{1}`τ`

and we get a substitution that tells us what type to substitute for β. That’s our resulting type._{2} → β

Let’s take a look at a trivial example. Suppose our assumptions are `x:int`

and `i:∀α α→α`

, and our expression to be inferred is `i x`

. We recursively apply type inference to determine that `τ`

is _{1}`β → β`

— remember, we need a new type variable for the type `τ`

— and _{1}`τ`

is of course _{2}`int`

. Substitutions `S`

and _{1}`S`

are empty substitutions. We unify _{2}`β → β`

with `int → γ`

— again we need a new type variable used nowhere else — and get the subtitution `V = [β/int, γ/int]`

. Our result type is `V γ`

which is obviously `int`

.

We can deduce that passing an int to an identity function returns int; awesome.

(iii) If e is λx.e_{1}then let β be a new type variable and W(A_{x}∪ {x:β}, e_{1}) = (S_{1}, τ_{1}); then S = S_{1}and τ = S_{1}β → τ_{1}.

Let’s think of an example. Suppose we have an assumption `plus:int → int → int`

and the expression `λx.(plus x x)`

. So we make a new assumption `x:β`

, and run algorithm W on `plus x x`

. That tells us that the expression is of type `int`

, and also returns a substitution `[int/β]`

.

The type of the lambda is therefore `[int/β](β → int)`

, which obviously is `int → int`

.

Next time: inferring the type of a “let” expression.

]]>Algorithm W. W(A, e) = (S, τ) where

This is just a reminder that what we’re doing here is taking a set of assumptions that give types to identifiers, and an expression that might contain some of those identifiers, and from that we deduce a substitution on the identifier types, and a type for the expression.

This reminder however does not remind us that the algorithm can also tell us that there is no type that can be inferred for the expression given those assumptions; properly we ought to be saying that we produce a substitution and a type, or an error.

(i) If e is x and there is an assumption x:∀α_{1}, ... ,α_{n}τ' in A then S = Id and τ = [β_{i}/α_{i}]τ' where the β_{i}are new. Of course this is the identity (empty) substitution, not the set Id of identifiers.

Just a small rant here. First off, we already have a notation for substitutions which is a list of `τ/α`

pairs in square brackets. The notation `[ ]`

is a perfectly concise way to describe the identity type substitution. So why did the authors chose to re-use “Id” to mean “the empty substitution” when it already means “the set of identifiers”, thereby prompting them to insert a note to the effect that this notation is confusing? In code reviews I tell people to never comment confusing code; fix it so it is not confusing! A two-character fix here would ensure that the clarifying note is unneeded.

I also cannot fathom why we’ve gone from notating a type scheme with multiple bindings as ∀α_{1}…α_{n} at the beginning of the paper to ∀α_{1}, … ,α_{n} at the end. Before there were no commas; now there are commas. Small things like this are an impediment to understanding. Attention to detail is important in a paper, and in code.

Rant over. More rants next time!

As you might have guessed, this is going to be a recursive algorithm. Therefore we start with the simplest possible base case. If the expression we’re trying to deduce is an identifier whose type scheme is given in the assumptions, then we just use that type.

There are a few subtleties to consider carefully though.

If we have expression `x`

and an assumption like `x:int`

then it is easy; `x`

is deduced to be `int`

. Similarly if we have a free type variable; if we have `x:α`

then we simply deduce that `x`

is, of course, `α`

.

The tricky case is where we have an assumption that `x`

is quantified. Maybe we have assumption `x:∀α`

. We do not want to deduce that the type of the expression is the type _{1}α_{2} α_{1}→α_{2}`α`

because _{1}→α_{2}`α`

or _{1}`α`

might appear free elsewhere, and we don’t want to confusingly associate the type of _{2}`x`

with those free variables. We need to make new free type variables, so we deduce that the type of `x`

is `β`

where these are brand new not previously used type variables._{1}→β_{2}

When I joined the C# language design team the major design work for LINQ was done but there were still a few details left. The existing generic method type inference algorithm was insufficient to deal with a world that included lambdas, so we needed to fix it up. Complicating matters was the fact that the existing code that did type inference was difficult to follow and did not bear a very close resemblance to the specification in its action. (Its results were fine, but it was hard to look at the code and see how the specification had been translated into code.)

Given that we were making major changes to the algorithm, I decided to scrap the existing implementation and start fresh with an implementation that was (1) very clearly implementing exactly what was in the specification, so that as we tweaked the spec during the design it would be clear what needed to change in the implementation, and (2) written in a style similar to how I would want to write the same algorithm in C#; I was anticipating the future existence of Roslyn and I wanted to make that transition smooth.

And I really did start from scratch, including rewriting the type unification and substitution algorithms from scratch as well.

The first major mistake that got caught after I checked it in was:

void M<T, U>(T t, U u, bool flag) { if (flag) M(flag, t, false); }

What types should be inferred on the recursive call? Plainly this should be the same as `M<bool, T>(flag, t, false);`

. My first implementation deduced that this was `M<bool, bool>`

. You can probably guess why. The type inferencer deduced that T is bool and U is T, and the substituter said OK, if T is bool and U is T then U is also bool!

The T that is being substituted in for U is the T of the *outer *invocation of M. The T of the *inner *invocation of M is a different T entirely, but I used the same object to represent both! When you are implementing these algorithms you have to be super careful about not granting referential identity to things that look the same but are in fact just placeholders for different activations.

Fortunately we had a massive set of test cases which found my bug in short order.

Next time we’ll look at the non-base cases, and some irritating typos.

]]>6 The type assignment algorithm W The type inference system itself does not provide an easy method for finding, given A and e, a type-scheme σ such that A ⊢ e:σ

Bummer!

We now present an algorithm W for this purpose. In fact, W goes a step further. Given A and e, if W succeeds it finds a substitution S and a type τ, which are most general in a sense to be made precise below, such that S A ⊢ e:τ.

Note that this is a bit subtle. Our inputs are a set of assumptions A — what are the type schemes of all the relevant identifiers? — and an expression e, whose type we wish to know. Our outputs are a substitution — what types should some of the type variables have? — and a type. Not, interestingly, a type scheme; a type.

To define W we require the unification algorithm of Robinson [6]. Proposition 3 (Robinson). There is an algorithm U which, given a pair of types, either returns a substitution V or fails; further (i) If U(τ,τ') returns V, then V unifies τ and τ', i.e. V τ = V τ'.

Note that there was a typo in the PDF of the paper here; it said `V τ = τ'`

. I’ve corrected it here.

(ii) If S unifies τ and τ' then U(τ,τ') returns some V and there is another substitution R such that S = RV . Moreover, V involves only variables in τ and τ'.

What does this mean?

The first claim is the important one. Suppose we have two types. Let’s say `β→β`

and `int→int`

. The unification algorithm would find a substitution on the two types that makes them equal; obviously the substitution is `[int/β]`

. That’s an easy one; you can I’m sure see how these could get quite complicated.

A slightly more complicated example illustrates that both types might need substitution: Consider unifying `int → γ`

with `β → β`

. The substitution `V = [int/β, int/γ]`

does make them equal to each other under substitution.

The unification algorithm can return “there is no such substitution” of course. There is no substitution that can turn `β→β`

into `int→(int→int)`

, for example.

I’m not going to go through the unification algorithm in detail; perhaps you can deduce for yourself how it goes. I will however say that unification of types in the ML type system is simple compared to unification in a type system like C#’s. In C# we must solve problems like “I have an `IEnumerable<Action<Tiger, T>>`

and I need a unification with `List<blah blah blah...`

and so all the mechanisms of covariant and contravariant reference conversions come into play. In the C# type inference algorithm we not only need to find substitutions, we need to put bounds on the substitutions, like “substitute Mammal-or-any-subtype for T”.

The second claim notes that the substitution that unifies two types may not be unique, but that this fact doesn’t matter. That is, if V is a substitution that unifies two types found by the algorithm, and S is a different substitution that also unifies the types, then there exists a way to transform V into S.

We also need to define the closure of a type τ with respect to assumptions A; _ A(τ) = ∀α_{1}, ... ,α_{n}τ where α_{1}, ... ,α_{n}are the type variables occurring free τ in but not in A.

I am not super thrilled with this notation.

Basically we are saying here that given a set of assumptions A and a type τ that might contain free type variables, we can identify the free type variables not already free in A, and quantify them to make an equivalent type scheme.

Now, we’ve already said that there is a function U that takes two types and produces a substitution, and we’re going to say that there is a function W that takes a set of assumptions and an expression and produces a substitution and a type. So why on earth do the authors not simply say “there is a function B that takes a set of assumptions and a type and produces a type scheme” ???

I genuinely have no idea why authors of these sorts of papers do not adopt a consistent approach to functions and their notation. Rather than taking a straightforward approach and notating it `B A τ`

or `B(A,τ)`

, they notate it as `A-with-a-bar-on-top(τ)`

, for no good reason I am aware of.

I started this series because I am asked fairly frequently about how to read the notation of these papers, and boy, do they ever not make it easy. Even a little thing like whether function applications are notated `U(τ,τ')`

as the unification function is notated, or `S τ`

as substitutions are notated, is frequently inconsistent. And now we have yet another kind of function, this time notated as, of all things, a bar over what ought to be one of the arguments.

Sigh. I’ll stick with the notation as it is describe in the paper.

Next time: at long last, the base case of algorithm W!

]]>Happy anniversary Visual Studio, and congratulations to the whole team on VS 2017, and a particular shout-out to my Roslyn friends. I can’t wait to take the new C# and VB out for a spin. Nicely done everyone.

Even readers who were following Microsoft dev tools in 1997 would be forgiven for not knowing what the “Da Vinci” is on my shirt. It is a little-known fact that I was on the Da Vinci team for about five minutes. I’ll tell a funny story about that, but some other time.

Sorry for radio silence this past week; it has been crazy busy at work. I’ll finish off my excessive explanation series soon!

]]>Last time we discussed the “proof” that a “compile-time” series of deductions about the type of an expression is also a guarantee that the expression will always have a value compatible with that type at “run time”. I say “proof” because of course it was a proof by assertion that there exists an inductive proof.

The goal of this paper is to provide an algorithm that takes an expression and a set of typing assumptions for an environment, and produces a derivation that concludes with the type of the expression.

There are two more sketch proofs in this section of the paper, which we’ll briefly review.

We will also require later the following two properties of the inference system. Proposition 2. If S is a substitution and A ⊢ e:σ then S A ⊢ e:S σ. Moreover if there is a derivation of A ⊢ e:σ : of height n then there is also a derivation of S A ⊢ e:S σ of height less [than] or equal to n. Proof. By induction on n.

An example might help here. Suppose we have a derivation for

{x:α} ⊢ λy.x:∀β β→α

Now we have a substitution S that means “change all the α to int”. The we can apply the substitution to both sides and still have a true statement:

{x:int} ⊢ λy.x:∀β β→int

Now, remember what that turnstile means. It means that there is actually a derivation: a finite sequence of applications of our six rules, that starts with a bunch of assumptions and ends with “those assumptions entail that this expression has this type”. So if “apply the substitution to both sides” is in fact legal, then there must be a derivation of the substituted type from the substituted assumptions.

The proposition here claims that not only is there such a derivation, but moreover that there is a derivation of equal or shorter length! Substitution apparently can make a derivation shorter, but you never need to make it longer.

The proof-by-assertion here again says that you can prove this by induction, this time by “ordinary” induction on the integer n, the length of the derivation. Let’s sketch that out.

The base case is vacuously true: there are no derivations of length zero, and everything is true of all members of the empty set.

Now, suppose that you have a derivation of `A ⊢ e':σ'`

that is k steps long. Suppose that there is a derivation of `S A ⊢ e':S σ'`

that is k steps long or shorter. And suppose by adding just one more step to our original derivation, we can derive `A ⊢ e:σ`

. Now all we must show is that we can add zero or one additional steps to the derivation of `S A ⊢ e':S σ'`

in order to deduce `S A ⊢ e:S σ`

. There are only six possible steps that could be added, so all we have to show is that each of those possible steps still works under a substitution, or can be omitted entirely.

I’m not going to do that here as it is tedious; noodle around with it for a bit and convince yourself that it’s true.

One more theorem, and then we’re done with this section:

Lemma 1. If σ > σ' and Ax ∪ {x:σ'} ⊢ e:σ_{0}then also Ax ∪ {x:σ} ⊢ e:σ_{0}

Incidentally, why have we had two propositions and a lemma?

Propositions, lemmas and theorems are all the same thing: a claim with a formal justification that proves the claim. We use different words to communicate to the reader how important each is.

Propositions are the least interesting; they are typically just there to prove some technical point that is necessary later on in a larger theorem. The proofs are often just hand-waved away as too tedious to mention, as they have been here.

Lemmas are more interesting; they are also in the service of some larger result, but are interesting enough or non-trivial enough that the proof might need to be sketched out in more detail.

Theorems are the major interesting results of the paper, and are all useful in their own right, not just as stepping stones to some larger result.

These lines are blurry of course, since the difference is entirely in emphasizing to the reader what we think is important.

What is this lemma saying? Basically, if we can derive the type of an expression given a small type bound on an identifier, then we can come up with a derivation of the same type for the same expression even if we have a larger bound. For example, suppose we have a derivation for

{y:int, x:int→int} ⊢ (x y):int

Then we can also find a derivation that gives the same result even with a larger type for x:

{y:int, x:∀β β→β} ⊢ (x y):int

Again, remember what the turnstile is saying: that we have a finite sequence of applications of our six rules that gets us from the first set of assumptions to the first type. For this lemma to be true, we must also have a finite sequence of rules that gets us from our new set of assumptions to that same type. The proof sketches out how to create such a sequence:

Proof. We construct a derivation of Ax ∪ {x:σ'} ⊢ e:σ_{0}from that of Ax ∪ {x:σ'} ⊢ e:σ_{0}by substituting each use of TAUT for x:σ' with x:σ, followed by an INST step to derive x:σ'. Note that GEN steps remain valid since if α occurs free in σ then it also occurs free in σ'.

So at most the new derivation gets longer by a finite number of steps, less than or equal to the number of TAUT steps in the original derivation.

And that’s it for section five! Next time we’ll start looking at the algorithm that actually finds the type of an expression. We’ll start by discussing type unification.

]]>We know that `A ⊢ e:σ`

means that there exists a derivation in our deductive system that allows us to conclude that the given expression is of the given type scheme when the given assumptions are true.

And we know that `A ⊨ e:σ`

means that no matter what environment with values we assign to the identifiers named in assumptions `A`

, the value computed for expression `e`

will be a value of type scheme `σ`

.

What we would really like to know is that `A ⊢ e:σ`

logically implies `A ⊨ e:σ`

. That is, if we have a *derivation* at compile time that the type scheme is correct in our deduction system, that there will be no type violation *at runtime*! Recall that earlier we defined the “soundness” of a logical deduction as a logically correct derivation where we start with true statements and end with true statements; what we want is to know that our logical system of derivation of types is “sound” with respect to the semantics of the language.

The paper of course provides a detailed proof of this fact:

The following proposition, stating the semantic soundness of inference, can be proved by induction on e. Proposition 1 (Soundness of inference). If A ⊢ e:σ then A ⊨ e:σ.

That’s the entire text of the proof; don’t blink or you’ll miss it. It can be proved, and that’s as good as proving it, right? And there’s even a big hint in there: it can be proved by induction.

Wait, what?

Inductive proofs are a proof technique for properties of natural numbers but `e`

is an expression of language Exp, not a number at all!

Let’s consider for a moment the fundamental nature of an inductive proof. Inductive proofs on naturals depend on the following axioms: (by “number” throughout, I mean “non-negative integer”.)

- Zero is a number.
- Every number k has a successor number called k + 1.
- If something is true for zero, and it being true for k implies it is true its successor, then it is true of all numbers.

So the idea is that we prove a property for zero, and then we prove that if the property holds for k, then it also holds for k+1. That implies that it holds for 0+1, which implies that it holds for 1+1, which implies that it holds for 2+1, and so on, so we deduce that it applies to all numbers.

This is the standard form in which inductive reasoning is explained for numbers, but this is not the only form that induction can take; in fact this kind of induction is properly called “simple induction”. Here’s another form of induction:

Prove a property for zero, and then prove that if the property holds for k *and every number smaller than k*, then it also holds for k+1.

This form of induction is called “strong induction”. Simple and strong induction are equivalent; some proofs are easier to do with simple induction and some are easier with strong, but I hope it is clear that they are just variations on the technique.

Well, we can make the same reasoning for Exp:

- An identifier is an expression.
- Every other kind of expression — lambdas, lets, and function calls — is created by combining together a finite number of
*smaller*expressions. - Suppose we can prove that a property is true for the base case: identifiers. And suppose that we assume that a property is true for a given expression and all of the finite number of smaller expressions that it is composed of. And suppose we can prove from the assumption that then the property holds on every larger expression formed from this expression. Then the property holds on all expressions.

That’s just a sketch to get the idea across; actually making a formally correct description of this kind of induction, and showing that it is logically justified, would take us very far afield indeed. But suffice to say, we could do so, and are justified in using inductive reasoning on expressions.

Before we go on I want to consider a few more interesting facts about induction. Pause for a moment and think about why induction works. Because I was sloppy before; I left some important facts off the list of stuff you need to make induction work.

Suppose for example I make up a new number. Call it Bob. Bob is not the successor of any number, and it is not zero either. Bob has a successor — Bob+1, obviously. Now suppose we have a proof by induction: we prove that something is true for 0, we prove that if it is true for k then it is true for k+1. Have we proved it for all numbers? Nope! We never proved it for Bob, which is not zero and not the successor of any number!

Induction requires that every number be “reachable” by some finite number of applications of the successor relationship to the smallest number, zero. One of the axioms we need to add to our set is that zero is the *only *number that is the successor of no number.

The property that expressions in Exp share with integers is: every expression in Exp is either an identifier, or is formed from one or more strictly smaller expressions. There is no “Bob” expression in Exp; there’s no expression that we can’t get to by starting from identifiers and combining them together via lets, lambdas and function calls. And therefore if we can start from identifiers and work our way out, breadth-first, to all the successor expressions, then we will eventually get all expressions. Just as if we start from zero and keep adding one, we eventually get to all numbers.

Induction is really a very powerful technique; it applies in any case where there are irreducible base cases, and where application of successorship rules eventually gets to all possible things.

Back to the paper.

Basically the way the proof would proceed is: we’d start with the base case: does our deductive system guarantee that if `A ⊢ x:σ`

for an identifier x then `A ⊨ x:σ`

? That’s our base case, and it’s pretty obviously true. Suppose we have the assumption `A = {x:σ}`

. By assumption, the value of x at runtime must be from type scheme `σ`

. By the TAUT rule, we can deduce that `A ⊢ x:σ`

. And plainly `A ⊨ x:σ`

is true; by assumption `x`

will have a value from `σ`

! Our base case holds.

Next we would assume that `A ⊢ k:σ`

implies `A ⊨ k:σ`

, and then ask the question “can we show that for every possible successor k’ of k, `A ⊢ k':σ'`

implies `A ⊨ k':σ'`

“? If we can show that, then by the inductive property of expressions, it must be true for all expressions that `A ⊢ e:σ`

implies `A ⊨ e:σ`

.

The actual proof would be a bit trickier than that because of course some of the successors require two expressions, and so we’d have to reason about that carefully.

These proofs are extraordinarily tedious, and you learn very little from them. Thus the proof is omitted from the paper.

There are two more sketch proofs in this section. We’ll cover them next time, and then that will finish off the section on the deductive system.

]]>The following example of a derivation is organised as a tree, in which each node follows from those immediately above it by an inference rule. TAUT: -------------------- x:α ⊢ x:α ABS: -------------------- ⊢ (λx.x):α → α GEN: -------------------- ⊢ (λx.x):∀α(α → α) (1)

The (1) indicates that I’m going to use this fact that we’ve derived later.

Let’s go through this carefully.

We start with a tautology, and therefore need no facts. We logically deduce that if we make the assumption that x:α we can logically deduce that x:α, unsurprisingly.

Now this becomes the fact that is fed into the next rule, ABS. Notice that below the second line we just have a turnstile with nothing to the left of it. This means that we can derive this conclusion from no assumptions whatsoever. Read the ABS rule carefully to make sure that you understand why this is.

This conclusion should make sense: we don’t need any assumptions about the types of any variables to know that the identity lambda is a function from α to α.

And now perhaps you understand why I said last time that the GEN rule is a bit subtle. Here we have `α`

free in the type, but not free in the set of assumptions. (The set of assumptions is empty, so plainly there are no free type variables in it!) It is perfectly valid for us to turn the free type variable into a quantified type variable, and deduce that yes, the identity function is a function from α to α for any α you care to name.

All right, let’s start over. We won’t use any of the facts we’ve just deduced in this next derivation. We’ll start over from scratch.

TAUT: ----------------------------------- i:∀α(α → α) ⊢ i:∀α(α → α) INST: ----------------------------------- i:∀α(α → α) ⊢ i:(α → α) → (α → α) (2)

The tautology says that if `i`

is a function from `α`

to `α`

, then `i`

is a function from `α`

to `α`

. I hope we agree that is true!

Now we can make that more specific if we want to by using the INST rule. Since that is true for any `α`

, it is in particular true for the type `(α → α)`

. These are different alphas than the alphas in `∀α(α → α)`

! We could have just as easily said `i:(β → β) → (β → β)`

, and it would have been more clear.

This is one of the things that is most confusing about academic papers: that they seem to deliberately use the same symbols to mean different things on the same line, like symbols are really expensive or something. I push back in code review when people do that in programming languages, and I wish they would not do it in papers either.

All right, so far we’ve deduced (1) `(λx.x)`

is `∀α(α → α)`

, and (2) if we have a proof that `i`

is `∀α(α → α)`

then we can use `i`

somewhere that an `(α → α) → (α → α)`

is needed. Now let’s make a third deduction starting again from nothing:

TAUT: ----------------------------------- i:∀α(α → α) ⊢ i:∀α(α → α) INST: ----------------------------------- i:∀α(α → α) ⊢ i:α → α (3)

Wait, isn’t that the derivation we just made over again?

Not quite. It’s subtly different. Here we’re saying that `i:∀α(α → α)`

implies that `i:α → α`

. Again these are different alphas!

Now we can apply the COMB rule to facts (2) and (3). Go back and read the COMB rule and make sure you understand why facts (2) and (3) can be used here.

(2) (3) COMB: -------------------------- i:∀α(α → α) ⊢ i i:α → α (4)

This says that if `i`

is a function from `α`

to `α`

, then so is `(i i)`

.

Finally, we can put facts (1) and (4) together using the LET rule to derive the type of a complicated expression:

(1) (4) LET: --------------------------------- ⊢ (let i = (λx.x) in i i):α → α

Here we have an expression that combines the four kinds of expression in Exp: let, lambda, application and identifier. And we derive that *from no assumptions whatsoever*, we can figure out the type of this complex expression; it’s a function from `α`

to `α`

.

This should not be a surprise: if you pass an identity function to itself, you get an identity function, and an identity function is of type `α → α`

. But it is pretty neat that we can make a set of logical deductions that gets us starting from nothing, and ending with this correct conclusion.

But how, you might ask, did we even come up with that derivation? What we need is an *algorithm* that takes an expression and produces a derivation that the expression is of a particular type; such an algorithm is a type inference algorithm!

Next time we’ll discuss the “semantic soundness” of the algorithm, and give a sketch proof.

]]>Suppose the answer is “black is not in check”. In that case there is only one place that the white king can be:

Great. Is this the answer? No. Because now white is in check. Twice. You can’t end a move in check. Therefore black just moved one of the pieces on the board in order to deliver that double check. Which one? It certainly was not the black king. But there is nowhere that the black rook or black bishop could have moved from such that white was not already in check by the other black piece! Since white did not end a move in check, it is impossible for the white king to be on B3.

Therefore the black king is in check after all. So it is black’s move. Some white piece now on the board must have moved in order to deliver that check. Which piece was it? There are only two: the white bishop and the white king. Suppose it was the bishop. Now we have the same problem again: where did the white bishop come from to deliver the check, such that black was not *already* in check? Nowhere, that’s where.

Therefore: black is in check. The check was a *discovered* *check* caused by the white king moving away from B3. There are only two squares that the white king can move to from B3 such that the white king is not moving into check.

Therefore: black is in check, the check was delivered by moving the white king from B3 to either A3 or C3.

Hold on a minute! Didn’t we just argue previously that the white king could not possibly have been on B3? No. We just argued that the white king could not be on B3 *with the board as it is at present because there is no piece that could have delivered the double check.* But the white king could have been on B3 on a board in the past *with an additional black piece on it*, a piece that was just captured by white, and it was *that* piece which delivered the seemingly-impossible double check.

So: at some point in the past the white king was on B3. Black made a move of some piece that ended on either A3 or C3, which double-checked white. White responded by taking that piece and giving a discovered check to black.

What now-taken black piece could possibly deliver that double check by moving to A3 or C3? There is only one possibility, but to see what it is we’ll need to go back a few moves.

Here’s a possible board position a few moves ago; black is to move:

It’s not required that the board have looked like this, but it is possible. Retro enthusiasts refer to this as an “unlocked” position. That is, there is no obvious impediment to legally getting from the starting position to this board position, and there are many ways to do so. From the unlocked position we then have the only possible sequence of moves that ends in the position given in the statement of the puzzle:

Black takes the knight and calls check:

White blocks the check by moving the pawn up two spaces:

Black delivers the “impossible” double check by capturing *en passant*:

And white captures the pawn and calls check:

…leaving the white king on C3.

There is no way to get a black piece that reveals the double check to A3, so this is the only possible solution. Can white force a draw? Who cares! Retros are about figuring out the past, not about what the outcome of the game will be.

If you like this sort of puzzle — and believe me, they get a lot more complicated than that! — Smullyan’s two books on the subject are both delightful. And if you do, a word of advice: pawns can capture en passant, pawns don’t have to promote to a queen, and just because pieces are on their starting squares doesn’t mean they’ve never moved!

]]>Thus logician, philosopher and puzzle-constructor Raymond Smullyan, who died last week at the age of 97.

I started reading Smullyan when I was a teenager; I don’t remember whether I read *This Book Needs No Title* (about philosophy) or *What Is The Name Of This Book?* (puzzles) first, but whichever it was started a lifetime of enjoyment of both. His philosophy was decidedly playful. In one of his books he dispenses with the question “Does a dog have the Buddha-nature?” with *Of course a dog has the Buddha-nature!* You just have to look at some dogs to know that.

His puzzles were deceptively simple and quickly ended up being disguised versions of some very difficult topics in first order logic, combinatory logic, Boolean logic, and so on. Though he was most famous for his “Island of Knights and Knaves” puzzles, where knights can only tell the truth an knaves can only lie, he produced a great many puzzles on other topics. *To Mock A Mockingbird,* in which combinators are thinly disguised as singing birds, is a particular favourite; it changed my understanding of the fundamentals of computer programming.

Among my favourites of all his puzzles though were his two books of absolutely delightful retrograde chess puzzles. Most chess puzzles consider the future: from this position, what should white play next to ensure a mate? Retro puzzles are not concerned with the future, but rather the past; what had to happen in the game in order to arrive at this position?

Here’s my favourite Smullyan retro, which was on the cover of my copy of *Chess Mysteries of the Arabian Nights*:

The white king has been removed from the board; your task is to deduce where it goes. There is only one square where the white king can be such that the position is possible to produce in a legal game of chess. (Note that I said legal, and not sensible!)

Leave your thoughts in the comments and I’ll give the answer later this week.

]]>