Excessive explanation, part eighteen

Over the last two episodes we gave the rules of the deductive system. The paper now gives an example of using that deductive system to derive the type of a complicated expression. The way it is laid out on the page makes it difficult to reproduce here and to annotate, so I’ll take some small editorial liberties and “linearize” it a bit.

The following example of a derivation is organised as a tree, in 
which each node follows from those immediately above it by an 
inference rule.

TAUT: --------------------
       x:α ⊢ x:α
ABS:  --------------------
       ⊢ (λx.x):α → α
GEN:  -------------------- 
       ⊢ (λx.x):∀α(α → α)   (1)

The (1) indicates that I’m going to use this fact that we’ve derived later.

Let’s go through this carefully.

We start with a tautology, and therefore need no facts. We logically deduce that if we make the assumption that x:α we can logically deduce that x:α, unsurprisingly.

Now this becomes the fact that is fed into the next rule, ABS. Notice that below the second line we just have a turnstile with nothing to the left of it. This means that we can derive this conclusion from no assumptions whatsoever. Read the ABS rule carefully to make sure that you understand why this is.

This conclusion should make sense: we don’t need any assumptions about the types of any variables to know that the identity lambda is a function from α to α.

And now perhaps you understand why I said last time that the GEN rule is a bit subtle. Here we have α free in the type, but not free in the set of assumptions. (The set of assumptions is empty, so plainly there are no free type variables in it!) It is perfectly valid for us to turn the free type variable into a quantified type variable, and deduce that yes, the identity function is a function from α to α for any α you care to name.

All right, let’s start over. We won’t use any of the facts we’ve just deduced in this next derivation. We’ll start over from scratch.

TAUT: -----------------------------------   
       i:∀α(α → α) ⊢ i:∀α(α → α)  
INST: -----------------------------------
       i:∀α(α → α) ⊢ i:(α → α) → (α → α)   (2)

The tautology says that if i is a function from α to α, then i is a function from α to α. I hope we agree that is true!

Now we can make that more specific if we want to by using the INST rule. Since that is true for any α, it is in particular true for the type (α → α). These are different alphas than the alphas in ∀α(α → α)! We could have just as easily said i:(β → β) → (β → β), and it would have been more clear.

This is one of the things that is most confusing about academic papers: that they seem to deliberately use the same symbols to mean different things on the same line, like symbols are really expensive or something. I push back in code review when people do that in programming languages, and I wish they would not do it in papers either.

All right, so far we’ve deduced (1) (λx.x) is ∀α(α → α), and (2) if we have a proof that i is ∀α(α → α) then we can use i somewhere that an (α → α) → (α → α) is needed. Now let’s make a third deduction starting again from nothing:

TAUT: -----------------------------------   
       i:∀α(α → α) ⊢ i:∀α(α → α)  
INST: -----------------------------------
       i:∀α(α → α) ⊢ i:α → α   (3)

Wait, isn’t that the derivation we just made over again?

Not quite. It’s subtly different. Here we’re saying that i:∀α(α → α) implies that i:α → α. Again these are different alphas!

Now we can apply the COMB rule to facts (2) and (3). Go back and read the COMB rule and make sure you understand why facts (2) and (3) can be used here.

          (2)   (3)
COMB: --------------------------
       i:∀α(α → α) ⊢ i i:α → α   (4)

This says that if i is a function from α to α, then so is (i i).

Finally, we can put facts (1) and (4) together using the LET rule to derive the type of a complicated expression:

       (1)   (4)
LET:  --------------------------------- 
       ⊢ (let i = (λx.x) in i i):α → α

Here we have an expression that combines the four kinds of expression in Exp: let, lambda, application and identifier. And we derive that from no assumptions whatsoever, we can figure out the type of this complex expression; it’s a function from α to α.

This should not be a surprise: if you pass an identity function to itself, you get an identity function, and an identity function is of type α → α. But it is pretty neat that we can make a set of logical deductions that gets us starting from nothing, and ending with this correct conclusion.

But how, you might ask, did we even come up with that derivation? What we need is an algorithm that takes an expression and produces a derivation that the expression is of a particular type; such an algorithm is a type inference algorithm!

Next time we’ll discuss the “semantic soundness” of the algorithm, and give a sketch proof.


The chess mystery, solved

Happy St. Valentine’s Day all. To solve the Smullyan retrograde chess puzzle posted yesterday, the first question to ask is “is the black king actually in check, or not?” In a great many retro problems the first task is to determine who is to move, and knowing that one or the other side is in check is the easiest way to do that.

Suppose the answer is “black is not in check”. In that case there is only one place that the white king can be:


Great. Is this the answer? No. Because now white is in check. Twice. You can’t end a move in check. Therefore black just moved one of the pieces on the board in order to deliver that double check. Which one? It certainly was not the black king. But there is nowhere that the black rook or black bishop could have moved from such that white was not already in check by the other black piece! Since white did not end a move in check, it is impossible for the white king to be on B3.

Therefore the black king is in check after all. So it is black’s move. Some white piece now on the board must have moved in order to deliver that check. Which piece was it? There are only two: the white bishop and the white king. Suppose it was the bishop. Now we have the same problem again: where did the white bishop come from to deliver the check, such that black was not already in check? Nowhere, that’s where.

Therefore: black is in check. The check was a discovered check caused by the white king moving away from B3. There are only two squares that the white king can move to from B3 such that the white king is not moving into check.

Therefore: black is in check, the check was delivered by moving the white king from B3 to either A3 or C3.

Hold on a minute! Didn’t we just argue previously that the white king could not possibly have been on B3? No. We just argued that the white king could not be on B3 with the board as it is at present because there is no piece that could have delivered the double check. But the white king could have been on B3 on a board in the past with an additional black piece on it, a piece that was just captured by white, and it was that piece which delivered the seemingly-impossible double check.

So: at some point in the past the white king was on B3. Black made a move of some piece that ended on either A3 or C3, which double-checked white. White responded by taking that piece and giving a discovered check to black.

What now-taken black piece could possibly deliver that double check by moving to A3 or C3? There is only one possibility, but to see what it is we’ll need to go back a few moves.

Here’s a possible board position a few moves ago; black is to move:


It’s not required that the board have looked like this, but it is possible. Retro enthusiasts refer to this as an “unlocked” position. That is, there is no obvious impediment to legally getting from the starting position to this board position, and there are many ways to do so. From the unlocked position we then have the only possible sequence of moves that ends in the position given in the statement of the puzzle:

Black takes the knight and calls check:


White blocks the check by moving the pawn up two spaces:


Black delivers the “impossible” double check by capturing en passant:


And white captures the pawn and calls check:


…leaving the white king on C3.

There is no way to get a black piece that reveals the double check to A3, so this is the only possible solution. Can white force a draw? Who cares! Retros are about figuring out the past, not about what the outcome of the game will be.

If you like this sort of puzzle — and believe me, they get a lot more complicated than that! — Smullyan’s two books on the subject are both delightful. And if you do, a word of advice: pawns can capture en passant, pawns don’t have to promote to a queen, and just because pieces are on their starting squares doesn’t mean they’ve never moved!

Why should I be worried about dying?

It’s not going to happen in my lifetime!

Thus logician, philosopher and puzzle-constructor Raymond Smullyan, who died last week at the age of 97.

I started reading Smullyan when I was a teenager; I don’t remember whether I read This Book Needs No Title (about philosophy) or What Is The Name Of This Book? (puzzles) first, but whichever it was started a lifetime of enjoyment of both. His philosophy was decidedly playful. In one of his books he dispenses with the question “Does a dog have the Buddha-nature?” with Of course a dog has the Buddha-nature! You just have to look at some dogs to know that.

His puzzles were deceptively simple and quickly ended up being disguised versions of some very difficult topics in first order logic, combinatory logic, Boolean logic, and so on. Though he was most famous for his “Island of Knights and Knaves” puzzles, where knights can only tell the truth an knaves can only lie, he produced a great many puzzles on other topics. To Mock A Mockingbird, in which combinators are thinly disguised as singing birds, is a particular favourite; it changed my understanding of the fundamentals of computer programming.

Among my favourites of all his puzzles though were his two books of absolutely delightful retrograde chess puzzles. Most chess puzzles consider the future: from this position, what should white play next to ensure a mate? Retro puzzles are not concerned with the future, but rather the past; what had to happen in the game in order to arrive at this position?

Here’s my favourite Smullyan retro, which was on the cover of my copy of Chess Mysteries of the Arabian Nights:


The white king has been removed from the board; your task is to deduce where it goes. There is only one square where the white king can be such that the position is possible to produce in a legal game of chess. (Note that I said legal, and not sensible!)

Leave your thoughts in the comments and I’ll give the answer later this week.

Excessive explanation, part seventeen

Last time we discussed the syntax for logical deductions that is common in academia: facts above, deductions below, a line between them. We discussed the tautology rule and the instance rule. Today: more rules.

       A ⊢ e:σ   
GEN:  ------------ ( α not free in A )
       A ⊢ e:∀ασ   

This is the “generic” rule: if you have some fact about an expression, then that fact is also true when quantified over all types. For example, if we know that A ⊢ e:int→int then we also know that we can make up a type variable α, and no matter what value we give the type variable, the expression is still of that type.

Exercise: Do you see why α must not be free? Suppose α is free in A; can you find an example of how the GEN rule produces an invalid conclusion if we were to allow this?

Now, you might say that it is obvious that if something is true, then it is also true even when we give any old value to a type variable that is not even involved. And yes, that’s obvious. It’s good that it is obvious! All these rules should be obviously correct.

What I just said was a bit of an “intuition pump”, because it is actually a bit subtle. Do you see why? Just because α is not free in A does not imply that α is not free in σ! So why is it legal to make this quantification over a variable free in σ only if it is not free in A? Ponder that briefly.

If after pondering it was not clear, don’t worry. It is a lot easier to understand these subtleties by looking at a realistic example, which we will do next time on FAIC.

In our next rule we have two facts above the line, which we have not yet seen:

       A ⊢ e:τ' → τ     A ⊢ e':τ'   
COMB: ----------------------------
       A ⊢ (e e'):τ

If we have a function from τ' to τ, and an expression of type τ', then the expression “pass the value to the function” is of type τ.

Just like in C#: if we have a function F that takes a string and returns an int, and e is an expression of type string, then F(e) is an expression of type int. Easy peasy.

If you think we need to have at least one rule for every possible kind of expression in Exp, you’re right! Fortunately there are only four kinds of expressions: identifiers, function calls, lambdas, and let expressions. We’ve already seen the first two, so it should be no surprise that the next rule is for lambdas.

This rule is a little more complicated:

       Ax ∪ { x:τ' } ⊢ e:τ
ABS:  ---------------------
       A ⊢ (λx.e):τ' → τ

Let’s sort this one out carefully.

Suppose we have a set of assumptions A that might or might not contain a judgment for identifier x. If it does, remove it. To that, add a judgment that says that x is of type τ'. And suppose from that set of assumptions we can deduce that expression e is of type τ.

That implies that from assumptions A, we can deduce that λx.e is a function from τ' to τ.

Why do we need to say Ax ∪ { x:τ' } instead of simply A ∪ { x:τ' }? Because we said at the beginning of this chapter that all sets of assumptions would contain at most one judgment for any identifier. But A ∪ { x:τ' } might contain two judgments for x, if A already contains a judgment for x.

Leaving aside the mechanisms of adding and removing judgments from a set of assumptions, I hope the rule is sensible. If knowing that x is of type τ' allows us to deduce that e is of type τ, then λx.e can be used anywhere we need a function from τ' to τ.

Notice that it does not matter whether A contains a judgment for x or not. If it does not, fine. If it does, then who cares? λx.e defines a new variable x that shadows any existing x in the environment.

We’ve seen rules for identifiers, function applications and lambdas. There is only one rule left, which describes how typing works for let expressions. Again, it takes two facts and produces a third.

       A ⊢ e:σ      Ax ∪ { x:σ } ⊢ e':τ
LET:  -----------------------------------
       A ⊢ (let x = e in e'):τ

This is very similar to the rule for lambdas. If e is of type scheme σ, and identifier x being of type scheme σ implies that e' is of type τ, then the type of the corresponding let expression is τ. Again, we need to do a little scut work to make sure that the identifier x is treated properly, but I hope the rule is clear.

These are the only rules of the deductive system. Next time we’ll go through an example of how to use the rules to start with nothing, and from that deduce some facts about identity functions.

Excessive explanation, part sixteen

All right, so far we’ve mostly been talking about jargon and background information. Now we’re actually going to get into the meat of the thing here. What we want is (1) a formal definition of what logical steps we are justified in using to determine what the type scheme of an expression is, when evaluated in a particular environment, and (2) an algorithm that produces a valid sequence of logical steps that ends in a type scheme for the expression.

How are we going to do that? First things first:

5 Type inference

From now on we shall assume that A contains at most one assumption
about each identifier x. Ax stands for removing any assumption about
x from A. 

That notation is a bit vexing, but I hope it is clear.

For assumptions A, expression e and type-scheme σ we write

                          A ⊢ e:σ

if this instance may be derived from the following inference rules:

Note that this is a single turnstile. Last time we used the double turnstile to mean “from a collection of typed identifiers A, we can logically deduce that an expression e has type scheme σ“. It appears that we mean the same thing by the single turnstile, but does it?

The subtle difference is “may be derived from the following inference rules”. The single turnstile means that we actually could produce a valid sequence of particular logical rules that get us to the type judgment.

But we’re getting ahead of ourselves.

I started this series because I get the question quite frequently “I tried to read this paper and I couldn’t understand the notation”. In particular, people who have no experience with the standard notation for logical deductions have a bad reaction to this notation, but really it is very straightforward. The syntax is:

RULE: ------------ ( ADDITIONAL NOTES )

So, stuff you know already goes on top. The name of the deduction rule goes on the left. The deduction that this rule allows you to make goes below the line, and any additional notes about the rule go in parens beside it.

The deductions of course are themselves just facts, so we can put a line below the deductions, and make even more deductions, and so on.

In our particular deductive system, all facts and deductions will be of the form A ⊢ e:σ where again, A is a set of identifiers with types, e is an expression, and σ is a type scheme. Remember, the turnstile means “there exists a sequence of logical deductions that justifies this typing judgment”.

We call a sequence of logical deductions of this form a “derivation” of the result.

Let’s start with the easiest rule; a rule so easy you might think it isn’t necessary:

TAUT: ---------- ( x:σ ∈ A )
       A ⊢ x:σ

A “tautology” is a statement that is necessarily true by its form. “All X that are Y are Y” is a true statement no matter what we substitute for X and Y.

This rule has no facts at all above the line; tautologies are necessarily true irrespective of any particular facts. The notes on the side say that for this rule to apply, identifier x must have type scheme σ in assumptions A. Given that, we logically deduce that, surprise, from assumptions A we can deduce that the expression x is an expression of type σ.

So why do we need this rule? Because remember what the single turnstile means. The single turnstile means that we can deduce the type scheme of an identifier by producing a derivation: a sequence of logical deductions drawn from a specific set of rules. It’s not enough to say that obviously, if an assumption is true then it is true. We have to have a rule that says that so that we can use that rule in a valid sequence of rules.

Let’s look at a very slightly harder one.

       A ⊢ e:σ   
INST: ---------- ( σ > σ' )
       A ⊢ e:σ'

The “instance” rule is: if we can logically deduce from assumptions A that e is of type σ, then we can also logically deduce from assumptions A that e is of a smaller type σ'.

For example, suppose we already know that A ⊢ x:∀α(α → α). Then with this rule we can deduce further that A ⊢ x:(int → int). If we have enough evidence to deduce that x is a function from α to α for any type α then we have enough evidence to deduce that it a function from integers to integers.

This might seem a bit weird to you. Isn’t this backwards?

If in C# we deduced that x was of type Mammal, we would not consider that to be evidence for a deduction that it was of a smaller type, Giraffe. We’d consider it to be of a larger type, Animal. However, there’s another way to think of it: in C# if we deduced that x was of type Mammal, we would know that any Giraffe was a valid value for x, but not necessarily any Animal. That’s the kind of inference we’re doing here.

Next time we’ll look at some more of the rules of this deductive system.

Excessive explanation, part fifteen

All right, we are getting through section four of this paper. Today: what is an “environment”?

Now let Env = Id → V be the domain of environments η. 

Just to be clear, this is not a function in the Exp language. This is a formal description of part of the runtime engine of the Exp language. Basically we are saying here that an “environment” is simply the part of the runtime of an implementation of the language that takes identifiers and maps them onto their values. And that just as we’ll use σ for type schemes and α for type variables, we’ll use η for environments.

You’re so accustomed to using environments that they are like the air you breathe; you don’t even think about it. When you’re writing a program in some language and you say x = y + z;, you’re saying “obtain the values associated with identifiers y and z from the environment, add them, and modify the environment so that next time we ask, x has that value”. We can characterize the accesses to y and z as the environment being a function that takes “y” or “z” and returns their values. We can characterize the assignment to x as being the construction of a new environment based on the previous environment.

The semantic function ε : Exp → Env → V is given in [5]. 

Again this is not an function in the Exp language; this is again a formalization of the runtime engine.

Remember that in our notation, a function of two arguments is notated as a function of one argument that returns a function of one argument. So what we’re saying here is that the “semantic function” is a function that takes an expression and an environment and produces a value.

Again, this is making formal the basic operation of a runtime. The runtime has a bunch of expressions and the values of all the variables, and it evaluates the expressions in the context of the environment.

Using it, we wish to attach meaning to assertions of the form 

                           A ⊨ e:σ

where e ∈ Exp and A is a set of assumptions of the form x:σ, x∈Id.

This right here is the mission statement of this paper.

The double-turnstile means “the thing on the right is a logical consequence of the thing on the left”.

What we’re saying here is: we have (on the left) a set of judgments (called “assumptions” here) where all the expressions are identifiers. Say:

square:int → int
identity:∀β(β → β)

We have on the right an arbitrary expression with a typing judgement, say

λy.identity (square x) : ∀β(β → int)

I hope you agree that if we assume that x is an int, square is a function from int to int, and identity is an identity function, that the lambda given is a function that takes any type and returns an int. The assumptions on the left are sufficient to deduce that the judgment on the right is correct (if the assumptions are correct, of course).

The goal of this paper is to provide an algorithm which, given the facts on the left and the expression on the right, deduces the type schema on the right in a manner that allows us to confidently say that yes, we have sufficient evidence on the left to make this judgment on the right.

(Aside: is it possible to have a function with type scheme ∀β(β → β) that is not an identity function? Ponder this!)

If the assertion is closed, i.e. if A and σ contain no free type
variables, then the sentence is said to hold iff, for every 
environment η, whenever η ⟦x⟧:σ', for each member x:σ' of A,
it follows that ε ⟦e⟧ η:σ. 

OK, we’ve got some jargon and some notation here used without introduction.

First of all, “the assertion” and “the sentence” are being used interchangeably here. We are “asserting” that the assumptions about typed identifiers on the left are sufficient evidence to deduce the type of the expression on the right.

Second, remember when we abused the Exp function notation to say that an environment was a function Id → V? You would think then that we would continue to abuse that notation and say that given an environment η and an identifier x we would say that the value of x was η x. And is what we do, but to call out that x is being used here as an identifier rather than standing in for a value, we “quote” it with funny brackets. So the value of identifier x in environment η is notated η ⟦x⟧.

Similarly for the semantic function ε. It takes two things: an expression e and an environment η, and to emphasize that the expression is an expression of the Exp language, we bracket it as well. So ε ⟦e⟧ η is “the value of expression e when evaluated in environment η”.

So what the heck is this jargon-filled paragraph actually saying? We have a bunch of identifiers with types on one side, and the statement that this implies that expression on the other side has a particular type. The paragraph is saying that this means: choose any possible environment that has those identifiers having some values of the given types. No matter what environment you chose, evaluating the expression in that environment gives a value that matches the expression’s type.

That might be too abstract still. In our example:

x:int, square:int → int, identity:∀β(β → β) ⊨ 
  λy.identity (square x) : ∀β(β → int)

We’re saying that no matter what integer we choose for x, no matter what function from int to int “square” really is, and no matter what function matching its type scheme “identity” really is, we know that evaluating the expression on the right will produce a function from something to int. There is never any combination of values for the environment that would cause that expression to evaluate to something else.

Further, an assertion holds iff all its closed instances hold.

That’s maybe not so clear but fortunately there is an example in the paper:

Thus, to verify the assertion

                x:α, f: ∀β(β → β) ⊨ (f x): α

it is enough to verify it for every monotype μ in place of α.
This example illustrates that free type variables in an assertion
are implicitly quantified over the whole assertion, while explicit
quantification in a type scheme has restricted scope.

What we’re getting at here is that this is not the same as saying

                x:∀α α, f: ∀β(β → β) ⊨ (f x): α

Because that scopes the alpha to just the type of x, but we want that alpha to be still “in scope” on the right hand side of the double turnstile. Basically it is saying that the assertion is logically:

              ∀α  ( x:α, f: ∀β(β → β) ⊨ (f x): α  )

All right, that gets us through section four! The last paragraph in section four lays out what is going to happen in the rest of the paper:

The remainder of this paper proceeds as follows. First we present
an inference system for inferring valid assertions. Next we present 
an algorithm W for computing a type scheme for any expression, under
assumptions A. We then show that W is sound, in the sense that any 
type-scheme it derives is derivable in the inference system. Finally
we show that W is complete, in the sense that any derivable 
type-scheme is an instance of that computed by W.

Next time: A notation for making chains of logical inferences, and another kind of turnstile.

Excessive explanation, part fourteen

All right, finally we are getting to typing judgments, which are the bread and butter of type inference algorithms. We said in the previous episode that the values that are manipulated by a program written in Exp are either values of primitive types, or functions from one value to another, and that this set of values is called V:

To each monotype μ corresponds a subset V , as detailed in [5];
if v∈V is in the subset for μ we write v:μ.

Recall that a monotype is a type with no “generic” type arguments. So for instance, the monotype int has values that are all possible integers. The monotype int → int has values like “the function that takes an integer and doubles it”. This syntax simply means that the value on the left of the colon is an instance of the monotype on the right.

Further we write v:τ if v:μ for every monotype instance μ of τ, 
and we write  v:σ if v:τ for every τ which is a generic instance
of σ.

Let’s consider an example. Suppose we have the identity function λx.x and the type scheme ∀β(β → β). We could make an instance of that type scheme, say int → int. Plainly λx.x is a function that takes ints and returns ints. In fact, for any instance of that type scheme, the identity function fits the bill. So we say that λx.x:∀β(β → β).

Note that if there were even a single instance of the scheme that did not match the function, then we could not make the judgment that the function is a member of that type scheme. For example, suppose we have a function square that takes an int and returns its square. We cannot say that square:∀β(β → β) because square is not a member of, say, (int → int) → (int → int).

Next time: what is an “environment”?