Excessive explanation, part twenty

Summing up the story so far: we have a formal system that allows us to start with the types of identifiers in an environment, and provide a series of deductive steps that ends with the type of an expression. This is called a “derivation”.

Last time we discussed the “proof” that a “compile-time” series of deductions about the type of an expression is also a guarantee that the expression will always have a value compatible with that type at “run time”. I say “proof” because of course it was a proof by assertion that there exists an inductive proof.

The goal of this paper is to provide an algorithm that takes an expression and a set of typing assumptions for an environment, and produces a derivation that concludes with the type of the expression.

There are two more sketch proofs in this section of the paper, which we’ll briefly review.


We will also require later the following two properties of
the inference system.

Proposition 2. If S is a substitution and A ⊢ e:σ then S A ⊢ e:S σ. 
Moreover if there is a derivation of A ⊢ e:σ of height n then 
there is also a derivation of S A ⊢ e:S σ of height less [than] or
equal to n.

Proof. By induction on n.

An example might help here. Suppose we have a derivation for

{x:α} ⊢ λy.x:∀β β→α

Now we have a substitution S that means “change all the α to int”. Then we can apply the substitution to both sides and still have a true statement:

{x:int} ⊢ λy.x:∀β β→int

Now, remember what that turnstile means. It means that there is actually a derivation: a finite sequence of applications of our six rules, that starts with a bunch of assumptions and ends with “those assumptions entail that this expression has this type”. So if “apply the substitution to both sides” is in fact legal, then there must be a derivation of the substituted type from the substituted assumptions.
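
To make “apply the substitution to both sides” concrete, here is a minimal sketch in OCaml; the representation and the names ty, Subst and apply are my own invention, not the paper’s, and the sketch handles only monotypes. (For a type scheme you would skip the variables bound by the quantifier.)

(* A toy representation of types and substitutions. *)
type ty =
  | TVar of string          (* a type variable such as α *)
  | TCon of string          (* a primitive type such as int *)
  | TArrow of ty * ty       (* a function type *)

module Subst = Map.Make (String)

(* Apply substitution s to type t: replace each type variable in
   the domain of s with the type it maps to. *)
let rec apply (s : ty Subst.t) (t : ty) : ty =
  match t with
  | TVar a -> (try Subst.find a s with Not_found -> t)
  | TCon _ -> t
  | TArrow (t1, t2) -> TArrow (apply s t1, apply s t2)

(* The substitution “change all the α to int” from the example: *)
let s = Subst.singleton "α" (TCon "int")
let example = apply s (TArrow (TVar "β", TVar "α"))
(* example is TArrow (TVar "β", TCon "int"), that is, β → int *)

Applying S to a set of assumptions A is then just applying it to the type in every assumption.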

The proposition here claims that not only is there such a derivation, but moreover that there is a derivation of equal or shorter length! Substitution apparently can make a derivation shorter, but you never need to make it longer.

The proof-by-assertion here again says that you can prove this by induction, this time by “ordinary” induction on the integer n, the length of the derivation. Let’s sketch that out.

The base case is vacuously true: there are no derivations of length zero, and everything is true of all members of the empty set.

Now, suppose that you have a derivation of A ⊢ e':σ' that is k steps long. Suppose that there is a derivation of S A ⊢ e':S σ' that is k steps long or shorter. And suppose by adding just one more step to our original derivation, we can derive A ⊢ e:σ. Now all we must show is that we can add zero or one additional steps to the derivation of S A ⊢ e':S σ' in order to deduce S A ⊢ e:S σ. There are only six possible steps that could be added, so all we have to show is that each of those possible steps still works under a substitution, or can be omitted entirely.

I’m not going to do that here as it is tedious; noodle around with it for a bit and convince yourself that it’s true.

One more theorem, and then we’re done with this section:

Lemma 1. If σ > σ' and Ax ∪ {x:σ'} ⊢ e:σ0 then also
Ax ∪ {x:σ} ⊢ e:σ0

Incidentally, why have we had two propositions and a lemma?

Propositions, lemmas and theorems are all the same thing: a claim with a formal justification that proves the claim. We use different words to communicate to the reader how important each is.

Propositions are the least interesting; they are typically just there to prove some technical point that is necessary later on in a larger theorem. The proofs are often just hand-waved away as too tedious to mention, as they have been here.

Lemmas are more interesting; they are also in the service of some larger result, but are interesting enough or non-trivial enough that the proof might need to be sketched out in more detail.

Theorems are the major interesting results of the paper, and are all useful in their own right, not just as stepping stones to some larger result.

These lines are blurry of course, since the difference is entirely in emphasizing to the reader what we think is important.

What is this lemma saying? Basically, if we can derive the type of an expression given a smaller, more specific type for an identifier, then we can come up with a derivation of the same type for the same expression even if we assume a larger, more general type. For example, suppose we have a derivation for

{y:int, x:int→int} ⊢ (x y):int

Then we can also find a derivation that gives the same result even with a larger type for x:

{y:int, x:∀β β→β} ⊢ (x y):int
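
As a quick sanity check we can render both judgments in OCaml, a descendant of this system; both type-check. (This is only an approximation: the 'b in the second annotation gets unified at the use site rather than held polymorphic, but it illustrates that the more general assumption about x does not break the derivation.)

(* (x y) : int is derivable with the smaller assumption about x... *)
let specific (y : int) (x : int -> int) : int = x y

(* ...and still derivable with the larger, more general assumption. *)
let general (y : int) (x : 'b -> 'b) : int = x y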

Again, remember what the turnstile is saying: that we have a finite sequence of applications of our six rules that gets us from the first set of assumptions to the first type. For this lemma to be true, we must also have a finite sequence of rules that gets us from our new set of assumptions to that same type. The proof sketches out how to create such a sequence:

Proof. We construct a derivation of Ax ∪ {x:σ} ⊢ e:σ0 from
that of Ax ∪ {x:σ'} ⊢ e:σ0 by substituting each use of TAUT 
for x:σ' with x:σ, followed by an INST step to derive x:σ'. 
Note that GEN steps remain valid since if α occurs free in 
σ then it also occurs free in σ'.

So at most the new derivation gets longer by a finite number of steps, less than or equal to the number of TAUT steps in the original derivation.

And that’s it for section five! Next time we’ll start looking at the algorithm that actually finds the type of an expression. We’ll start by discussing type unification.

Excessive explanation, part nineteen

We’ve just seen that we can use our deductive logic system with its rules TAUT, GEN, and so on, to start from knowing nothing, and end with a correct derivation of the type scheme of a complex expression. But here’s an important question.

We know that A ⊢ e:σ means that there exists a derivation in our deductive system that allows us to conclude that the given expression is of the given type scheme when the given assumptions are true.

And we know that A ⊨ e:σ means that no matter what environment with values we assign to the identifiers named in assumptions A, the value computed for expression e will be a value of type scheme σ.

What we would really like to know is that A ⊢ e:σ logically implies A ⊨ e:σ. That is, if we have a derivation at compile time that the type scheme is correct in our deduction system, then there will be no type violation at runtime! Recall that earlier we defined the “soundness” of a logical deduction as a logically correct derivation where we start with true statements and end with true statements; what we want to know is that our logical system of derivation of types is “sound” with respect to the semantics of the language.

The paper of course provides a detailed proof of this fact:

The following proposition, stating the semantic soundness
of inference, can be proved by induction on e.

Proposition 1 (Soundness of inference). If A ⊢ e:σ then A ⊨ e:σ.

That’s the entire text of the proof; don’t blink or you’ll miss it. It can be proved, and that’s as good as proving it, right? And there’s even a big hint in there: it can be proved by induction.

Wait, what?

Inductive proofs are a proof technique for properties of natural numbers, but e is an expression of language Exp, not a number at all!

Let’s consider for a moment the fundamental nature of an inductive proof. Inductive proofs on naturals depend on the following axioms: (by “number” throughout, I mean “non-negative integer”.)

  • Zero is a number.
  • Every number k has a successor number called k + 1.
  • If something is true for zero, and it being true for k implies it is true of its successor, then it is true of all numbers.

So the idea is that we prove a property for zero, and then we prove that if the property holds for k, then it also holds for k+1. That implies that it holds for 0+1, which implies that it holds for 1+1, which implies that it holds for 2+1, and so on, so we deduce that it applies to all numbers.

This is the standard form in which inductive reasoning is explained for numbers, but this is not the only form that induction can take; in fact this kind of induction is properly called “simple induction”. Here’s another form of induction:

Prove a property for zero, and then prove that if the property holds for k and every number smaller than k, then it also holds for k+1.

This form of induction is called “strong induction”. Simple and strong induction are equivalent; some proofs are easier to do with simple induction and some are easier with strong, but I hope it is clear that they are just variations on the technique.

Well, we can apply the same reasoning to Exp:

  • An identifier is an expression.
  • Every other kind of expression — lambdas, lets, and function calls — is created by combining together a finite number of smaller expressions.
  • Suppose we can prove that a property is true for the base case: identifiers. And suppose we can prove that if the property is true of every one of the finitely many smaller expressions from which a compound expression is formed, then it is also true of that compound expression. Then the property holds on all expressions.

That’s just a sketch to get the idea across; actually making a formally correct description of this kind of induction, and showing that it is logically justified, would take us very far afield indeed. But suffice to say, we could do so, and are justified in using inductive reasoning on expressions.
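
In programming terms, induction on expressions is exactly the license to write a structurally recursive function. Here is a sketch in OCaml; the constructor names are my own choosing:

(* Exp as a recursive data type: identifiers are the base case, and
   every other expression is built from strictly smaller expressions. *)
type exp =
  | Id of string               (* identifier x *)
  | App of exp * exp           (* function call (e e') *)
  | Lam of string * exp        (* lambda λx.e *)
  | Let of string * exp * exp  (* let x = e in e' *)

(* A structurally recursive function: every recursive call is on a
   strictly smaller expression, which is why the recursion ends. *)
let rec size (e : exp) : int =
  match e with
  | Id _ -> 1
  | App (e1, e2) -> 1 + size e1 + size e2
  | Lam (_, body) -> 1 + size body
  | Let (_, e1, e2) -> 1 + size e1 + size e2

A proof by induction on e has the same shape as size: handle the Id case directly, and handle each compound case assuming the result already holds for its subexpressions.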

Before we go on I want to consider a few more interesting facts about induction. Pause for a moment and think about why induction works. Because I was sloppy before; I left some important facts off the list of stuff you need to make induction work.

Suppose for example I make up a new number. Call it Bob. Bob is not the successor of any number, and it is not zero either. Bob has a successor — Bob+1, obviously. Now suppose we have a proof by induction: we prove that something is true for 0, we prove that if it is true for k then it is true for k+1. Have we proved it for all numbers? Nope! We never proved it for Bob, which is not zero and not the successor of any number!

Induction requires that every number be “reachable” by some finite number of applications of the successor relationship to the smallest number, zero. One of the axioms we need to add to our set is that zero is the only number that is the successor of no number.

The property that expressions in Exp share with integers is: every expression in Exp is either an identifier, or is formed from one or more strictly smaller expressions. There is no “Bob” expression in Exp; there’s no expression that we can’t get to by starting from identifiers and combining them together via lets, lambdas and function calls. And therefore if we can start from identifiers and work our way out, breadth-first, to all the successor expressions, then we will eventually get all expressions. Just as if we start from zero and keep adding one, we eventually get to all numbers.

Induction is really a very powerful technique; it applies in any case where there are irreducible base cases, and where application of successorship rules eventually gets to all possible things.

Back to the paper.

Basically the way the proof would proceed is: we’d start with the base case: does our deductive system guarantee that if A ⊢ x:σ for an identifier x then A ⊨ x:σ? That’s pretty obviously true. Suppose we have the assumption A = {x:σ}. By assumption, the value of x at runtime must be from type scheme σ. By the TAUT rule, we can deduce that A ⊢ x:σ. And plainly A ⊨ x:σ is true; by assumption x will have a value from σ! Our base case holds.

Next we would assume that A ⊢ k:σ implies A ⊨ k:σ, and then ask the question “can we show that for every possible successor k' of k, A ⊢ k':σ' implies A ⊨ k':σ'?” If we can show that, then by the inductive property of expressions, it must be true for all expressions that A ⊢ e:σ implies A ⊨ e:σ.

The actual proof would be a bit trickier than that because of course some of the successors require two expressions, and so we’d have to reason about that carefully.

These proofs are extraordinarily tedious, and you learn very little from them. Thus the proof is omitted from the paper.

There are two more sketch proofs in this section. We’ll cover them next time, and then that will finish off the section on the deductive system.

Excessive explanation, part eighteen

Over the last two episodes we gave the rules of the deductive system. The paper now gives an example of using that deductive system to derive the type of a complicated expression. The way it is laid out on the page makes it difficult to reproduce here and to annotate, so I’ll take some small editorial liberties and “linearize” it a bit.

The following example of a derivation is organised as a tree, in 
which each node follows from those immediately above it by an 
inference rule.

TAUT: --------------------
       x:α ⊢ x:α
ABS:  --------------------
       ⊢ (λx.x):α → α
GEN:  -------------------- 
       ⊢ (λx.x):∀α(α → α)   (1)

The (1) indicates that I’m going to use this fact that we’ve derived later.

Let’s go through this carefully.

We start with a tautology, and therefore need no facts above the line: if we make the assumption that x:α, then we can deduce that x:α, unsurprisingly.

Now this becomes the fact that is fed into the next rule, ABS. Notice that below the second line we just have a turnstile with nothing to the left of it. This means that we can derive this conclusion from no assumptions whatsoever. Read the ABS rule carefully to make sure that you understand why this is.

This conclusion should make sense: we don’t need any assumptions about the types of any variables to know that the identity lambda is a function from α to α.

And now perhaps you understand why I said last time that the GEN rule is a bit subtle. Here we have α free in the type, but not free in the set of assumptions. (The set of assumptions is empty, so plainly there are no free type variables in it!) It is perfectly valid for us to turn the free type variable into a quantified type variable, and deduce that yes, the identity function is a function from α to α for any α you care to name.

All right, let’s start over. We won’t use any of the facts we’ve just deduced in this next derivation. We’ll start over from scratch.

TAUT: -----------------------------------   
       i:∀α(α → α) ⊢ i:∀α(α → α)  
INST: -----------------------------------
       i:∀α(α → α) ⊢ i:(α → α) → (α → α)   (2)

The tautology says that if i is a function from α to α, then i is a function from α to α. I hope we agree that is true!

Now we can make that more specific if we want to by using the INST rule. Since that is true for any α, it is in particular true for the type (α → α). These are different alphas than the alphas in ∀α(α → α)! We could have just as easily said i:(β → β) → (β → β), and it would have been more clear.

This is one of the things that is most confusing about academic papers: that they seem to deliberately use the same symbols to mean different things on the same line, like symbols are really expensive or something. I push back in code review when people do that in programming languages, and I wish they would not do it in papers either.

All right, so far we’ve deduced (1) (λx.x) is ∀α(α → α), and (2) if we have a proof that i is ∀α(α → α) then we can use i somewhere that an (α → α) → (α → α) is needed. Now let’s make a third deduction starting again from nothing:

TAUT: -----------------------------------   
       i:∀α(α → α) ⊢ i:∀α(α → α)  
INST: -----------------------------------
       i:∀α(α → α) ⊢ i:α → α   (3)

Wait, isn’t that the derivation we just made over again?

Not quite. It’s subtly different. Here we’re saying that i:∀α(α → α) implies that i:α → α. Again these are different alphas!

Now we can apply the COMB rule to facts (2) and (3). Go back and read the COMB rule and make sure you understand why facts (2) and (3) can be used here.

          (2)   (3)
COMB: --------------------------
       i:∀α(α → α) ⊢ i i:α → α   (4)

This says that if i is a function from α to α, then so is (i i).

Finally, we can put facts (1) and (4) together using the LET rule to derive the type of a complicated expression:

       (1)   (4)
LET:  --------------------------------- 
       ⊢ (let i = (λx.x) in i i):α → α

Here we have an expression that combines the four kinds of expression in Exp: let, lambda, application and identifier. And we derive that from no assumptions whatsoever, we can figure out the type of this complex expression; it’s a function from α to α.

This should not be a surprise: if you pass an identity function to itself, you get an identity function, and an identity function is of type α → α. But it is pretty neat that we can make a set of logical deductions that gets us starting from nothing, and ending with this correct conclusion.
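
We can spot-check the conclusion against OCaml, a direct descendant of this type system, by taking α to be int:

(* The derived judgment says the whole let expression has type α → α,
   so we may apply it to an int and get an int back. *)
let n : int = (let i = fun x -> x in i i) 42
(* n evaluates to 42, as an identity function applied to 42 should. *)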

But how, you might ask, did we even come up with that derivation? What we need is an algorithm that takes an expression and produces a derivation that the expression is of a particular type; such an algorithm is a type inference algorithm!

Next time we’ll discuss the “semantic soundness” of this deductive system, and give a sketch proof.

The chess mystery, solved

Happy St. Valentine’s Day all. To solve the Smullyan retrograde chess puzzle posted yesterday, the first question to ask is “is the black king actually in check, or not?” In a great many retro problems the first task is to determine who is to move, and knowing that one or the other side is in check is the easiest way to do that.

Suppose the answer is “black is not in check”. In that case there is only one place that the white king can be:

[chessboard diagram: the position with the white king placed on B3]

Great. Is this the answer? No. Because now white is in check. Twice. You can’t end a move in check. Therefore black just moved one of the pieces on the board in order to deliver that double check. Which one? It certainly was not the black king. But there is nowhere that the black rook or black bishop could have moved from such that white was not already in check by the other black piece! Since white did not end a move in check, it is impossible for the white king to be on B3.

Therefore the black king is in check after all. So it is black’s move. Some white piece now on the board must have moved in order to deliver that check. Which piece was it? There are only two: the white bishop and the white king. Suppose it was the bishop. Now we have the same problem again: where did the white bishop come from to deliver the check, such that black was not already in check? Nowhere, that’s where.

Therefore: black is in check. The check was a discovered check caused by the white king moving away from B3. There are only two squares that the white king can move to from B3 such that the white king is not moving into check.

Therefore: black is in check, the check was delivered by moving the white king from B3 to either A3 or C3.

Hold on a minute! Didn’t we just argue previously that the white king could not possibly have been on B3? No. We just argued that the white king could not be on B3 with the board as it is at present because there is no piece that could have delivered the double check. But the white king could have been on B3 on a board in the past with an additional black piece on it, a piece that was just captured by white, and it was that piece which delivered the seemingly-impossible double check.

So: at some point in the past the white king was on B3. Black made a move of some piece that ended on either A3 or C3, which double-checked white. White responded by taking that piece and giving a discovered check to black.

What now-taken black piece could possibly deliver that double check by moving to A3 or C3? There is only one possibility, but to see what it is we’ll need to go back a few moves.

Here’s a possible board position a few moves ago; black is to move:

[chessboard diagram: a possible earlier position, black to move]

It’s not required that the board have looked like this, but it is possible. Retro enthusiasts refer to this as an “unlocked” position. That is, there is no obvious impediment to legally getting from the starting position to this board position, and there are many ways to do so. From the unlocked position we then have the only possible sequence of moves that ends in the position given in the statement of the puzzle:

Black takes the knight and calls check:

[chessboard diagram: black has captured the knight, giving check]

White blocks the check by moving the pawn up two spaces:

[chessboard diagram: white has blocked the check with the pawn]

Black delivers the “impossible” double check by capturing en passant:

[chessboard diagram: black captures en passant, delivering the double check]

And white captures the pawn and calls check:

[chessboard diagram: white has captured the pawn, giving check]

…leaving the white king on C3.

There is no way to get a black piece that delivers the double check to A3, so this is the only possible solution. Can white force a draw? Who cares! Retros are about figuring out the past, not about what the outcome of the game will be.

If you like this sort of puzzle — and believe me, they get a lot more complicated than that! — Smullyan’s two books on the subject are both delightful. And if you do, a word of advice: pawns can capture en passant, pawns don’t have to promote to a queen, and just because pieces are on their starting squares doesn’t mean they’ve never moved!

Why should I be worried about dying?

It’s not going to happen in my lifetime!

Thus logician, philosopher and puzzle-constructor Raymond Smullyan, who died last week at the age of 97.

I started reading Smullyan when I was a teenager; I don’t remember whether I read This Book Needs No Title (about philosophy) or What Is The Name Of This Book? (puzzles) first, but whichever it was started a lifetime of enjoyment of both. His philosophy was decidedly playful. In one of his books he dispenses with the question “Does a dog have the Buddha-nature?” with Of course a dog has the Buddha-nature! You just have to look at some dogs to know that.

His puzzles were deceptively simple and quickly ended up being disguised versions of some very difficult topics in first order logic, combinatory logic, Boolean logic, and so on. Though he was most famous for his “Island of Knights and Knaves” puzzles, where knights can only tell the truth and knaves can only lie, he produced a great many puzzles on other topics. To Mock A Mockingbird, in which combinators are thinly disguised as singing birds, is a particular favourite; it changed my understanding of the fundamentals of computer programming.

Among my favourites of all his puzzles though were his two books of absolutely delightful retrograde chess puzzles. Most chess puzzles consider the future: from this position, what should white play next to ensure a mate? Retro puzzles are not concerned with the future, but rather the past; what had to happen in the game in order to arrive at this position?

Here’s my favourite Smullyan retro, which was on the cover of my copy of Chess Mysteries of the Arabian Nights:

[chessboard diagram: the puzzle position, with the white king removed]

The white king has been removed from the board; your task is to deduce where it goes. There is only one square where the white king can be such that the position is possible to produce in a legal game of chess. (Note that I said legal, and not sensible!)

Leave your thoughts in the comments and I’ll give the answer later this week.

Excessive explanation, part seventeen

Last time we discussed the syntax for logical deductions that is common in academia: facts above, deductions below, a line between them. We discussed the tautology rule and the instance rule. Today: more rules.

       A ⊢ e:σ   
GEN:  ------------ ( α not free in A )
       A ⊢ e:∀ασ   

This is the “generic” rule: if you have some fact about an expression, then that fact is also true when quantified over all types. For example, if we know that A ⊢ e:int→int then we also know that we can make up a type variable α, and no matter what value we give the type variable, the expression is still of that type.

Exercise: Do you see why α must not be free? Suppose α is free in A; can you find an example of how the GEN rule produces an invalid conclusion if we were to allow this?

Now, you might say that it is obvious that if something is true, then it is also true even when we give any old value to a type variable that is not even involved. And yes, that’s obvious. It’s good that it is obvious! All these rules should be obviously correct.

What I just said was a bit of an “intuition pump”, because it is actually a bit subtle. Do you see why? Just because α is not free in A does not imply that α is not free in σ! So why is it legal to make this quantification over a variable free in σ only if it is not free in A? Ponder that briefly.

If after pondering it was not clear, don’t worry. It is a lot easier to understand these subtleties by looking at a realistic example, which we will do next time on FAIC.

In our next rule we have two facts above the line, which we have not yet seen:

       A ⊢ e:τ' → τ     A ⊢ e':τ'   
COMB: ----------------------------
       A ⊢ (e e'):τ

If we have a function from τ' to τ, and an expression of type τ', then the expression “pass the value to the function” is of type τ.

Just like in C#: if we have a function F that takes a string and returns an int, and e is an expression of type string, then F(e) is an expression of type int. Easy peasy.

If you think we need to have at least one rule for every possible kind of expression in Exp, you’re right! Fortunately there are only four kinds of expressions: identifiers, function calls, lambdas, and let expressions. We’ve already seen the first two, so it should be no surprise that the next rule is for lambdas.

This rule is a little more complicated:

       Ax ∪ { x:τ' } ⊢ e:τ
ABS:  ---------------------
       A ⊢ (λx.e):τ' → τ

Let’s sort this one out carefully.

Suppose we have a set of assumptions A that might or might not contain a judgment for identifier x. If it does, remove it. To that, add a judgment that says that x is of type τ'. And suppose from that set of assumptions we can deduce that expression e is of type τ.

That implies that from assumptions A, we can deduce that λx.e is a function from τ' to τ.

Why do we need to say Ax ∪ { x:τ' } instead of simply A ∪ { x:τ' }? Because we said at the beginning of this chapter that all sets of assumptions would contain at most one judgment for any identifier. But A ∪ { x:τ' } might contain two judgments for x, if A already contains a judgment for x.

Leaving aside the mechanisms of adding and removing judgments from a set of assumptions, I hope the rule is sensible. If knowing that x is of type τ' allows us to deduce that e is of type τ, then λx.e can be used anywhere we need a function from τ' to τ.

Notice that it does not matter whether A contains a judgment for x or not. If it does not, fine. If it does, then who cares? λx.e defines a new variable x that shadows any existing x in the environment.

We’ve seen rules for identifiers, function applications and lambdas. There is only one rule left, which describes how typing works for let expressions. Again, it takes two facts and produces a third.

       A ⊢ e:σ      Ax ∪ { x:σ } ⊢ e':τ
LET:  -----------------------------------
       A ⊢ (let x = e in e'):τ

This is very similar to the rule for lambdas. If e is of type scheme σ, and identifier x being of type scheme σ implies that e' is of type τ, then the type of the corresponding let expression is τ. Again, we need to do a little scut work to make sure that the identifier x is treated properly, but I hope the rule is clear.
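
Notice the asymmetry between LET and ABS: a let-bound identifier gets a whole type scheme σ, but a λ-bound identifier only gets a plain type τ'. That difference is observable in any ML descendant; here is a sketch in OCaml, standing in for Exp:

(* LET: f is bound with the scheme ∀α(α → α), so it may be used at
   two different instances within the body. *)
let ok = let f = fun x -> x in (f 1, f true)
(* ok : int * bool *)

(* ABS: inside a lambda, f has one monotype τ', so the analogous

     (fun f -> (f 1, f true)) (fun x -> x)

   is rejected: no single τ' works for both f 1 and f true. *)

This is also why a let expression cannot simply be treated as sugar for a lambda applied to an argument.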

These are the only rules of the deductive system. Next time we’ll go through an example of how to use the rules to start with nothing, and from that deduce some facts about identity functions.

Excessive explanation, part sixteen

All right, so far we’ve mostly been talking about jargon and background information. Now we’re actually going to get into the meat of the thing here. What we want is (1) a formal definition of what logical steps we are justified in using to determine what the type scheme of an expression is, when evaluated in a particular environment, and (2) an algorithm that produces a valid sequence of logical steps that ends in a type scheme for the expression.

How are we going to do that? First things first:

5 Type inference

From now on we shall assume that A contains at most one assumption
about each identifier x. Ax stands for removing any assumption about
x from A. 

That notation is a bit vexing, but I hope it is clear.

For assumptions A, expression e and type-scheme σ we write

                          A ⊢ e:σ

if this instance may be derived from the following inference rules:

Note that this is a single turnstile. Last time we used the double turnstile to mean “from a collection of typed identifiers A, we can logically deduce that an expression e has type scheme σ“. It appears that we mean the same thing by the single turnstile, but does it?

The subtle difference is “may be derived from the following inference rules”. The single turnstile means that we actually could produce a valid sequence of particular logical rules that get us to the type judgment.

But we’re getting ahead of ourselves.

I started this series because I get the question quite frequently “I tried to read this paper and I couldn’t understand the notation”. In particular, people who have no experience with the standard notation for logical deductions have a bad reaction to this notation, but really it is very straightforward. The syntax is:

       KNOWN FACTS
RULE: ------------ ( ADDITIONAL NOTES )
       DEDUCTIONS 

So, stuff you know already goes on top. The name of the deduction rule goes on the left. The deduction that this rule allows you to make goes below the line, and any additional notes about the rule go in parens beside it.

The deductions of course are themselves just facts, so we can put a line below the deductions, and make even more deductions, and so on.

In our particular deductive system, all facts and deductions will be of the form A ⊢ e:σ where again, A is a set of identifiers with types, e is an expression, and σ is a type scheme. Remember, the turnstile means “there exists a sequence of logical deductions that justifies this typing judgment”.

We call a sequence of logical deductions of this form a “derivation” of the result.

Let’s start with the easiest rule; a rule so easy you might think it isn’t necessary:

TAUT: ---------- ( x:σ ∈ A )
       A ⊢ x:σ

A “tautology” is a statement that is necessarily true by its form. “All X that are Y are Y” is a true statement no matter what we substitute for X and Y.

This rule has no facts at all above the line; tautologies are necessarily true irrespective of any particular facts. The notes on the side say that for this rule to apply, identifier x must have type scheme σ in assumptions A. Given that, we logically deduce that, surprise, from assumptions A we can deduce that the expression x is an expression of type σ.

So why do we need this rule? Because remember what the single turnstile means. The single turnstile means that we can deduce the type scheme of an identifier by producing a derivation: a sequence of logical deductions drawn from a specific set of rules. It’s not enough to say that obviously, if an assumption is true then it is true. We have to have a rule that says that so that we can use that rule in a valid sequence of rules.

Let’s look at a very slightly harder one.

       A ⊢ e:σ   
INST: ---------- ( σ > σ' )
       A ⊢ e:σ'

The “instance” rule is: if we can logically deduce from assumptions A that e is of type σ, then we can also logically deduce from assumptions A that e is of a smaller type σ'.

For example, suppose we already know that A ⊢ x:∀α(α → α). Then with this rule we can deduce further that A ⊢ x:(int → int). If we have enough evidence to deduce that x is a function from α to α for any type α, then we have enough evidence to deduce that it is a function from integers to integers.

This might seem a bit weird to you. Isn’t this backwards?

If in C# we deduced that x was of type Mammal, we would not consider that to be evidence for a deduction that it was of a smaller type, Giraffe. We’d consider it to be of a larger type, Animal. However, there’s another way to think of it: in C# if we deduced that x was of type Mammal, we would know that any Giraffe was a valid value for x, but not necessarily any Animal. That’s the kind of inference we’re doing here.
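
In OCaml terms (again just as a stand-in), INST is what lets a value with a polymorphic type scheme be used at any particular instance of that scheme:

(* id has the scheme ∀α(α → α)... *)
let id : 'a -> 'a = fun x -> x

(* ...so by INST it may be used at the smaller type int → int. *)
let at_int : int -> int = id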

Next time we’ll look at some more of the rules of this deductive system.

Excessive explanation, part fifteen

All right, we are getting through section four of this paper. Today: what is an “environment”?

Now let Env = Id → V be the domain of environments η. 

Just to be clear, this is not a function in the Exp language. This is a formal description of part of the runtime engine of the Exp language. Basically we are saying here that an “environment” is simply the part of the runtime of an implementation of the language that takes identifiers and maps them onto their values. And that just as we’ll use σ for type schemes and α for type variables, we’ll use η for environments.

You’re so accustomed to using environments that they are like the air you breathe; you don’t even think about it. When you’re writing a program in some language and you say x = y + z;, you’re saying “obtain the values associated with identifiers y and z from the environment, add them, and modify the environment so that next time we ask, x has that value”. We can characterize the accesses to y and z as the environment being a function that takes “y” or “z” and returns their values. We can characterize the assignment to x as being the construction of a new environment based on the previous environment.

The semantic function ε : Exp → Env → V is given in [5]. 

Again this is not a function in the Exp language; this is again a formalization of the runtime engine.

Remember that in our notation, a function of two arguments is notated as a function of one argument that returns a function of one argument. So what we’re saying here is that the “semantic function” is a function that takes an expression and an environment and produces a value.

Again, this is making formal the basic operation of a runtime. The runtime has a bunch of expressions and the values of all the variables, and it evaluates the expressions in the context of the environment.
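
Here is a minimal sketch of both ideas in OCaml: Env = Id → V as a lookup function, and ε as an evaluator taking an expression and an environment to a value. The exp data type repeats the part-nineteen sketch above; the names value, extend and eval are mine, not the paper’s.

type exp =
  | Id of string
  | App of exp * exp
  | Lam of string * exp
  | Let of string * exp * exp

(* V: a value is an integer or a function over values. *)
type value = VInt of int | VFun of (value -> value)

(* Env = Id → V: an environment is just a lookup function. *)
type env = string -> value

(* “Assignment” constructs a new environment from an old one. *)
let extend (eta : env) (x : string) (v : value) : env =
  fun y -> if y = x then v else eta y

(* eval e eta plays the role of ε⟦e⟧η. *)
let rec eval (e : exp) (eta : env) : value =
  match e with
  | Id x -> eta x                         (* η⟦x⟧ *)
  | Lam (x, body) -> VFun (fun v -> eval body (extend eta x v))
  | App (f, a) ->
      (match eval f eta with
       | VFun g -> g (eval a eta)
       | VInt _ -> failwith "applied a non-function")
  | Let (x, e1, e2) -> eval e2 (extend eta x (eval e1 eta))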

Using it, we wish to attach meaning to assertions of the form 

                           A ⊨ e:σ

where e ∈ Exp and A is a set of assumptions of the form x:σ, x∈Id.

This right here is the mission statement of this paper.

The double-turnstile means “the thing on the right is a logical consequence of the thing on the left”.

What we’re saying here is: we have (on the left) a set of judgments (called “assumptions” here) where all the expressions are identifiers. Say:

x:int
square:int → int
identity:∀β(β → β)

We have on the right an arbitrary expression with a typing judgement, say

λy.identity (square x) : ∀β(β → int)

I hope you agree that if we assume that x is an int, square is a function from int to int, and identity is an identity function, that the lambda given is a function that takes any type and returns an int. The assumptions on the left are sufficient to deduce that the judgment on the right is correct (if the assumptions are correct, of course).

The goal of this paper is to provide an algorithm which, given the facts on the left and the expression on the right, deduces the type scheme on the right in a manner that allows us to confidently say that yes, we have sufficient evidence on the left to make this judgment on the right.

(Aside: is it possible to have a function with type scheme ∀β(β → β) that is not an identity function? Ponder this!)


If the assertion is closed, i.e. if A and σ contain no free type
variables, then the sentence is said to hold iff, for every 
environment η, whenever η ⟦x⟧:σ', for each member x:σ' of A,
it follows that ε ⟦e⟧ η:σ. 

OK, we’ve got some jargon and some notation here used without introduction.

First of all, “the assertion” and “the sentence” are being used interchangeably here. We are “asserting” that the assumptions about typed identifiers on the left are sufficient evidence to deduce the type of the expression on the right.

Second, remember when we abused the Exp function notation to say that an environment was a function Id → V? You would think then that we would continue to abuse that notation and say that given an environment η and an identifier x we would say that the value of x was η x. And that is what we do, but to call out that x is being used here as an identifier rather than standing in for a value, we “quote” it with funny brackets. So the value of identifier x in environment η is notated η ⟦x⟧.

Similarly for the semantic function ε. It takes two things: an expression e and an environment η, and to emphasize that the expression is an expression of the Exp language, we bracket it as well. So ε ⟦e⟧ η is “the value of expression e when evaluated in environment η”.

So what the heck is this jargon-filled paragraph actually saying? We have a bunch of identifiers with types on one side, and the statement that this implies that expression on the other side has a particular type. The paragraph is saying that this means: choose any possible environment that has those identifiers having some values of the given types. No matter what environment you chose, evaluating the expression in that environment gives a value that matches the expression’s type.

That might be too abstract still. In our example:

x:int, square:int → int, identity:∀β(β → β) ⊨ 
  λy.identity (square x) : ∀β(β → int)

We’re saying that no matter what integer we choose for x, no matter what function from int to int “square” really is, and no matter what function matching its type scheme “identity” really is, we know that evaluating the expression on the right will produce a function from something to int. There is never any combination of values for the environment that would cause that expression to evaluate to something else.

Further, an assertion holds iff all its closed instances hold.

That’s maybe not so clear but fortunately there is an example in the paper:


Thus, to verify the assertion

                x:α, f: ∀β(β → β) ⊨ (f x): α

it is enough to verify it for every monotype μ in place of α.
This example illustrates that free type variables in an assertion
are implicitly quantified over the whole assertion, while explicit
quantification in a type scheme has restricted scope.

What we’re getting at here is that this is not the same as saying

                x:∀α α, f: ∀β(β → β) ⊨ (f x): α

Because that scopes the alpha to just the type of x, but we want that alpha to be still “in scope” on the right hand side of the double turnstile. Basically it is saying that the assertion is logically:

              ∀α  ( x:α, f: ∀β(β → β) ⊨ (f x): α  )

All right, that gets us through section four! The last paragraph in section four lays out what is going to happen in the rest of the paper:


The remainder of this paper proceeds as follows. First we present
an inference system for inferring valid assertions. Next we present 
an algorithm W for computing a type scheme for any expression, under
assumptions A. We then show that W is sound, in the sense that any 
type-scheme it derives is derivable in the inference system. Finally
we show that W is complete, in the sense that any derivable 
type-scheme is an instance of that computed by W.

Next time: A notation for making chains of logical inferences, and another kind of turnstile.