Excessive explanation, part fourteen

All right, finally we are getting to typing judgments, which are the bread and butter of type inference algorithms. We said in the previous episode that the values that are manipulated by a program written in Exp are either values of primitive types, or functions from one value to another, and that this set of values is called V:

To each monotype μ corresponds a subset V , as detailed in [5];
if v∈V is in the subset for μ we write v:μ.

Recall that a monotype is a type with no “generic” type arguments. So for instance, the monotype int has values that are all possible integers. The monotype int → int has values like “the function that takes an integer and doubles it”. This syntax simply means that the value on the left of the colon is an instance of the monotype on the right.

Further we write v:τ if v:μ for every monotype instance μ of τ, 
and we write  v:σ if v:τ for every τ which is a generic instance
of σ.

Let’s consider an example. Suppose we have the identity function λx.x and the type scheme ∀β(β → β). We could make an instance of that type scheme, say int → int. Plainly λx.x is a function that takes ints and returns ints. In fact, for any instance of that type scheme, the identity function fits the bill. So we say that λx.x:∀β(β → β).

Note that if there were even a single instance of the scheme that did not match the function, then we could not make the judgment that the function is a member of that type scheme. For example, suppose we have a function square that takes an int and returns its square. We cannot say that square:∀β(β → β) because square is not a member of, say, (int → int) → (int → int).

Next time: what is an “environment”?

Excessive explanation, part thirteen

So far we’ve gotten through the introduction, the grammar of the Exp language, the grammar of the type description language, and what it means to make substitutions in type schemes. For the next few episodes we’ll consider the semantics of the Exp language.

Having the grammar of the language is enough to be able to parse it, but ultimately computer programs are intended to be executed. Programs manipulate values, and so we should be able to say formally what the rules are for values and how different program elements compute them.

4 Semantics

The semantic domain V for Exp is a complete partial order 
satisfying the following equations up to isomorphism, where 
Bi is a cpo corresponding to primitive type ιi :

V = B0 + B1 + ... + F + W (disjoint sum)
F = V → V (function space)
W = {·} (error element)

HOLY GOODNESS that is a lot of jargon with no explanation.

I’m not going to go into too much detail here as formal semantic analysis using domain theory is not my strong suit. But we can unpack some of this jargon.

First off, by “semantic domain V” we pretty much mean “what are all the possible values that a variable can have?” This might not be as straightforward as you think! Consider C#:

class C<T> 
  int x;
  T y;

It’s pretty clear that the values of variable x can be any of the four billion possible ints. But what are the possible values of y? T is, from the perspective of the compiler, a perfectly legal type for y. We can now come to two incompatible conclusions. First: T can be any type, so y can have any value, including, say, 123. Second: there are no values of type T — please, try to find a running program that has a T stored in y! — so since y is of type T, there can be no values for y.

Plainly at most one of those statements can be true. The second conclusion is unsound but it is tricky to try to figure out where exactly the reasoning goes wrong.

Anyways, my point is that in a language with generics we need to be careful to describe exactly what are the possible values are that can be manipulated by the program. The “semantic domain” that we’re calling “V” here is the set of all values that can be manipulated by the Exp language.

But we say more than just that it is a set; we say that it is a “complete partial order”. What on earth is a cpo, and why do we care?

Long-time readers of this blog already know what a “total ordering” is. It’s a bunch of things, plus a relation on them that takes any two things A and B, and decides consistently whether A is larger than, smaller than, or equal to B. Total orders are necessary for sorting algorithms to work properly. A total order has all kinds of restrictions — things equal to the same are equal to each other, if A is smaller than B and B is smaller than C then A is smaller than C, A and B are never both smaller than each other, and so on. But the easiest way to think about a total order is that given any two elements, they are either equal, or one is bigger than the other.

A partial order relaxes this requirement. In a partial order, given any two elements they are equal, or one is bigger than the other, or they are unordered with respect to each other. “Unordered” and “equal” are very different!

Consider assignment compatibility in C# for example. We can make a partial order on types based on assignment compatibility.

An expression of type int is assignable to a variable of type long, so int is “less than” long. Similarly int is “less than” object. A string is assignable to object, so string is “less than” object. But ints and strings are not assignable to each other, and they certainly are not equal! They are neither bigger, nor smaller, nor equal to each other as far as the “assignment compatibility” relationship is concerned. Assignment compatibility is a partial order.

A set that comes with a partial order is called a “poset”.

A “complete” partial order is a partial order with an additional property which would be somewhat tedious for me to define, so I’m going to skip it. In fact, all of this is a bit of a digression. There are technical reasons why the study of programming language value domains needs complete partial orders. Recall that we mentioned fixed-point analysis as it applied to recursive functions a while back. Suffice to say, the mathematics of analyzing recursive functions, function application, and so on, requires that there be a partial order.

So what are we really saying here? We’re saying that the values that can be manipulated by an Exp program are:

  • Any value of one of the “primitive” types
  • A function that takes a value and returns a value
  • A value representing “this program is an error”.

Frankly it is not 100% clear to me why we need the error element. It is never mentioned again in the paper! Anyone who has more background in domain theory who wants to clue me in here, please leave a comment.

All right, we have come to the not entirely surprising conclusion that the values manipulated in a functional language are primitive values and functions of one parameter.

Next time: typing judgments!

Real Americans

We take a break from a detailed exegesis of a 40-year-old paper for a brief political message.

Freakonomics did a poll to come up with a slogan for this country, and the winner was America: Our Worst Critics Prefer To Stay. I can get behind that. I’ve avoided criticizing my hosts for twenty years, and that ends now.

Yesterday I stood in a room with 42 people from 25 countries on five continents; they all put in the considerable time and effort it takes to become American citizens. Unlike the vast majority of Americans, they chose to be here. And anyone who tells you that they are in any way whatsoever not “real Americans” is selling something that I don’t buy.

Next time: back to type inference and formal semantics!

Excessive explanation, part twelve

Last time we saw how we can make a substitution for a free variable to produce an instance of a type scheme. This time we’ll look at a much more powerful form of substitution.

By contrast a type-scheme σ = ∀α1...αm τ has a "generic instance"
σ' = ∀β1...βn τ' if τ' = [τii]τ for some types τ1,...,τm and 
the βj are not free in σ.

This definitely needs an example.

Suppose we have the identity function λx.x. Its type scheme is ∀α α → α. Now we might want to perform a substitution here of int for α. But we’ve only defined substitution on free variables, and we need to make a substitution on bound variables, to eliminate the bound variable entirely.

This mechanism of “generic instantiation” says: a type scheme consists of zero or more quantifiers and a type. Take just the type and perform a substitution on it, and then put as many quantifiers as you like before the substituted type, provided that the new quantifiers are not turning free variables into bound variables.

Let’s try it. Extract the type from the type scheme: α → α. We perform our substitution [int/α] on that type to get int → int. And then we put zero or more quantifiers on top of that. Let’s put zero quantifiers on top of it. So we have the original type scheme ∀α α → α — that is, for all α, there’s a function from α to α. Now we want a specific generic instance of that, and it is int → int.

This is in a sense the sort of generic instantiation that you’re used to from C#. Notice that we’ve eliminated not just the type variable, but the quantifier on top of the type scheme.

Now let’s look at a more complicated example.

Suppose we have some type scheme for some function, let’s say ∀β∀γ β → γ. And let’s suppose we pass that thing to the identity function. The type scheme of the identity function is ∀α α → α: the quantifier says that this works for any α we care to name, as long as the type provided for α is a type, not a type scheme; remember, so far we’ve only described type substitution by saying that we’re going to replace a free variable with a type, but what we have is a type scheme with two bound type variables: ∀β∀γ β → γ.

How do we deal with this? We’ll do our generic instantiation process again. We have type schemes ∀α α → α and ∀β∀γ β → γ. Take just the type of the first without the quantifiers: α → α. Now make the substitution with the type of the second without the quantifiers: [β → γ / α] to get the type (β → γ) → (β → γ). Then slap a couple of quantifiers on top of that: ∀β∀γ (β → γ) → (β → γ).

We’ve produced an entirely new type scheme, but clearly this type scheme has the same “fundamental structure” as the type scheme for the identity function, even though this type scheme does not have the type variable of the original scheme, or even the same number of type variables; it’s got one more.

Why did we have to say that all of the introduced type variables must be “not free”? Well, let’s see what happens if we violate that constraint. Suppose we have type scheme ∀α β → α. Clearly β is free.

We extract the type: β → α. We perform a substitution [int/α] to get β → int. And then we put a quantifier on top: ∀β β → int. But that turns β from a free variable into a bound variable, so this is not a generic instance of a type scheme with a free variable β. Thus when creating generic instances we are restricted to introduce quantifiers only on non-free variables.

In this case we shall write σ > σ'. 

Conceptually this is pretty straightforward. Given two type schemes, possibly one is a generic instance of the other. If a schema σ’ is a generic instance of a scheme σ then we say that σ’ is smaller than σ. This should make some sense given our examples so far. If you think of schemes as patterns that can be matched, the pattern ∀α α → α is way more general than the pattern int → int, so we say that the latter is “smaller” — less general — than the former.

It is slightly problematic that the authors chose the “greater than” symbol instead of, say, the “greater than or equal” symbol. Why? Well consider ∀α α → α and ∀β β → β. It should not be hard to see that both of these are generic instances of each other, and therefore both are “smaller” than the other, which is silly. Plainly they are equally general!

Note that instantiation acts on free variables, while generic
instantiation acts on bound variables.

This line should now be clear.

It follows that σ > σ' implies S σ > S σ'.

Let’s look at a quick example just to make sure that’s clear. Suppose we have σ = ∀α α → β and σ' = int → β. Clearly σ > σ’. If we now make the substitution of string for β in both, then we have ∀α α → string which is still more general than int → string; clearly the greater-than relationship is preserved by substitution of free variables.

And that’s it for section 3! Next time, we’ll look at the formal semantics of the Exp language.

Excessive explanation, part eleven

Last time we learned what is meant by “free” and “bound” variables both in the context of the Exp language and the type scheme language. So, substitutions:

If S is a substitution of types for type variables, often written
[τ11, ... ,τnn] or [τii] and σ is a type-scheme, then S σ 
is the type-scheme obtained by replacing each free occurrence of 
αi in σ by τi, renaming the generic variables of σ if necessary.

OK, syntactic concerns first. A substitution is written as a bracketed list of pairs, where the first element of the pair is a type (possibly a type variable, but definitely not a type scheme!) and the second is a type variable. We can think of a substitution as a function that takes a type scheme and produces a new type scheme, so (possibly confusingly) we use the same notation as we do for applying a function to an argument in the Exp language.

Let’s look at an example. Suppose we have the type scheme σ = ∀α α → (β → β), and the substitution S = [int/β]. β is free in σ, so S σ is ∀α α → (int → int).

Remember that we said last time that the free type variables were essentially type variables whose values were going to be provided by an outer context. Substitution of types for free variables is essentially a mechanism by which we can supply the value of the type variable.

Why “renaming the generic variables of σ if necessary”? Well suppose we had σ as before but the substitution S = [α/β] and we intended that α to be a different (that is, free) α than the (bound) α in σ. We cannot in this case say that S σ is ∀α α → (α → α) because now every α is bound, which is not what we want. But we can say, hey, we’ll just trivially rename the bound α to γ and now the result of the substitution is ∀γ γ → (α → α) and all is well.

Then S σ is called an "instance" of σ;

Notice that since S σ is itself a type scheme, we are saying that “instance of” is a relation on type schemes: given any two type schemes, we should be able to tell if one is an instance of another, by checking to see if there is any substitution on free variables of one that produces the other, modulo renaming of bound variables.

the notions of substitution and instance extend naturally 
to larger syntactic constructs containing type-schemes.

In later episodes we’re going to have things that contain some number of type schemes, and we want to be able to say “apply this substitution to all of them” and produce a collection of instance type schemes.

Next time: we’ve seen that we can substitute any type for a free variable to get an instance of a schema. But this is insufficiently powerful; we need to be able to make substitutions on bound variables as well, eliminating the bound variable. Next time we’ll see how to do that.

Excessive explanation, part ten

In this short section the paper gives a somewhat formal definition of what we mean by generic type substitution. You are of course familiar with the basic idea in C#: if you have

delegate T MyDelegate<T, U>(T t, List<U> us);

then we can make a substitution of int for T and double for U to get the type MyDelegate<int, double> which represents a method with signature

int MyDelegate(int t, List<double> us);

Let’s see how the paper defines this idea. There are some subtleties that are not readily apparent in C#!

I’m just going to quote the entirety of section 3; it is only two paragraphs. But we’ll need to take this thing apart word by word to understand it.

3 Type instantiation

If S is a substitution of types for type variables, often written
[τ11, ... ,τnn] or [τii] and σ is a type-scheme, then S σ 
is the type-scheme obtained by replacing each free occurrence of 
αi in σ by τi, renaming the generic variables of σ if necessary. 
Then S σ is called an "instance" of σ; the notions of substitution 
and instance extend naturally to larger syntactic constructs 
containing type-schemes.

By contrast a type-scheme σ = ∀α1...αm τ has a "generic instance"
σ' = ∀β1...βn τ' if τ' = [τii]τ for some types τ1,...,τm and 
the βj are not free in σ. In this case we shall write σ > σ'. 
Note that instantiation acts on free variables, while generic
instantiation acts on bound variables. It follows that σ > σ'
implies S σ > S σ'.

Just… wow. We already know what type variables and type schemes are, but now we have type instances somehow distinct from generic type instances, and free variables somehow distinct from bound variables. And an ordering relation on type schemes has just been introduced.

Let’s start by saying what we mean by “free” and “bound” variables in the Exp language, and then say what we mean by free and bound variables in the type language.

Basically, by “bound” we mean that a variable is declared somewhere in the expression, and by “free” we mean it is used but not declared. So for example, in the expression x y, both x and y are free variables. The variable q which does not appear, is not considered free in x y, but nor is it bound.

In the expression let x = q in x y, x is no longer a free variable; the name “x” is associated with a particular syntactic location: the first operand of the “let”. In this larger expression, q and y are free, as both are used but not defined.

Let’s now connect this with the type grammar. Suppose we have expression λx.λy.x. First off, let’s make sure that we understand what this is. It’s a function; all functions take a value and return a value. The outer function takes a value x and returns a function that takes a value y, ignores it, and returns x.

What is the type scheme of this expression? x and y can be any type, and they need not be the same. So let’s make two type variables and quantify both of them. The type scheme of this expression is ∀α ∀β α → (β → α). That is, it’s a function that takes any type alpha, and returns a function that takes any type beta and returns an alpha.

So now maybe it is clear what “bound” means in type schemes. In the type α → (β → α) found inside of that type scheme, both α and β are bound; they have quantifiers that introduce them. But what is “free”? Well, what’s the type scheme of the sub-expression λy.x found inside λx.λy.x? It’s not ∀α ∀β β → α. Why not? Because x cannot take on any type whatsoever; it has to take on the type of the expression that was passed to the outer lambda! The type scheme of the inner lambda is
∀β β → α, and α is a free type variable from the perspective of the inner lambda. A type variable γ which does not appear anywhere would be considered to be neither free nor bound in these expressions.

Maybe looking at this example more explicitly will help. An equivalent but more verbose expression is

let outer = 
  λx.let inner = λy.x in inner
  in outer

If you’re finding this terse syntax hard to read, in C# this would be (ignoring that you can’t use var for lambdas!)

var outer = x => {
  var inner = y => x; 
  return inner;

So what is the type scheme of outer? ∀α ∀β α → (β → α). Both type variables are bound. What is the type scheme of inner? ∀β β → α. α is free, β is bound.

In short, “free” really means “the meaning of this name will be provided by some outer context, not here”, and “bound” means “this name is given its meaning somewhere in here”, regardless of whether we’re talking about identifiers in the Exp language or type variables in the type language.

Apropos of which: if you’re familiar with the C# specification for lambda expressions, in there we refer to the “free” variables of a lambda as the “outer” variables, which I think is a little more clear.

Next time, we’ll actually look at substitution, I promise.

Excessive explanation, part nine

Last time we defined the grammar of our language “Exp”, which consists of identifiers, lambdas, function application and let expressions. This is a programming language. We also need a different language to formally describe types and “type schemes”. The paper says:

Note that types are absent from the language Exp. Assuming a set of
type variables α and of primitive types ι, the syntax of types τ and of
type-schemes σ is given by

τ ::= α | ι | τ → τ
σ ::= τ | ∀α σ

The high order bit here is that the language Exp is just expressions; we have no ability to declare new types in this language. So we assume for the purposes of type inference that there is an existing set of primitives — int, string, bool, whatever — that we can use as the building blocks. What those primitives are does not matter at all because the type inference algorithm never concerns itself with the difference between int and bool. The primitive types are meaningless to the type inference algorithm; all that matters is that we can tell them apart.

The first line in the grammar gives the syntax for types; a type tau can be one of three things.

  • A type can be a generic “type variable” — the T in List<T> in C#, for example — that we will notate alpha, beta, and so on.
  • A type can be one of a set of “primitive” types: int, string, whatever. Doesn’t matter.
  • The only other kind of type is “function which takes a type and returns a type”. That is, a function of one argument, with an argument type and a return type, is a type.

Just as they did with the Exp language, the authors have omitted parentheses from the language of types. Unlike with the Exp language, they neglected to call out that they were doing so.

The second line in the grammar gives the syntax for type schemes.

What’s a type scheme? Recall earlier that we said that a type scheme is a bit like a generic type. More formally, a type scheme is either just a plain type (including possibly a type variable), or “for any type alpha there exists a type such that…”

A type-scheme ∀α1...∀αn τ (which we may write ∀α1...αn τ) has generic
type variables α1...αn.

I’ve never liked “type variables” — I think variables should, you know, vary. The C# spec calls these type parameters, which is much more clear; a parameter is given a value by substituting an argument for it. But that’s the next episode!

Anyways, this sentence is just pointing out that for notational convenience we’ll omit the universal quantifiers in the cases where we have a large number of type variables in a row. But we will always have the quantifiers on the left hand side of the schema. A quantifier never appears in a type, only in a type schema.

Something important to realize here is that types and type schemes are just another syntax for a language, like “Exp”, our expression language. A “type” in this conception is simply a string of symbols that matches the grammar of our type language.

An interesting point to call out here: note that this type system has no generic “classes”. There is no “list of T” in this system. We’ve got primitive types, we’ve got function types, and we’ve got generic functions, and that’s it.

Again, this is because we don’t need anything more complicated than generic functions in order to make type inference interesting. If we can solve the problem of inference in a world with generic functions, we can very easily extend the inference to generic types. So let’s keep it simple and not introduce generic types into our core language.

A monotype μ is a type containing no type variables.

This just introduces a new jargon term, to distinguish generic types from non-generic types.

And that’s it for section two; that was a lot shorter than the introduction.

Next time: as promised, type substitution.

Excessive explanation, part eight

Happy new year all, well enough chit chat, more type inference!

Recall that we are going through the seminal paper on ML type inference line-by-line and explaining all the jargon and symbols. Last year we got through the introduction.

2 The language

Assuming a set Id of identifiers x the language Exp of expressions e
is given by the syntax

e ::= x | e e' | λx.e | let x = e in e'

(where parentheses may be used to avoid ambiguity). 

This is the grammar of the extremely stripped-down version of ML that we’re going to type check. The language is called “Exp”. The ::= is just the separator between the grammar term “e” and all the kinds of things it can be, separated by bars.

So this is saying that an expression (“e”) can be:

  • an identifier — x here is used as a stand-in for any identifier.
  • a function call, where e is the function and e' is the argument, and both are expressions. Remember that in ML we pass arguments to functions by putting the function followed by a space followed by the argument.
  • a function definition, where λ begins the function, x is an identifier that stands in for the parameter, the . separates the parameter from the body, and e is the body of the function, an expression. In conventional OCaml we would say fun instead of λ and so on. (And of course C# programmers now see why we call inline functions “lambda expressions”.) Recall that in ML all functions are functions of one argument; a function of two arguments is just a function of one argument that returns a function of one argument.
  • a let expression; we define a new “variable” called x, assign the value of expression e to it, and then we can use x inside the body of expression e'. Of course the “variables” are not really variables; they only change once. They’re named values. Notice that this implies that expressions are things that have values; we’ll get into that more later.

The authors do not bother to give the grammar for parenthesized expressions; you can work it out easily enough. Similarly they do not say what the parse of something like a b c d is; is that ((a b) c) d or a (b (c d)) or what? None of this is important for the purposes of type inference, so these details are glossed over.

Only the last clause extends the λ-calculus. Indeed for type checking
purposes every let expression could be eliminated (by replacing x by
e everywhere in e'), except for the important consideration that in 
on-line use of ML declarations

let x = e

are allowed, whose scope (e') is the remainder of the on-line
session. As illustrated in the introduction, it must be possible
to assign type-schemes to the identifiers thus declared.

The point here is that let-expressions are just a syntactic convenience; we could eliminate them as we did recursion and “if-then-else”. By “on-line session”, the authors mean the ML REPL. In the REPL, you can simply assign values to variables and those variables stick around for the remainder of the session, as though the remainder of the session was the body of the invisible “in”.

The mapping between a bunch of variables and their values is called an “environment”, and we’ll be talking a lot about environments in future episodes.

So that’s the programming language we’re going to be analyzing. Obviously it is a much smaller language than you’d typically use for line-of-business programming. There are no numbers, no strings, and so on. But if we can do type inference in this language, we can easily extend type inference to more generally useful languages.

Next time: we’ll define an entirely separate little language for describing types.