In this short section the paper gives a somewhat formal definition of what we mean by generic type substitution. You are of course familiar with the basic idea in C#: if you have

delegate T MyDelegate<T, U>(T t, List<U> us);

then we can make a substitution of `int`

for `T`

and `double`

for `U`

to get the type `MyDelegate<int, double>`

which represents a method with signature

int MyDelegate(int t, List<double> us);

Let’s see how the paper defines this idea. There are some subtleties that are not readily apparent in C#!

I’m just going to quote the entirety of section 3; it is only two paragraphs. But we’ll need to take this thing apart word by word to understand it.

3 Type instantiation If S is a substitution of types for type variables, often written [τ_{1}/α_{1}, ... ,τ_{n}/α_{n}] or [τ_{i}/α_{i}] and σ is a type-scheme, then S σ is the type-scheme obtained by replacing each free occurrence of α_{i}in σ by τ_{i}, renaming the generic variables of σ if necessary. Then S σ is called an "instance" of σ; the notions of substitution and instance extend naturally to larger syntactic constructs containing type-schemes. By contrast a type-scheme σ = ∀α_{1}...α_{m}τ has a "generic instance" σ' = ∀β_{1}...β_{n}τ' if τ' = [τ_{i}/α_{i}]τ for some types τ_{1},...,τ_{m}and the β_{j}are not free in σ. In this case we shall write σ > σ'. Note that instantiation acts on free variables, while generic instantiation acts on bound variables. It follows that σ > σ' implies S σ > S σ'.

Just… wow. We already know what type variables and type schemes are, but now we have type instances somehow distinct from generic type instances, and free variables somehow distinct from bound variables. And an ordering relation on type schemes has just been introduced.

Let’s start by saying what we mean by “free” and “bound” variables in the Exp language, and then say what we mean by free and bound variables in the type language.

Basically, by “bound” we mean that a variable is declared somewhere in the expression, and by “free” we mean it is used but not declared. So for example, in the expression `x y`

, both `x`

and `y`

are free variables. The variable `q`

which does not appear, is not considered free in `x y`

, but nor is it bound.

In the expression `let x = q in x y`

, `x`

is no longer a free variable; the name “x” is associated with a particular syntactic location: the first operand of the “let”. In this larger expression, `q`

and `y`

are free, as both are used but not defined.

Let’s now connect this with the type grammar. Suppose we have expression `λx.λy.x`

. First off, let’s make sure that we understand what this is. It’s a function; all functions take a value and return a value. The outer function takes a value `x`

and returns a function that takes a value `y`

, ignores it, and returns `x`

.

What is the type scheme of this expression? `x`

and `y`

can be any type, and they need not be the same. So let’s make two type variables and quantify both of them. The type scheme of this expression is `∀α ∀β α → (β → α)`

. That is, it’s a function that takes any type alpha, and returns a function that takes any type beta and returns an alpha.

So now maybe it is clear what “bound” means in type schemes. In the type `α → (β → α)`

found inside of that type scheme, both α and β are bound; they have quantifiers that introduce them. But what is “free”? Well, what’s the type scheme of the sub-expression `λy.x`

found inside `λx.λy.x`

? It’s not `∀α ∀β β → α`

. Why not? Because `x`

cannot take on any type whatsoever; it has to take on the type of the expression that was passed to the outer lambda! The type scheme of the inner lambda is

`∀β β → α`

, and α is a free type variable from the perspective of the inner lambda. A type variable γ which does not appear anywhere would be considered to be neither free nor bound in these expressions.

Maybe looking at this example more explicitly will help. An equivalent but more verbose expression is

let outer = λx.let inner = λy.x in inner in outer

If you’re finding this terse syntax hard to read, in C# this would be (ignoring that you can’t use var for lambdas!)

var outer = x => { var inner = y => x; return inner; };

So what is the type scheme of outer? `∀α ∀β α → (β → α)`

. Both type variables are bound. What is the type scheme of inner? `∀β β → α`

. α is free, β is bound.

In short, “free” really means “the meaning of this name will be provided by some outer context, not here”, and “bound” means “this name is given its meaning somewhere in here”, regardless of whether we’re talking about identifiers in the Exp language or type variables in the type language.

Apropos of which: if you’re familiar with the C# specification for lambda expressions, in there we refer to the “free” variables of a lambda as the “outer” variables, which I think is a little more clear.

Next time, we’ll actually look at substitution, I promise.

Pingback: Unification, part 1 | Fabulous adventures in coding