Excessive explanation, part seven

From the paper we’re studying:

For simplicity, our definitions and results here are formulated for a 
skeletal language, since their extension to ML is a routine matter. 
For example recursion is omitted since it can be introduced by simply
adding the polymorphic fixed-point operator

fix  : ∀α ((α → α) → α)

The high order bit is that ML has a lot of features, but that’s OK. We can come up with a greatly simplified core language, work out a type inference system for it, and then extend that easily to the whole language. But what is this about recursion? What is a polymorphic fixed-point operator?

What we want to do is establish that the type assignment algorithm does not have to understand recursion; we can in fact remove recursion from the core language entirely, and not change the program behaviour or the type analysis. Let’s see how.

First of all, a “fixed point” of a function f is a value x such that f x equals x. For example:

let f x = x      (* Every possible value is a fixed point *)
let f x = x * x  (* 0 and 1 are the only fixed points *)
let f x = x + 1  (* There are no fixed points *)

OK. Now, consider this crazy-looking but clearly not-recursive function, which looks suspiciously like the recursive factorial function:

let crazy f n = if n < 2 then 1 else n * f (n - 1)

What is the type scheme of this thing? Let’s reason informally.

  • Parameter n must be an integer because it is compared to an integer.
  • n-1 is an integer which is passed to f, so f must be a function that takes an integer.
  • The result of f is multiplied by an integer, so f must return an integer.
  • crazy is a function of two arguments, which means that it is a function of one argument that returns a function of one argument.

In short, the type assignment algorithm must deduce:

crazy : (int → int) → (int → int)

Make sure that makes sense. This thing takes an int → int called f and returns an int → int whose argument is n. That’s just how we express a function of two arguments in this system.
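You don’t have to take the informal reasoning on faith; a modern inference engine carries out exactly this deduction. A minimal check in OCaml (my transliteration, not from the paper):

```ocaml
(* OCaml infers the type with no annotations at all *)
let crazy f n = if n < 2 then 1 else n * f (n - 1)
(* inferred: val crazy : (int -> int) -> int -> int *)
```

The type printed by the toplevel matches the scheme we just reasoned out by hand.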

The key question: We have a function crazy that, when considered as a function of one argument, returns the same kind of thing as its argument. Therefore there could be a fixed point. Is there a fixed point of this function?

This thing takes and returns an int → int. So a fixed point of crazy, if it exists, must be an int → int. Let’s suppose it exists, and call it, oh, I don’t know, let’s just pick any old name: factorial.

A fixed point is defined by the equality of two things, but in this case the two things are functions. How do we know that two functions factorial and crazy factorial are equal? That’s straightforward. For any number n we must have that crazy factorial n and factorial n are equal. That’s how we know that crazy factorial and factorial are equal, and therefore that factorial is a fixed point of crazy.

Let’s try it and see. I’m going to write a recursive function factorial:

letrec factorial n = if n < 2 then 1 else n * factorial (n - 1)

Now as an exercise, mentally work out some examples and you will soon see that crazy factorial n and factorial n are always equal for any integer n. Therefore factorial is a fixed point of crazy.
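That mental exercise can be mechanized. This OCaml sketch (mine, not the paper’s) spot-checks the fixed-point equation for a handful of values:

```ocaml
let crazy f n = if n < 2 then 1 else n * f (n - 1)
let rec factorial n = if n < 2 then 1 else n * factorial (n - 1)

(* for each n, crazy factorial n must equal factorial n *)
let () =
  List.iter (fun n -> assert (crazy factorial n = factorial n)) [0; 1; 2; 5; 10]
```

A handful of checks is of course evidence, not a proof; that the equality holds for every non-negative n follows by a simple induction.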

Now, let us suppose that magically we have a function fix that can take an α → α for any α and return a fixed point of that function, if it exists. (The fixed point will of course be of type α.) Now suppose we have this program which we wish to analyze:

letrec factorial n = if n < 2 then 1 else n * factorial (n - 1)

We can trivially translate that into this program:

let factorial = 
  let crazy f n = 
    if n < 2 then 1 else n * f (n - 1) in
  fix crazy

And hey, we have suddenly removed recursion from the language without changing the types of anything! Again, let’s reason informally about the type of this implementation of factorial:

  • We have already argued that crazy has type scheme
    crazy : (int → int) → (int → int)
  • By assumption we have a polymorphic magic function that takes a function and returns its fixed point: fix : ∀α ((α → α) → α)
  • In particular this is true for the alpha that is (int → int), and therefore there exists a function fix : ((int → int) → (int → int)) → (int → int)
  • Since factorial is just fix crazy we deduce that factorial : int → int

And we’re done; we have determined that factorial is a function from int to int without having to analyze any recursion.

The point here is: we never need to write a type analyzer that understands recursion any more deeply than simply rewriting a recursive function into a crazy non-recursive function with an extra argument on top that is a function, and then applying the fixed-point combinator to that crazy non-recursive function.

I said that fix was magical. We already know that some functions have no fixed point, some have multiple fixed points, and some have exactly one. I’m going to leave it as an exercise for the reader that the following code is a good enough fixed-point combinator to take crazy as an argument and produce a factorial function as its result. I’ll annotate everything with its type; that should help:

(* f is (α → β) → (α → β) *)
(* fix is ((α → β) → (α → β)) → (α → β) *)
let fix f =
 (* r is ρ, a recursively defined type: ρ → α → β *)
 (* (r r) is α → β *)
 (* f (r r) is α → β *)
 (* a is α *)
 (* (f (r r)) a is β *)
 (* bizarre is also ρ *)
 let bizarre r a = (f (r r)) a in 
 (* bizarre bizarre is (α → β) *)
 bizarre bizarre
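As written, the type of r is recursive, which ordinary OCaml rejects; one standard workaround, my construction rather than anything from the paper, is to hide the recursion inside a variant type:

```ocaml
(* Roll wraps the recursive type ρ = ρ → α → β so that plain OCaml accepts it *)
type ('a, 'b) mu = Roll of (('a, 'b) mu -> 'a -> 'b)

let fix f =
  let bizarre (Roll r as rr) a = f (r rr) a in
  bizarre (Roll bizarre)

(* and indeed fix crazy behaves as factorial *)
let crazy f n = if n < 2 then 1 else n * f (n - 1)
let factorial = fix crazy
```

This is essentially the strict (“Z”) fixed-point combinator; alternatively, compiling with -rectypes lets OCaml accept the original definition as-is.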

This implementation of fix doesn’t actually type check in OCaml, by the way — because of that recursive type — but there are ways to get around that. We’ve digressed enough here though. Moving on. The paper said that we don’t need to worry about recursion, and now we know why. Thus far our examples map and factorial have been recursive and included an if-then. Do we need to worry about type analysis of if-then? Continuing with the paper:

and likewise for conditional expressions.

Apparently not. We can remove “if then else” from ML as well without changing the type analysis. How? Suppose again we have:

let crazy f n = if n < 2 then 1 else n * f (n - 1)

Now add a magical function:

ifthenelse : ∀α bool → (dummy → α) → (dummy → α) → α

This magical function takes a bool and two functions, and magically invokes the first function if the bool is true, and the second function if the bool is false, and regardless of which one it invokes, it returns the result. The dummy parameter can be ignored; it’s only there because ML requires that all functions take one argument.

Imagine in C# for instance that there was no if-then-else and no ?: operator. If you had a function:

static T Conditional<T>(bool condition, Func<T> consequence, Func<T> alternative)

that did the right thing then you could implement

    x = b ? c : a;

as

    x = Conditional(b, ()=>c, ()=>a);

and the type inference engine would produce the same result.

And now we can make two nested functions “consequence” and “alternative” that each take one dummy parameter:

let factorial =
  let crazy f n = 
    let consequence dummy = 1 in 
    let alternative dummy = n * f (n - 1) in
    ifthenelse (n < 2) consequence alternative in
  fix crazy

and the type analysis of crazy will be just the same as before. We don’t actually need if ... then ... else in the type analyzer; it’s unnecessary syntactic sugar.
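Putting the two transformations together gives a runnable sketch. Since this is only an illustration I cheat twice: fix is defined using OCaml’s built-in let rec, and ifthenelse is itself implemented with if. The point is only that the type analyzer sees nothing but ordinary function applications inside crazy:

```ocaml
(* the two magic functions, sketched; unit plays the role of the dummy type *)
let rec fix f n = f (fix f) n
let ifthenelse condition consequence alternative =
  if condition then consequence () else alternative ()

let factorial =
  let crazy f n =
    let consequence _dummy = 1 in
    let alternative _dummy = n * f (n - 1) in
    ifthenelse (n < 2) consequence alternative in
  fix crazy
```

Because consequence and alternative are functions, only the chosen branch is ever evaluated, so the recursion still terminates.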

The language we analyze will have neither recursion nor conditionals, but now you can see how easy it is to add them to the type analyzer: from the type analyzer’s point of view they are both equivalent to special function patterns, so the rule for analyzing recursion or conditionals is simply to transform the program into a form with neither, and then do the type analysis of the transformed program.

And with that we’ve gotten through the introduction! Next time — next year — we’ll dig into part two: the language.

See you then for more fabulous adventures!

Excessive explanation, part six

Continuing with the question of how to deduce the “type scheme” of

letrec map f s = if null s then nil
                 else cons(f(hd s)) (map f (tl s))

given the type schemes of null, nil, and so on.

Let’s reason informally, just using ordinary logic rather than a formal system. You’d probably reason something like this:

  • I know that map takes two arguments, f and s, from its declaration.
  • Functions of two arguments in ML are actually functions that take one argument and return another function. Therefore I know that map has to take something of f’s type, and return a function.
  • The returned function must take something of s’s type.
  • The returned function must return something of a list type, because there’s a “nil” on one half of the “if” and a “cons” with two arguments on the other. (Obviously “if” in ML is ?: in C#.)
  • I know that s must be some kind of list, because it is passed to null.
  • and so on; can you fill in the rest?
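For reference, here is the declaration transcribed into modern OCaml, with List.hd and List.tl standing in for hd and tl, and a null of my own naming; the compiler carries out the rest of the deduction automatically:

```ocaml
let null s = (s = [])

let rec map f s =
  if null s then []
  else f (List.hd s) :: map f (List.tl s)
(* OCaml infers: val map : ('a -> 'b) -> 'a list -> 'b list *)
```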

Of course, the whole point of this paper is to develop a formal algorithm for making these sorts of deductions. Which we’ll get to at some point, I’m sure!

Moving on; we’ve been using the phrase “type” and “type scheme” without defining either!

Types are built from type constants (bool, ...) and type variables 
(α, β, ...) using type operators (such as infixed → for functions and
postfixed list for lists); a type-scheme is a type with (possibly)
quantification of type variables at the outermost.

Here we’re just making a bit more formal what we mean by a type.

  • We have a bunch of “built in” types like bool and int and string; they are types by fiat.
  • We have generic type parameters (called here “type variables” though of course they are not variables in the sense of “storage of a value”).
  • And we have operators that act on types; “list” can be thought of as a postfix operator on a type that produces a new type, and the arrow can be thought of as an infix operator that takes two types and produces a function type.
  • “Quantification” is a use of the “for all” operator that introduces a type variable.
  • “Outermost” means that the type variables are introduced as far to the left as possible.
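Concretely, each of those building blocks looks like this in OCaml syntax (where the quantification is written implicitly, but is still at the outermost):

```ocaml
type t1 = bool list        (* a type constant under the postfix list operator *)
type t2 = int -> bool      (* two constants joined by the infix arrow *)

(* 'a is a type variable; the scheme is ∀α (α → α), with the ∀ left implicit *)
let identity : 'a -> 'a = fun x -> x
```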

Thus, the main result of this paper is that the type-scheme deduced for
such a declaration (and more generally, for any ML expression) is a 
principal type-scheme, i.e. that any other type-scheme for the 
declaration is a generic instance of it. This is a generalisation of 
Hindley’s result for Combinatory Logic [3].

We’re re-stating the goal of the paper: that the algorithm presented here should find not just a correct type scheme, but the most general possible type scheme. By “generic instance” we mean of course that “string list” is a more specific instance of a scheme like “for any alpha, alpha list”.

Combinatory logic is the study of “combinators” — functions that take functions and return functions. You’ve probably heard that this kind of type inference is called “Hindley-Milner” inference, and now you know why.

ML may be contrasted with Algol 68, in which there is no polymorphism, 
and with Russell [2], in which parametric types appear explicitly as 
arguments to polymorphic functions. The generic types of Ada may be 
compared with type-schemes. 

Nothing particularly interesting here; we’re just calling out that ML’s polymorphic type system is just one approach and that you might want to compare it to languages that try other approaches.

For simplicity, our definitions and results here are formulated for a 
skeletal language, since their extension to ML is a routine matter. 
For example recursion is omitted since it can be introduced by simply
adding the polymorphic fixed-point operator

fix  : ∀α ((α → α) → α)

What we’re saying here is that we can present an algorithm for a very simple language, show that it is correct, and then easily extend that algorithm to a more complex language.

Now, you might think that surely it must be difficult to write an algorithm that produces a type for recursive methods like map; there seems to be a regression problem here, where we cannot infer the type of a method that calls itself, because we’d need to know the type of the called method in order to work out the calling method, but they are the same method, so, hmm, how do we do that?

Fortunately it turns out that you can always transform a recursive method into a non-recursive method, and then apply the type inference to the non-recursive version to deduce the type of the recursive version. We do this by using a fixed-point operator. Therefore the algorithm that will be presented does not have to worry about recursion.

Next time: what the heck is a fixed-point operator?

Excessive explanation, part five

Today, what’s the type of map given the types of its parts?

The type checker will deduce a type-scheme for map from existing 
type-schemes for null, nil, cons, hd and tl; the term type-scheme is 
appropriate since all these objects are polymorphic. In fact from

null : ∀α (α list → bool)
nil  : ∀α (α list)
cons : ∀α (α → (α list → α list))
hd   : ∀α (α list → α)
tl   : ∀α (α list → α list)

will be deduced

map  : ∀α ∀β ((α → β) → (α list → β list)).

One of the difficulties in reading papers like this for people who don’t already have an academic background is that so much notation is used before it is defined, or simply assumed. So let’s unpack this.

First off, polymorphic types (“generics” in C# parlance) are notated as α list. That is, the type argument comes first, followed by the type being parameterized. This has always struck me as weird! You can think of a generic type as a kind of function that takes in type arguments and produces types. In ML we notate a function that takes in an argument and produces a value by putting the argument to the right of the function. But a generic type has its argument to the left. I’ve asked a number of ML experts why this is, and they usually say something about how it is “more natural” to say “string list”, but this is question-begging; it is only “more natural” because that’s what you’re used to seeing! VB solves the problem with “List of String”, which seems far more “natural” to me than “string list”.

Anyways, long story short, for polymorphic types in ML, the type argument comes to the left of the polymorphic type.

So what is this?

null : ∀α (α list → bool)

This is saying that null is a generic function; its type scheme is to the right of the colon. The upside-down A is the universal quantifier, and is read “for all” or “for any”. The arrow means that this is a function that takes a thing of the left type and produces a thing of the right type.

So: null has the type scheme: for any type alpha, there is a function called null that takes a list of alpha and produces a bool. In C# we’d say something like

static bool IsEmpty<T>(List<T> items)

What about

nil  : ∀α (α list)

This thing has no arrow so it isn’t even a function. ML allows polymorphic values; this is saying that there exists an empty list of any element type, and it is called nil.

The head and tail functions should be straightforward; they take a list of alphas and produce either an alpha (the head) or a list of alphas (the tail).

But what is up with cons? It should take two arguments; an item of type alpha and a list of alphas. And it should produce a list of alphas. Its scheme is:

cons : ∀α (α → (α list → α list))

What is up with this? It looks like it is saying that cons takes an alpha and produces a function. That function in turn takes a list of alphas and produces a list of alphas.

And indeed that is what it does. When you say

let append_hello = cons "hello"

then append_hello is a function that takes a list of strings and returns a new list with “hello” added to the front. So

let x = cons "hello" mylist

is the same as

let append_hello = cons "hello"
let x = append_hello mylist

Functions always take one argument; a function of two arguments actually is a function of one argument that returns another function.
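The same currying can be seen in OCaml, using :: to define a cons function (append_hello is just the name from above):

```ocaml
(* cons, curried: applying it to its first argument yields a function *)
let cons x xs = x :: xs

let append_hello = cons "hello"        (* string list -> string list *)
let result = append_hello ["world"]    (* puts "hello" at the front *)
```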

So what do we want to deduce as the type scheme for map?

map  : ∀α ∀β ((α → β) → (α list → β list))

The first argument to map must be a function from any type alpha to any type beta. That gives us back a function that takes a list of alphas and returns the list of betas which is the result of applying the given function to each alpha in the list. Somehow the type assignment algorithm needs to deduce this type scheme, given the schemes for all the functions called by the implementation.

Next time: we’ll go through the deductive steps needed to figure out the type of map.

Excessive explanation, part four

Continuing with my excessively detailed explanation of type inference in ML…

After several years of successful use of the language, both in 
LCF and other research, and in teaching to undergraduates, it has 
become important to answer these questions — particularly because
the combination of flexibility (due to polymorphism), robustness (due
to semantic soundness) and detection of errors at compile time has 
proved to be one of the strongest aspects of ML.

ML and its descendants are indeed very pleasant languages. Parametric polymorphism, a sound type system, and good error detection mean that if your program type checks, odds are pretty good it is correct. That said, thirty years later the case is a bit overstated. I’m using OCaml every day and I’m pretty frustrated by the quality of the error messages. In a large complex program it can be hard to determine where the error really is when the type system tells you you’ve made a mistake.

But my purpose here is not to editorialize on the considerable merits of ML, but rather to explicate this paper. So onwards!

The discipline can be well illustrated by a small example. Let
us define in ML the function map, which maps a given function over
a given list — that is

map f [x1; ...; xn] = [f(x1),...,f(xn)]

The map function is familiar to C# programmers by another name. In C# parlance it is:

static IEnumerable<R> Select<A, R>(
  IEnumerable<A> items, 
  Func<A, R> projection)

That is, it takes a sequence of items and a projection function, and it returns a sequence of the items with the projection applied to them. We seek to deduce the same thing about the map function that we will define below: that it takes a sequence of this and a function from this to that, and returns a sequence of that.

The notation used in the paper could use some explanation.

First of all, function application is denoted in ML by stating the function (map) and then the arguments separated by spaces. So f is the first argument, and it is the projection function. The second argument is a list of items x1 through xn. Lists are denoted a couple of different ways in ML; here we are using the syntax “surround the whole thing with square brackets and separate the items with semicolons”.

Plainly the thing on the right hand side is intended to be the list of items after the function has been applied to them, but I must confess that I am mystified as to why the authors of the paper have changed notations here! I would have expected this to say

map f [x1; ...; xn] = [f x1; ...; f xn]

in keeping with the standard ML syntax. Anyone care to hazard a guess? Is this a typo or is there some subtlety here?

The required declaration is

letrec map f s = if null s then nil
                 else cons(f(hd s)) (map f (tl s))

We are defining (“let”) the recursive (“rec”) function map, which takes two arguments, f, a function, and s, a linked list. This is the standard recursive definition; first we consider the base case. null is a function that takes a list and returns true if it is the empty list or false otherwise. If the list is empty then the result of mapping is an empty list, written here as nil. If the list is not empty then it must have a head item, hd s and a tail list tl s. We apply the function to the head item, and recursively solve the problem on the tail, and then append the new head to the new tail using cons.

Now remember, the problem that we’re trying to solve here is “what is the signature of map?” Notice that we have no type information whatsoever in the parameter list. But if we already know the type signatures of the functions used here then we should be able to deduce the type signature.

Next time: given the types of all the parts of map, can we deduce the type of map?

Excessive explanation, part three

Continuing with my excessive explanation of type inference in ML…

Here we answer the question in the affirmative, for the purely 
applicative part of ML. It follows immediately that it is decidable 
whether a program is well-typed, in contrast with the elegant and 
slightly more permissive type discipline of Coppo. [1]

The “applicative” part of a language is the part of a language concerned with the question “what happens when we call a function with some bunch of arguments?” That is, the function is said to be “applied” to its arguments. (Or is the argument applied to the function? It’s a mystery.)

What we’re getting at here is: there’s a lot of stuff in real-world ML that we’re not going to worry about here: data structures and integer literals and pattern matching and all that stuff. The applicative part of the language is its core, and if we can make a sound, general type assignment algorithm for this core of the language, then that’s the interesting part; the rest is just filling in the corners, in some sense.

By “in the affirmative”, we mean that the paper will show two things. First, that the type assignment algorithm really does always find a type assignment if one exists, and produces an error if it does not. It never goes into infinite loops or returns the wrong answer. And second, that if an answer is produced, it is the most general type possible.

By “decidable” we mean that it is possible to write a computer program that takes as its input a syntactically well-formed ML program and always produces as its output either true (the program does not violate the type system) or false (the program does violate the type system). It never hangs, and it never produces the wrong answer. A program that takes a string and produces a Boolean like this is called a “decision algorithm”, and a problem for which a decision algorithm exists is a “decidable problem”.

Not all problems are decidable! The canonical example is the Halting Problem, which is “given a well-formed program, can this program go into an infinite loop?” We know that it is impossible to write a computer program that always answers this question correctly; either such a program is sometimes incorrect, or it sometimes goes into an infinite loop itself.

Note that not all type systems are decidable, and in fact, it was recently proven that the C# type system is not decidable! It is possible to write a C# program where the type checker will either get it wrong or run forever, though it is a pretty weird program. I’ll talk about that in a later blog series. (I had a very, very, VERY small part in this recent proof which is quite hilarious.)

The paper of Coppo mentioned notes that the ML type system does not allow you to represent some functions. For example, the function “take a function as the argument, pass the function to itself, and return the result” does not have a type in the ML type system.

Now, you might think, hey, we can represent that in C#:

delegate D D(D d);
D weird = d => d(d);

Though that is a function that takes a function as the argument, passes it to itself, and returns the result, notice that this is not general! There’s nothing generic here. Coppo points out that there are lots of different types of functions that match the pattern of “it’s legal to pass the function to itself”, and that there is no way to express that pattern in the ML type system.

In the paper, Coppo presents an extension to the ML type system that allows that, but the authors here point out that Coppo’s type system has the down side that it is not actually decidable.

This happens a lot in type systems; you often trade ability to represent a particular kind of odd function in the system for undecidability in extremely weird cases.

Next time, a not-weird case: how can we analyze the type of “map”?

Excessive explanation, part two

We continue with my excessively detailed explanation of the seminal paper on the ML type inference system…

1 Introduction

This paper is concerned with the polymorphic type discipline of ML, 
which is a general purpose functional programming language, although 
it was first introduced as a metalanguage (whence its name) for 
constructing proofs in the LCF proof system. [4]

These bracketed numbers are not footnotes; they are references to other papers. See the original paper’s list of references if you want to follow up on these references.

By “type discipline” we basically mean the same thing as “type system”. That is, there is some way of determining the “type” of the expressions in a programming language, and detecting when the programs are erroneous due to violations of the typing rules.

By “polymorphic” here we mean what a C# programmer would mean by “generic”. There are many kinds of polymorphism in computer programming. “Subtype polymorphism” is what object-oriented programmers typically think of when someone says “polymorphism”:

void M(Animal a) { ... }
Giraffe g = new Giraffe();
M(g); // No problem.

That’s not the kind of polymorphism we’re talking about here. Rather, we’re talking about:

void N<T>(List<T> list) { ... }
List<int> g = ...;
N(g); // No problem.

This is often called “parametric polymorphism”, because the method takes a “type parameter”.

ML has parametric polymorphism, not subtype polymorphism.

A metalanguage is a language used to implement or describe another language. In this case, ML was first created to be the metalanguage for LCF, “Logic for Computable Functions”. The purpose of LCF was to automatically find proofs for mathematical theorems; you could write little programs in ML that described to the LCF system various strategies for trying to get a proof for a particular theorem from a set of premises. But as the paper notes, ML and its descendants are now general-purpose programming languages in their own right, not just implementation details of another language.

The type discipline was studied in [5] where it was shown to be 
semantically sound, in a sense made precise below, but where one 
important question was left open: does the type-checking algorithm — 
or more precisely the type assignment algorithm (since types 
are assigned by the compiler, and need not be mentioned by the 
programmer) — find the most general type possible for every expression 
and declaration?

As the paper notes, we’ll more formally define “sound” later. But the basic idea of soundness comes from logic. A logical deduction is valid if every conclusion follows logically from a premise. But an argument can be valid and come to a false conclusion. For example “All men are immortal; Socrates is a man; therefore Socrates is immortal” is valid, but not sound. The conclusion follows logically, but it follows logically from an incorrect premise. And of course an invalid argument can still reach a true conclusion that does not follow from the premises. A valid deduction with all true premises is sound.

Type systems are essentially systems of logical deduction. We would like to know that if a deduction is made about the type of an expression on the basis of some premises and logical rules, that we can rely on the soundness of those deductions.

The paper draws a bit of a hair-splitting distinction between a type checking algorithm and a type assignment algorithm. A type checking algorithm verifies that a program does not violate any of the rules of the type system, but does not say whether the types were added by the programmer as annotations, or whether the compiler deduced them. In ML, all types are deduced by the compiler using a type assignment algorithm. The question is whether the type assignment algorithm that the authors have in mind finds the most general type for expressions and function declarations.

By “most general” we mean that we want to avoid situations where, say, the compiler deduces “oh, this is a method that takes a list of integers and returns a list of integers”, when it should be deducing “this is a method that takes a list of T and returns a list of T for any T”. The latter is more general.

Why is this important? Well, suppose we deduce that a method takes a list of int when in fact it would be just as correct to say that it takes a list of T. If we deduce the former then a program which passes a list of strings is an error; if we deduce the latter then it is legal. We would like to make deductions that are not just sound, but also that allow the greatest possible number of correct programs to get the stamp of approval from the type system.

Here we answer the question in the affirmative, for the purely 
applicative part of ML. It follows immediately that it is decidable 
whether a program is well-typed, in contrast with the elegant and 
slightly more permissive type discipline of Coppo. [1]

This is a complicated one; we’ll deal with this paragraph in the next episode!