Excessive explanation, part thirteen

So far we’ve gotten through the introduction, the grammar of the Exp language, the grammar of the type description language, and what it means to make substitutions in type schemes. For the next few episodes we’ll consider the semantics of the Exp language.

Having the grammar of the language is enough to be able to parse it, but ultimately computer programs are intended to be executed. Programs manipulate values, and so we should be able to say formally what the rules are for values and how different program elements compute them.


4 Semantics

The semantic domain V for Exp is a complete partial order 
satisfying the following equations up to isomorphism, where 
Bi is a cpo corresponding to primitive type ιi :

V = B0 + B1 + ... + F + W (disjoint sum)
F = V → V (function space)
W = {·} (error element)

HOLY GOODNESS that is a lot of jargon with no explanation.

I’m not going to go into too much detail here as formal semantic analysis using domain theory is not my strong suit. But we can unpack some of this jargon.

First off, by “semantic domain V” we pretty much mean “what are all the possible values that a variable can have?” This might not be as straightforward as you think! Consider C#:

class C<T> 
{
  int x;
  T y;
}

It’s pretty clear that the values of variable x can be any of the four billion possible ints. But what are the possible values of y? T is, from the perspective of the compiler, a perfectly legal type for y. We can now come to two incompatible conclusions. First: T can be any type, so y can have any value, including, say, 123. Second: there are no values of type T — please, try to find a running program that has a T stored in y! — so since y is of type T, there can be no values for y.

Plainly at most one of those statements can be true. The second conclusion is unsound but it is tricky to try to figure out where exactly the reasoning goes wrong.

Anyways, my point is that in a language with generics we need to be careful to describe exactly what are the possible values are that can be manipulated by the program. The “semantic domain” that we’re calling “V” here is the set of all values that can be manipulated by the Exp language.

But we say more than just that it is a set; we say that it is a “complete partial order”. What on earth is a cpo, and why do we care?

Long-time readers of this blog already know what a “total ordering” is. It’s a bunch of things, plus a relation on them that takes any two things A and B, and decides consistently whether A is larger than, smaller than, or equal to B. Total orders are necessary for sorting algorithms to work properly. A total order has all kinds of restrictions — things equal to the same are equal to each other, if A is smaller than B and B is smaller than C then A is smaller than C, A and B are never both smaller than each other, and so on. But the easiest way to think about a total order is that given any two elements, they are either equal, or one is bigger than the other.

A partial order relaxes this requirement. In a partial order, given any two elements they are equal, or one is bigger than the other, or they are unordered with respect to each other. “Unordered” and “equal” are very different!

Consider assignment compatibility in C# for example. We can make a partial order on types based on assignment compatibility.

An expression of type int is assignable to a variable of type long, so int is “less than” long. Similarly int is “less than” object. A string is assignable to object, so string is “less than” object. But ints and strings are not assignable to each other, and they certainly are not equal! They are neither bigger, nor smaller, nor equal to each other as far as the “assignment compatibility” relationship is concerned. Assignment compatibility is a partial order.

A set that comes with a partial order is called a “poset”.

A “complete” partial order is a partial order with an additional property which would be somewhat tedious for me to define, so I’m going to skip it. In fact, all of this is a bit of a digression. There are technical reasons why the study of programming language value domains needs complete partial orders. Recall that we mentioned fixed-point analysis as it applied to recursive functions a while back. Suffice to say, the mathematics of analyzing recursive functions, function application, and so on, requires that there be a partial order.

So what are we really saying here? We’re saying that the values that can be manipulated by an Exp program are:

Any value of one of the “primitive” types
A function that takes a value and returns a value
A value representing “this program is an error”.

Frankly it is not 100% clear to me why we need the error element. It is never mentioned again in the paper! Anyone who has more background in domain theory who wants to clue me in here, please leave a comment.

All right, we have come to the not entirely surprising conclusion that the values manipulated in a functional language are primitive values and functions of one parameter.

Next time: typing judgments!

5 thoughts on “Excessive explanation, part thirteen”

Joe on January 26, 2017 at 11:49 am said:

Would it be correct to say “completeness” is satisfied because `Object` is “less than” every other type? AKA the “top” type. That might give a clue as to why it’s important (it’s easier to see that you’ll have problems if Object was removed from the language). It doesn’t get into the technical reasons *why* it’s necessary, but might be worth mentioning to give a clue.

I don’t know enough of the theory to say a top type is necessary, but it seems to be related.

Thanks for this series!

Reply ↓
Joker_vD on January 27, 2017 at 7:37 am said:

OH MY GOD

Yeah, right, the article is from the times when operational semantics was still mostly considered a “dirty hack” and that the proper way was to use denotational semantics.

The error element is needed because not all expressions of Exp can be reduced to an element of B0 + B1 + … + F — some of them will get stuck during evaluation, for example, (0 true). One can either denote them as “bottom” (which stands for divergent computations), or as a distinct “wrong” value. The point of a type discipline is to ensure that “Correctly typed expressions don’t go wrong”, and if you identify type errors and divergence, you’ll have to partially solve the halting problem, which is not one usually wants to do in a typechecker. So let’s denote them as “wrong” and make sure that no correctly typed expression has denotation “wrong”.

Reply ↓
- ericlippert on January 27, 2017 at 8:30 am said:
  
  Thanks for that cogent explanation!
  
  Reply ↓
svick on January 30, 2017 at 4:45 pm said:

Assignment compatibility [in C#] is a partial order.

It’s not really relevant to this article, but I don’t think that’s true, at least not if you include custom types with implicit conversions.

If I have classes A, B and C, where A is implicitly convertible to B, B to C and C to A, then transitivity (which is required for a partial order) does not hold for these three types: A is assignable to B, and B is assignable to C, but A is not assignable to C.

Reply ↓
- ericlippert on January 31, 2017 at 4:24 pm said:
  
  That’s a good point, and in fact I just had a conversation with Andy on the C# team about that earlier today. It would have been better to say that built-in reference conversions in C# make a partial order.
  
  Reply ↓