Bean Machine Retrospective, part 8

Before getting into the details of how my combinator-inspired source code transformation system works, I should first give a general overview of the system, and second, explain why I built it at all. In my experience, a typical compiler’s AST rewriter does not use a combinator-based approach. Roslyn, for example, used the visitor pattern almost exclusively to build rewriters, and that’s pretty common.


The basic overview is: we use a functional programming approach to AST transformation.

  • A pattern represents a function from AST to bool. Is this an AST node that we’re looking to transform?
  • A rule represents a “partial” function from AST to AST. By “partial” I mean that when given an AST, a rule is allowed to return the same AST, a different AST, or a failure code indicating that the rule does not apply to the given input.
  • A combinator is a function that takes some number of rules and patterns and returns either a rule or a pattern.
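
To make that concrete, here is a minimal sketch in Python of what such types might look like. The names and representations are illustrative assumptions on my part (a rule that fails is modeled here by returning None), not the actual Beanstalk API that we’ll see in later posts.

import ast
from typing import Callable, Optional

# A pattern is a function from AST to bool.
Pattern = Callable[[ast.AST], bool]

# A rule is a "partial" function from AST to AST; failure is modeled
# here by returning None.
Rule = Callable[[ast.AST], Optional[ast.AST]]

def is_addition(node: ast.AST) -> bool:
    """A pattern: is this node a binary addition?"""
    return isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add)

def if_then(pattern: Pattern, rule: Rule) -> Rule:
    """A combinator: build a rule that applies only when the pattern matches."""
    def combined(node: ast.AST) -> Optional[ast.AST]:
        return rule(node) if pattern(node) else None
    return combined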

That’s pretty highfalutin, I know. Next time we’ll look at some examples of patterns, and a concise way to represent the simplest ones.


Why choose a combinator approach for Beanstalk, the Bean-Machine-to-BMG compiler? I had a number of reasons:

  • This approach gives a lot of concision for many rules and patterns. Simple code looks simple, is easy to read, and most important, it is easy to reason about its correctness. To be fair, this benefit is somewhat offset by making some of the more complicated rewrites a little tricky to understand, but I tried to limit those.
  • Any portion of the rewriting logic can very easily be independently tested. If you want to test any predicate, any transformation rule, or any combination of them, it’s straightforward to create test cases.
  • Experimenting with different rewriting strategies becomes trivial. For example, we’ll see later that a lot of the AST transformations work “top down” — that is, they transform the root node, and then transform the children, and so on, heading towards the leaves. Is a bottom-up rewriter more efficient, where we transform the leaves and move up to the root? In a conventional compiler, changing an algorithm from top-down to bottom-up might be a considerable rewrite, but as we’ll see, with combinators it is a matter of changing a single function call. (There’s a rough sketch of this right after the list.)
    • Moreover, exotic strategies become just as easy to test. “Apply the rewrites top-down, then again bottom-up, and keep going back and forth until a fixpoint is reached” would be easy, if we wanted to try that. The optimization combinator — “try these n different rewrites and then pick the one that minimizes some cost function” — is a combinator I did not end up writing, but I certainly intended to.
  • In my system transformation rules are not just functions, they are self-describing objects. This can lead to some interesting debugging benefits. For example, some rules are known to always succeed, and some combinators apply a rule until it fails. Passing a transformation that always succeeds to a combinator that runs it until it fails is obviously going to produce an infinite loop. But the combinator that runs its rule until failure can ask “are you one of the rules that is known to never fail?” and immediately assert, finding the bug early. (You can guess what motivated that feature!)
  • It’s fun! I’ve always been intrigued by the possibilities of “nanopass” architectures for compilers and by rewriting languages such as Stratego. I did not want to take a dependency on Stratego (and it would have been inconvenient to do so from a Python-based compiler) but the core parts of the rewriter are quite straightforward to implement, so I jumped at the chance to finally put these ideas into practice.
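
Here is the rough sketch promised above of what “changing a single function call” means in practice, continuing the illustrative helpers from earlier; again, these are my simplified assumptions, not the real Beanstalk implementation.

import ast

def apply_to_children(rule, node: ast.AST) -> ast.AST:
    """Apply a rule to each child node, keeping a child unchanged if the rule fails."""
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            setattr(node, field, rule(value) or value)
        elif isinstance(value, list):
            setattr(node, field,
                    [rule(v) or v if isinstance(v, ast.AST) else v for v in value])
    return node

def top_down(rule):
    """Rewrite the root first, then recurse into the children."""
    def apply(node: ast.AST) -> ast.AST:
        return apply_to_children(apply, rule(node) or node)
    return apply

def bottom_up(rule):
    """Rewrite the children first, then the root."""
    def apply(node: ast.AST) -> ast.AST:
        rewritten = apply_to_children(apply, node)
        return rule(rewritten) or rewritten
    return apply

# Switching strategies at the call site is then a one-word change:
#   rewriter = top_down(my_rule)     versus     rewriter = bottom_up(my_rule)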

Next time on FAIC: We’ll break down the problem into even smaller sub-problems, because we’re computer programmers, that’s what we do. How can we represent predicates about ASTs concisely in Python, and then compose them with combinators?

Bean Machine Retrospective, part 7

How do we write a compiler in a typical general-purpose line-of-business OO programming language such as Python, C#, or Java? Compilers are programs, so we could make the question more general: how do we write programs?

The basic idea common to almost every widely used programming language is to use composition:

  • Divide the problem into many sub-problems
  • Write functions that each solve one or more sub-problems
  • Compose a solution by writing functions that call other functions

The details of how those functions are organized vary from language to language, of course; functions are nested inside other functions, or collected into classes, or into modules, or whatever. But ultimately, most programs could be viewed as a composition of functions.

Now, we don’t simply call functions of course. Programming languages also have control flow, whereby they make decisions about what functions to call:

  • Call foo() if and only if some predicate is true.
  • Call foo() repeatedly until some predicate is false.
  • Call foo() but branch to this catch block if it fails.
  • … and so on

We don’t normally think of control flow as a kind of function composition. What if we did? We can use ideas inspired by combinatory logic and functional programming to extract control flow into “combinators” and then use those to concisely build workflows to solve compiler problems.
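
For instance, “call foo() repeatedly until some predicate is false” can itself be packaged up as an ordinary higher-order function. This little illustration is my own, not code from the compiler:

from typing import Callable, TypeVar

T = TypeVar("T")

def repeat_while(predicate: Callable[[T], bool], f: Callable[[T], T]) -> Callable[[T], T]:
    """Package a while-loop as a reusable, composable value."""
    def apply(value: T) -> T:
        while predicate(value):
            value = f(value)
        return value
    return apply

# "Divide by two until the result is odd" is now just a value we can pass around.
strip_twos = repeat_while(lambda n: n % 2 == 0, lambda n: n // 2)
print(strip_twos(96))  # 3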


A “parse tree” or abstract syntax tree (henceforth AST) is a data structure representing a syntactic analysis of a program. Over the next few episodes of this series we’ll explore the question of how a compiler writer might solve a common sub-problem in compiler design: how do we write an AST → AST function using an approach inspired by combinatory logic?

Since Bean Machine and its compiler are both written in Python, we’ll use the very convenient parse tree types already provided by the Python ast module. It’s very straightforward. Every node in the tree has a type and zero or more labeled children. A child can be a value such as a string or number, or a node, depending on the label.

For example, if we had the statement “x = 2 + 3”, then the AST for the right side of the assignment could be constructed like this (assuming that all members of the ast module are brought into scope):

BinOp(
  left=Num(n=2),
  op=Add(),
  right=Num(n=3))

The expression’s AST is a binary operator; it has three children, left, right and op. The left and right are literal numbers; their child n is the value of that number. You get the idea I’m sure.

Every Python expression, statement, and so on has an AST node, and there are standard implementations of both parsers and unparsers; you can turn text into ASTs, turn ASTs back into text, and compile and run that program.
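
Here is the round trip in action. One caveat: in Python 3.8 and later the parser produces Constant(value=2) nodes rather than the older Num(n=2) spelling shown above, and ast.unparse requires Python 3.9 or later.

import ast

tree = ast.parse("x = 2 + 3")
print(ast.dump(tree.body[0].value))   # the BinOp for the right-hand side
print(ast.unparse(tree))              # x = 2 + 3
exec(compile(tree, filename="<example>", mode="exec"))  # runs the assignment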


Next time on FAIC: I’ll describe the patterns/rules/combinators system briefly, and then give some thoughts on what motivated this approach over a more conventional compiler technique such as visitor patterns for rewrites. Then we’ll start looking at examples of patterns and predicates.

The names of birds, part 4

The European starling is a lovely looking bird, though territorial, noisy and aggressive up close. Unfortunately, they are very invasive in North America. Most of the hundreds of millions of European starlings now living in the Americas can be found fighting over my suet feeder in winter months.

Last time we said that the kestrel obeys the identity Kxy=x for all birds x and y in the forest. The starling has a slightly more complicated identity: Sxyz=xz(yz).

Again, recall that this is parenthesized ((Sx)y)z=(xz)(yz).

Yes I know I used S for the “successor bird” in part 2; sorry for the confusion, this is a different bird.

If a forest contains both a starling and a kestrel, can you show that there is a bird in the forest which is fond of every bird in the forest? (Note that I’m implying here that there are at least two distinct birds in the forest, and therefore no lonely egocentric kestrel from last episode.)

Give it a try, and then scroll down.

.

.

.

.

.

.

.

Let’s call the name of K to S; the starling calls back the name of some bird. We then call the name of K to that bird, which calls back some other bird. Finally, we call the name of any bird z in the forest to that bird.

Sxyz=(xz)(yz)
SKKz=(Kz)(Kz)

But Kz is a bird that is fixated on z, so no matter what we call to Kz, we always get z back:

SKKz=(Kz)(Kz)=z

We call SKK the identity bird I, because Iz=z for all z in the forest. Every bird in the forest is a fixpoint of the identity bird.

(Aside: Note that SKy names the identity bird for any bird y; we didn’t have to use K.)

Now that we know that any forest with both S and K in it also has I in it, can you prove that the forest also has M in it? (Recall that Mx=xx for all x.) Give it a try and then scroll down.

.

.

.

.

.

.

.

SIIz=(Iz)(Iz)=zz, so SII is M.


All this whimsy is fun of course, but I’m sure you’ve all realized by now that birds are more conventionally called combinators. Combinators are functions that take a single combinator and return a single combinator; forests are sets of combinators.

The study of combinatory logic is interesting for computer scientists. It turns out that there is a relatively straightforward way to take any expression made up of S and K and find an equivalent lambda calculus expression, and vice versa: any expression in the lambda calculus can be expressed as a combination of S and K combinators. You can use the S and K combinators to express Booleans, numbers, arbitrary functions on integers, you name it; if a Turing machine can compute it, there is an S/K expression that can be interpreted as computing the same thing.

There’s a lot more I could say about that, but I’m not going to go into the details.
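
But just to make the identities concrete, here is a quick sanity check in Python, with combinators represented as curried one-argument functions; this is my own illustration.

# Combinators as curried one-argument functions.
S = lambda x: lambda y: lambda z: x(z)(y(z))
K = lambda x: lambda y: x
I = S(K)(K)   # the identity bird
M = S(I)(I)   # the mockingbird: M(x) = x(x)

succ = lambda n: n + 1
print(I(succ)(41))   # 42 -- SKK behaves as the identity
print(M(I)(7))       # 7  -- MI = II = I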

Oh, and when I added lambdas to C# 3 way back in the day, the first test case I wrote to ensure the type analysis was working was something like:

delegate C C(C x);
...
C i = x=>x;
C m = x=>x(x);
C k = x=>y=>x;
C s = x=>y=>z=>x(z)(y(z));

but I’m not going to discuss that further, at least not right now. We’re already way off track from my series on compiling Bean Machine. Let’s get back to it.

Next time on FAIC: We’ll see how to use an approach inspired by combinators to build parts of a compiler, and why ensuring combinators have fixpoints is important.


If you enjoyed these puzzles, the ones that I showed are some of the introductory puzzles in To Mock a Mockingbird, which is delightful and educational, and gets as far as computability theory and Gödel incompleteness. Do pick up a copy!


The names of birds, part 3

In the autumn of last year my friend Joan and I went on a little trip up to the Skagit valley north of Seattle to photograph birds of prey; I managed to get a blurry but recognizable shot of this charming little kestrel looking for field mice. It was easily identifiable in the field because kestrels are one of the very few birds of prey that can briefly hover while hunting, almost like a hummingbird.

Well, back to combinatory logic and some puzzles from To Mock a Mockingbird.

We’ve said that if there are birds x and y such that xy=y, then x is fond of y, and y is a fixpoint of x. Smullyan goes on to say that if there are birds x and y such that xz=y for every z in the forest, then x is not just fond of y, it is fixated on y.

The kestrel, K, is an interesting bird. When you call the name of any bird x to it, it calls back the name of a bird in the forest that is fixated on x. That is, the kestrel obeys the identity Kxy=x for all birds x and y in the forest. (Recall from part one that Kxy means (Kx)y, not K(xy).)

Suppose there is a forest which contains a kestrel and moreover in this particular forest K is fond of itself; that is KK=K. Can you prove that a kestrel that is fond of itself is both (1) extremely egocentric, and (2) lonely? Give it a try, and then scroll down.

.

.

.

.

I need a longer lens! This is at the edge of what my little 70-300 zoom can handle.

.

.

.

(1) A bird which is fond of itself is already probably pretty egocentric. Kx gives the name of a bird which is fixated on x, so KK gives the name of a bird which is fixated on K. But KK=K, so K is the name of a bird that is fixated on K. A bird which is fixated on itself is surely extremely egocentric!

(2) By definition (Kx)y=x for all x and y. Since in this forest K is fixated on K, we have Kx=K for every x, and therefore (Kx)y=Ky; that is, Ky=x for all x and y. But Ky=K for all y as well, so K=x for all x in the forest. Every bird in the forest is the kestrel, and therefore there is only one bird in the forest. No wonder an extremely egocentric kestrel is lonely!


Next time on FAIC: The starling!