Before getting into the details of how my combinator-inspired source code transformation system works, I should say first, what is a general overview of the system? and second, why did I build it at all? In my experience, a typical compiler’s AST rewriter does not use a combinator-based approach. Roslyn for example used the visitor pattern almost exclusively to build rewriters, and that’s pretty common.
The basic overview is: we use a functional programming approach to AST transformation.
- A pattern represents a function from AST to bool. Is this an AST node that we’re looking to transform?
- A rule represents a “partial” function from AST to AST. By “partial” I mean that when given an AST, a rule is allowed to return the same AST, a different AST, or a failure code indicating that the rule does not apply to the given input.
- A combinator is a function that takes some number of rules and patterns and returns either a rule or a pattern.
That’s pretty highfalutin, I know. Next time we’ll look at some examples of patterns, and a concise way to represent the simplest ones.
Why choose a combinator approach for Beanstalk, the Bean-Machine-to-BMG compiler? I had a number of reasons:
- This approach gives a lot of concision for many rules and patterns. Simple code looks simple, is easy to read, and most important, it is easy to reason about its correctness. To be fair, this benefit is somewhat offset by making some of the more complicated rewrites a little tricky to understand, but I tried to limit those.
- Any portion of the rewriting logic can very easily be independently tested. If you want to test any predicate, any transformation rule, or any combination of them, it’s straightforward to create test cases.
- Experimenting with different rewriting strategies becomes trivial. For example, we’ll see later that a lot of the AST transformations work “top down” — that is, they transform the root node, and then transform the children, and so on, heading towards the leaves. Is a bottom-up rewriter more efficient, where we transform the leaves and move up to the root? In a conventional compiler, to change an algorithm from top-down to bottom-up might be a considerable rewrite but as we’ll see, changing a top-down traversal to a bottom-up traversal is changing a single function call.
- Moreover, exotic strategies become just as easily tested. “Apply the rewrites both top-down, and then again bottom-up, and keep going back and forth until a fixpoint is reached” would be easy, if we wanted to try that. The optimization combinator — “try these n different rewrites and then pick the one that minimizes some cost function” — is a combinator I did not end up writing but I certainly intended to.
- In my system transformation rules are not just functions, they are self-describing objects. This can lead to some interesting debugging benefits. For example, some rules are known to always succeed, and some combinators apply a rule until it fails. Passing a transformation that always succeeds to a combinator that runs it until it fails is obviously going to produce an infinite loop. But the combinator that runs its rule until failure can ask “are you one of the rules that is known to never fail?” and immediately assert, finding the bug early. (You can guess what motivated that feature!)
- It’s fun! I’ve always been intrigued by the possibilities of “nanopass” architectures for compilers and by rewriting languages such as Stratego. I did not want to take a dependency on Stratego (and it would have been inconvenient to do so from a Python-based compiler) but the core parts of the rewriter are quite straightforward to implement, so I jumped at the chance to finally put these ideas into practice.
Next time on FAIC: We’ll break down the problem into even smaller sub-problems, because we’re computer programmers, that’s what we do. How can we represent predicates about ASTs concisely in Python, and then compose them with combinators?