Bean Machine Retrospective, part 7

How do we write a compiler in a typical general-purpose line-of-business OO programming language such as Python, C#, Java, and so on? Compilers are programs, so we could make the question more general: how do we write programs?

The basic idea common to almost every widely used programming language is to use composition:

  • Divide the problem into many sub-problems
  • Write functions that each solve one or more sub-problems
  • Compose a solution by writing functions that call other functions

The details of how those functions are organized vary from language to language of course; functions are nested inside other functions, or in classes, or in modules, or whatever. But ultimately, most programs can be viewed as a composition of functions.
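
To make that concrete, here is a minimal sketch of "a program as a composition of functions". The sub-problems and their names are purely illustrative, not taken from Bean Machine:

def tokenize(source: str) -> list[str]:
    # Sub-problem: break the source text into tokens.
    return source.split()

def count_tokens(tokens: list[str]) -> int:
    # Sub-problem: compute a fact about the tokens.
    return len(tokens)

def describe(count: int) -> str:
    # Sub-problem: format a result for the user.
    return f"{count} tokens"

def summarize(source: str) -> str:
    # The solution is nothing but a composition of the sub-solutions.
    return describe(count_tokens(tokenize(source)))

print(summarize("x = 2 + 3"))  # 5 tokens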

Now, we don’t simply call functions of course. Programming languages also have control flow, whereby they make decisions about what functions to call:

  • Call foo() if and only if some predicate is true.
  • Call foo() repeatedly until some predicate is false.
  • Call foo() but branch to this catch block if it fails.
  • … and so on

We don’t normally think of control flow as a kind of function composition. What if we did? We can use ideas inspired by combinatory logic and functional programming to extract control flow into “combinators” and then use those to concisely build workflows to solve compiler problems.
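
To give a flavor of what that might look like, here is my own illustrative sketch of a few control-flow combinators in Python; it is not the Bean Machine compiler's actual rule system, just the general shape of the idea: each combinator takes functions in and hands back a new function, so the control flow itself becomes a composable value.

from typing import Callable, TypeVar

T = TypeVar("T")

def if_then_else(predicate: Callable[[T], bool],
                 consequence: Callable[[T], T],
                 alternative: Callable[[T], T]) -> Callable[[T], T]:
    # "Call consequence if and only if the predicate is true."
    return lambda x: consequence(x) if predicate(x) else alternative(x)

def repeat_while(predicate: Callable[[T], bool],
                 body: Callable[[T], T]) -> Callable[[T], T]:
    # "Call body repeatedly until the predicate is false."
    def run(x: T) -> T:
        while predicate(x):
            x = body(x)
        return x
    return run

def attempt(action: Callable[[T], T],
            recover: Callable[[T], T]) -> Callable[[T], T]:
    # "Call action, but fall back to recover if it fails."
    def run(x: T) -> T:
        try:
            return action(x)
        except Exception:
            return recover(x)
    return run

# A workflow is then just a composition of combinators:
halve_until_small = repeat_while(lambda n: n >= 10, lambda n: n // 2)
print(halve_until_small(1000))  # 7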


A “parse tree” or abstract syntax tree (henceforth AST) is a data structure representing a syntactic analysis of a program. Over the next few episodes of this series we’ll explore the question of how a compiler writer might solve a common sub-problem in compiler design: how do we write an AST → AST function using an approach inspired by combinatory logic?

Since Bean Machine and its compiler are both written in Python, we’ll use the very convenient parse tree types already provided by the Python ast module. It’s very straightforward. Every node in the tree has a type and zero or more labeled children. A child can be a value such as a string or number, or a node, depending on the label.

For example, if we had a statement “x = 2 + 3”, then the AST for the right side of the assignment could be constructed like this (assuming that all members of the ast module are brought into scope):

BinOp(
  left=Num(n=2),
  op=Add(),
  right=Num(n=3))

The expression’s AST is a binary operator; it has three children: left, op, and right. The left and right children are literal numbers; each has a child n whose value is the number itself. You get the idea I’m sure.
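
Here is a small executable version of that example, again my own sketch rather than code from the post. Note that in Python 3.8 and later the parser produces Constant(value=2) nodes for literals, with Num(n=2) kept only as a deprecated alias:

from ast import BinOp, Add, Constant

expr = BinOp(
    left=Constant(value=2),
    op=Add(),
    right=Constant(value=3))

# Every node names its labeled children in _fields.
print(type(expr).__name__, expr._fields)  # BinOp ('left', 'op', 'right')
print(expr.left.value, expr.right.value)  # 2 3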

Every Python expression, statement, and so on has a corresponding AST node type, and the standard library provides both a parser and an unparser; you can turn text into an AST, turn the AST back into text, and compile and run that program.
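
For instance, here is a rough sketch of that round trip; ast.unparse and the indent argument to ast.dump require Python 3.9 or later:

import ast

tree = ast.parse("x = 2 + 3")        # text -> AST
print(ast.dump(tree, indent=2))      # inspect the tree structure
print(ast.unparse(tree))             # AST -> text: x = 2 + 3

code = compile(tree, filename="<example>", mode="exec")
namespace = {}
exec(code, namespace)                # compile and run the AST
print(namespace["x"])                # 5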


Next time on FAIC: I’ll describe the patterns/rules/combinators system briefly, and then give some thoughts on what motivated this approach over a more conventional compiler technique such as visitor patterns for rewrites. Then we’ll start looking at examples of patterns and predicates.
