# Anti-unification, part 1

Last time on FAIC we noted that there was a simple, recursive, linear-time first-order unification algorithm. We didn’t give the algorithm, but hopefully you see how it would go. The point is, we start with two expressions that contain some holes, and we deduce hole-free expressions that, when filled into the holes, cause the two expressions to become identical.

I got into this series of posts because I wanted to talk about anti-unification, and it is hard to say what anti-unification is if you don’t know what unification is. Anti-unification is kinda, sorta, but not exactly the opposite problem. The anti-unification problem is:

Given two input expressions (often, but not necessarily, without holes), call them `s` and `t`, find a generalizing expression `g` that does have holes, and two substitutions: the first substitution makes `g` equal to the first input, and the second makes `g` equal to the second input.

Let’s look at an example. Suppose our inputs are

```
s is  (1 :: 2) :: (1 :: 2) :: nil
t is  3 :: 3 :: nil
```

If you don’t like the `::` operator, we could equivalently think of these as function calls:

```
s is  cons(cons(1, 2), cons(cons(1, 2), nil))
t is  cons(3, cons(3, nil))
```

Or, if you prefer, as trees:

```
s is        cons                 t is       cons
           /    \                          /    \
       cons      cons                     3     cons
      /    \    /    \                         /    \
     1      2  cons   nil                     3      nil
              /    \
             1      2
```

and the question is: what is an expression with holes such that there are substitutions for the holes that make it equal to `s` and to `t`? Plainly the generalizing expression is

`cons(h0, cons(h0, nil))`

and the substitutions are `cons(1, 2) / h0` to make `s` and `3 / h0` to make `t`. (Recall that our standard notation for substitutions is to separate the value being substituted and the hole to substitute by a slash.)
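To make this concrete, here is a minimal sketch (my own notation, not anything from this series) that represents expressions as nested Python tuples, with holes as strings like `"h0"`, and checks that the substitutions above really do recover `s` and `t`:

```python
def substitute(expr, subst):
    """Apply a substitution {hole: value} to an expression.

    Expressions are atoms (ints or strings) or tuples like
    ("cons", left, right); holes are strings such as "h0".
    """
    if isinstance(expr, tuple):
        return tuple(substitute(child, subst) for child in expr)
    return subst.get(expr, expr)

# The generalizing expression from the example: cons(h0, cons(h0, nil))
g = ("cons", "h0", ("cons", "h0", "nil"))

s = ("cons", ("cons", 1, 2), ("cons", ("cons", 1, 2), "nil"))
t = ("cons", 3, ("cons", 3, "nil"))

assert substitute(g, {"h0": ("cons", 1, 2)}) == s  # cons(1, 2) / h0
assert substitute(g, {"h0": 3}) == t               # 3 / h0
```

The representation is arbitrary; any tree structure with a way to identify holes would do.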

You might have noticed that the unification problem might have no solution; there might be no possible set of substitutions for all the variables that make the statements all true. But the anti-unification problem always has at least one solution: the generalized expression `h0` always works, and of course the substitutions are `s/h0` and `t/h0`.

Thus we need to make the anti-unification problem a little bit harder for it to be interesting: we want the most specific generalization, not just any generalization.
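One way to compare generalizations, sketched here in my own notation rather than the post’s: a generalization `g2` is at least as specific as `g1` if some substitution into `g1`’s holes yields `g2`. By that test, the trivial generalization `h0` is strictly less specific than the one we found above:

```python
def substitute(expr, subst):
    """Apply a substitution {hole: value} to a tuple-encoded expression."""
    if isinstance(expr, tuple):
        return tuple(substitute(child, subst) for child in expr)
    return subst.get(expr, expr)

trivial  = "h0"                                   # always a generalization
specific = ("cons", "h0", ("cons", "h0", "nil"))  # the one we actually want

# Substituting `specific` for h0 in `trivial` yields `specific`, so
# `specific` is at least as specific as `trivial` -- and strictly more so,
# since no substitution turns `specific` back into a lone hole.
assert substitute(trivial, {"h0": specific}) == specific
```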

It was pretty clear that unification on equations could be generalized to multiple equations. Similarly, I hope it is clear that anti-unification on two expressions can be generalized to any number of expressions; we want the generalizing expression and a substitution for each input expression.

And of course just as there was higher-order unification for unification problems involving lambdas, and unification modulo theory for problems involving arithmetic or other mathematical ideas, there are similar variations on anti-unification. For our purposes we’ll consider only first-order anti-unification, which is the easy one.

Why is anti-unification useful? I’ll get into that in a later post. Now that we know what first-order anti-unification is, can we devise and implement an algorithm that takes in two expressions and gives us the most specific generalizing expression?

Next time on FAIC: we’ll sketch out the algorithm.

## 3 thoughts on “Anti-unification, part 1”

1. I’m looking forward to the rest of this series. I see that it can be useful with refactoring to what I’ve heard called the “donut hole” pattern of functional programming, where the common surrounding code is extrapolated to a single method, and the changing function at the core is passed in as a parameter.

• That’s the direction I’m heading here, but the actual application will be slightly different than refactoring!