Source code for this episode is here.

Just as a reminder: I am developing my C# version of Stafford’s “QLIFE” algorithm by taking Abrash’s “keep the neighbour counts around” algorithm and gradually adding optimizations. That’s much easier to understand than trying to explain the whole complicated thing at once.

So far, we’ve got the following algorithm:

- Keep up-to-date neighbour counts of all cells in a byte array
- Maintain a list of cells which changed on the previous tick
- On the current tick, examine only the cells which changed on the previous tick and their neighbours. Update the ones that changed, and create a deduplicated change list for the next tick.

However, we are still using a similar technique as we used in our naïve algorithm to ensure that we never write a cell before we need to read its value: we make a complete copy of all the cells on each tick, read from the copy, and write to the original. Making that copy is fast, but it is O(n) in the total number of cells; it seems plausible that we ought to be able to come up with an algorithm that is O(changed cells).

Here’s what we’re going to do.

Recall that we are spending one byte per cell, but are only using five of the eight bits available in a byte: four for the neighbour count, one for the state. **We’re going to use one of the extra bits to store the next state of a cell that is about to change.**

Why do we need that? Surely if we already know what cells are about to change, then the next state is just the opposite of the current state, right? But we need it because we are trying to solve two problems at once here: compute the new state, and *deduplicate the new change list without using a hash table*. Since we are looking at all neighbours of previously-changed cells, we will frequently end up looking at the same cells multiple times, but we do not want the change list to become more and more full of duplicates as time goes on. If the “next state” and the “current state” bits are the same, then we *already* updated the current state, so we can skip adding it to the new change list, and thereby deduplicate it. (As I noted last time, we’ll get to algorithms that use automatically-deduplicated sets in future episodes.)

As always I want the bit twiddling to be isolated into methods of their own and structs to be immutable. Adding accessors for another bit in my Cell struct is straightforward:

private const int next = 5; private const int nextm = 1 << next; public bool Next => (cell & nextm) != 0; public Cell NextAlive() => new Cell((byte)(cell | nextm)); public Cell NextDead() => new Cell((byte)(cell & ~nextm));

Easy peasy. I’m going to change the Step algorithm but everything else — BecomeAlive and BecomeDead in particular — stays exactly the same. As does most of Step:

public void Step() {

We’re going to create a new, temporary list of the cells that are about to change on this tick. This list is allowed to have duplicates.

var currentChanges = new List<(int, int)>();

There is a line of code conspicuous by its absence here. We are not cloning the array!

Once again we loop over the cells which changed last time and their neighbours; this is all the same:

foreach ((int cx, int cy) in changes) { int minx = Max(cx - 1, 1); int maxx = Min(cx + 2, width - 1); int miny = Max(cy - 1, 1); int maxy = Min(cy + 2, height - 1); for (int y = miny; y < maxy; y += 1) { for (int x = minx; x < maxx; x += 1) {

Once again we have the Life condition that you are now familiar with:

Cell cell = cells[x, y]; int count = cell.Count; bool state = cell.State; bool newState = count == 3 | count == 2 & state;

Is this cell going to change? If so, record the new state in bit 5 and add it to the current changes list:

if (state & !newState) { currentChanges.Add((x, y)); cells[x, y] = cell.NextDead(); } else if (!state & newState) { currentChanges.Add((x, y)); cells[x, y] = cell.NextAlive(); } } } }

Again, yes, I could do the mutation of the bit in-places since we have a collection of variables, but I can’t bring myself to mutate a value type.

We’re done our first pass; the list of previous changes is now useless to us:

changes.Clear();

The second pass is much simpler than the first. For all the cells that need changing, use idempotent helper functions from last time to record the new neighbour counts and update the change list for the next tick:

foreach ((int x, int y) in currentChanges) { if (cells[x, y].Next) BecomeAlive(x, y); else BecomeDead(x, y); } }

And we’re done.

If you look at the asymptotic performance, plainly it is O(changed cells), and not O(total number of cells), which is great! This means that we could have extremely large boards with lots of empty space or still lifes, and we only pay a per-tick time price for the evolving or oscillating cells that change.

Our “static” memory consumption, for the array, is still O(total cells) of course, but our dynamic burden on the garbage collector is also O(changed cells) per tick, which seems like it ought to be a win.

What about our actual performance of computing acorn for 5000 cycles on an 8-quad?

Algorithm time(ms) ticks size(quad) megacells/s Naïve (Optimized): 4000 5K 8 82 Scholes 3250 5K 8 101 Frijters SIMD 1000 5M 4 1200 Abrash 550 5K 8 596 Abrash w/ changes 190 5K 8 1725 Abrash, O(c) 240 5K 8 1365

I was slightly surprised to discover that it is around 20% slower! As I have pointed out a number of times, copying a 64K array of bytes is astonishingly fast. That’s the thing to always remember about asymptotic performance: it is about how the performance cost changes as the problem becomes large. It would be interesting to do some scaling experiments and discover when the cost of copying the array becomes the dominating cost, but I’m not going to; if anyone wants to do the experiment please do and report back.

**Update**: Reader jaloopa has done the experiment on their machine; when bumping up from an 8-quad to a 10-quad — so, just 16 times bigger — the O(c) algorithm is **ten times faster** than the O(n) algorithm! So this is actually a big win once the board sizes get significantly larger than 64KB. I was closer to the break-even point than I thought.

**Next time on FAIC:** I’ve been talking a lot about Life algorithms but very little about Life itself. Let’s digress for an episode or two and explore some interesting basic patterns.

**After that:** We are now using six bits per cell to store the neighbour count, current state and next state. If we can get that back down to five bits per cell then we can fit three cells into a two-byte short. That’s slightly more memory efficient, but at a cost of greatly increasing the amount of bit twiddling we need to do. This seems like a performance-killing change; will it eventually pay for itself by enabling further optimizations, or is it a big waste of effort? We’ll find out!