Life, part 16

Source code for this episode is here.

We are continuing with our project to gradually morph Abrash’s “remember the living neighbour counts” algorithm into Stafford’s algorithm. I’m going to start today by adding two very small bit-twiddling optimizations to the Triplet wrapper struct I introduced in the previous episode.

The first optimization is: I’m going to need to update neighbour counts, but remember, we’re going to be updating a short with three neighbour counts in it. Let’s look at an example of a 3×3 grid represented by three shorts, and assume that every other cell in the larger grid is dead.

I’ll give the current state as “A” for alive, “D” for dead, and the counts are “raw” counts; that is, the left-hand living neighbour counts are not including the state of the middle cell, and the middle neighbour counts are not including the left or right cells, and so on.

The scenario is that an L-shaped triomino is about to become a block:

        Before                  After
        A-1  A-1  D-0           A-2  A-2  D-1
        A-2  D-2  D-1   ==>     A-2  A-2  D-1
        D-1  D-1  D-0           D-2  D-2  D-1

What has to happen? The middle triplet’s living neighbour counts do not need to change. But when the middle-of-the-middle triplet cell becomes alive, we need to increase all three living neighbour counts by one in both the upper and lower triplet.

Suppose the middle triplet is in a rectangular array of triplets at x, y. We do not want to have to write for this scenario:

n = triplet[x, y - 1];
triplets[x, y - 1] = n.SetLeftCountRaw(n.LeftCountRaw+1)
s = triplet[x, y + 1];
... oh the boredom ...

That’s both tedious and unnecessarily slow because of all the bit twiddling. Instead I’m going to write 15 helper functions as follows:

private const int lcountone = 1 << lcount;
private const int mcountone = 1 << mcount;
private const int rcountone = 1 << rcount;
public Triplet UUP() => new Triplet(rcountone + triplet);
public Triplet UUM() => new Triplet(-rcountone + triplet);
public Triplet UPU() => new Triplet(mcountone + triplet);
public Triplet UPP() => new Triplet(mcountone + rcountone + triplet);
public Triplet UMU() => new Triplet(-mcountone + triplet);
public Triplet UMM() => new Triplet(-mcountone - rcountone + triplet);
public Triplet PUU() => new Triplet(lcountone + triplet);
public Triplet PUM() => new Triplet(lcountone - rcountone + triplet);
public Triplet PPU() => new Triplet(lcountone + mcountone + triplet);
public Triplet PPP() => new Triplet(lcountone + mcountone + rcountone + triplet);
public Triplet MUU() => new Triplet(-lcountone + triplet);
public Triplet MUP() => new Triplet(-lcountone + rcountone + triplet);
public Triplet MMU() => new Triplet(-lcountone - mcountone + triplet);
public Triplet MMM() => new Triplet(-lcountone - mcountone - rcountone + triplet);

To add one to a three-bit integer embedded inside a short, all we need to do is add one shifted to the right position in the short! We do not need to extract the bits, put them in an integer, do the arithmetic, and then put them back.

(Also, fun bonus nano-optimization: since I put the constants on the left, the ones with multiple operations will get folded at compile time.)

The naming is of course P for plus, M for minus, U for unchanged; the operation I described above would be “PPP”.

The second optimization is: I want to be able to quickly answer the question “is anything going to change in this triplet on this tick?” What we’re going to do, recall, is set three “next state” bits, so the answer to my question is: are the next-state bits as an integer equal to the current-state bits as an integer? If the answer is yes, then there was no change.

private const int currentm = lcm | mcm | rcm;
private const int nextm = lnm | mnm | rnm;
public int CurrentState => (currentm & triplet) >> rcur;
public int NextState => (nextm & triplet) >> rnext;
public bool Changed => CurrentState != NextState;

All right, let’s get into it. The basic layout of the class is by design almost exactly the same as our previous algorithm because that’s the whole idea of this pedagogy:

class StaffordOne : ILife
  private int height = 258;
  private int width = 88;
  private Triplet[,] triplets;
  private List<(int, int)> changes;

We have a rectangular array of 258×88 triplets. The top and bottom row, and the left and right triplet will be a rectangle of death, so that gives us 256×258 cells to play with.

The change list coordinates are triplet array coordinates, not Life grid coordinates.

I’ve still got my “BecomeAlive” and “BecomeDead” helper methods. They take Life grid coordinates. They are still idempotent but they no longer update the change list because I want to do that on a per-triplet granularity, not a per-cell-changed granularity, and I don’t want to use a hash table to deduplicate the change list.

Rather, they return true or false, changed or unchanged, and let the caller decide how to update the change list.

private bool BecomeAlive(int x, int y)
  int tx = x / 3;
  Triplet t = triplets[tx, y];
  switch (x % 3)
  case 0:
    if (t.LeftCurrent)
      return false;
    triplets[tx - 1, y - 1] = triplets[tx - 1, y - 1].UUP();
    triplets[tx, y - 1] = triplets[tx, y - 1].PPU();
    triplets[tx - 1, y] = triplets[tx - 1, y].UUP();
    triplets[tx, y] = t.SetLeftCurrent(true);
    triplets[tx - 1, y + 1] = triplets[tx - 1, y + 1].UUP();
    triplets[tx, y + 1] = triplets[tx, y + 1].PPU();

As in Abrash’s algorithm, we do the expensive work of updating the living neighbour counts only when something changed. If the left cell in a triplet changed then we need to update the counts of the triplets to the north, south, west, northwest and southwest.

We don’t need to update the count of the middle cell in the current triplet because changing the state of the left state bit is what is going to update that count.

And we do not need to update the counts of the triplets to the east, northeast or southeast because those triplets have no neighbours in common with the left cell of this triplet.

I’m going to omit the middle and right cells; you see how this goes I’m sure. And the “become dead” logic is all the same with the signs inverted so let’s skip that too.

The real business logic is in the step function. As before, we do it in two passes.

In the first pass we look at every triplet on the previously-changed list and the neighbour triplets of all those triplets, which might include duplicates. We make a “current changes” list of every triplet that is going to change, and stash that change in the “next” bits of the triplet.

In the second pass we update the neighbour counts and current state bits of every changed cell. As we do so, we create a deduplicated changes list for the next tick.

public void Step()
  var currentChanges = new List<(int, int)>();
  foreach ((int cx, int cy) in changes)
    int minx = Max(cx - 1, 1);
    int maxx = Min(cx + 2, width - 1);
    int miny = Max(cy - 1, 1);
    int maxy = Min(cy + 2, height - 1);
    for (int y = miny; y < maxy; y += 1)
      for (int x = minx; x < maxx; x += 1)

This is basically the same code as before, but something to note here. Suppose we have a triplet that is on the “changed” list because its left cell, and only its left cell, changed on the previous tick. We are now examining the neighbouring triplets, which includes the northeast, east and southeast neighbouring triplets, but those are not affected by a change in the left cell. We are therefore frequently doing unnecessary work here. Isn’t this bad?

There are ways to avoid that, but as we keep adding optimizations to our implementation of Stafford’s algorithm, we’ll see that it becomes very cheap to process a neighbour that is not going to change. In cases where it is more expensive to run an “avoidable work detector” than it is to simply do the work, it’s better to do the work. I’m therefore going to ignore this small problem for the remainder of the exploration of the algorithm.

Now we get into the meat of the algorithm, as usual:

        Triplet c = triplets[x, y];
        Triplet t = c;
        int lc = t.LeftCount;
        int mc = t.MiddleCount;
        int rc = t.RightCount;
        t = t.SetLeftNext(lc == 3 | t.LeftCurrent & lc == 2);
        t = t.SetMiddleNext(mc == 3 | t.MiddleCurrent & mc == 2);
        t = t.SetRightNext(rc == 3 | t.RightCurrent & rc == 2);
        if (t.Changed)
          currentChanges.Add((x, y));
          triplets[x, y] = t;

That gets us through the first pass. The second pass is rather more verbose than our previous version because it is doing three cells per loop iteration, but is logically the same.

  foreach ((int x, int y) in currentChanges)
    Triplet t = triplets[x, y];
    if (!t.Changed)
    bool changed = false;
    if (t.LeftNext)
      changed |= BecomeAlive(x * 3, y);
      changed |= BecomeDead(x * 3, y);
    if (t.MiddleNext)
      changed |= BecomeAlive(x * 3 + 1, y);
      changed |= BecomeDead(x * 3 + 1, y);
    if (t.RightNext)
      changed |= BecomeAlive(x * 3 + 2, y);
      changed |= BecomeDead(x * 3 + 2, y);
    changes.Add((x, y));

The changed flag I’m tracking there is just a cheap sanity check to make sure we’re not doing unnecessary work, and that BecomeAlive/BecomeDead are doing the right thing.

That’s it; rather more tedious code but the same algorithm as previous. We should expect that there is a performance penalty as we have added a bunch of bit twiddling that we did not have before but have only gained a very small amount of parallelism, like setting three neighbour counts with a single addition. If we run the performance test on the release build we get:

Algorithm           time(ms)  ticks  size(quad)    megacells/s
Naïve (Optimized):   4000       5K      8               82
Abrash                550       5K      8              596
Abrash w/ changes     190       5K      8             1725 
Abrash, O(c)          240       5K      8             1365
Stafford v1           340       5K      8              963

We’ve lost almost half the speed of our previous best version of this algorithm. I sure hope there was a point to all this.

Next time on FAIC: We’ll take another brief foray into some interesting Life forms, and improve the speed controls in the client.

Coming up after that: How can we optimize the heck out of the first pass inner loop given this data structure?

For today’s extra content: the creepiest spaceship, Spider:

Image from the wiki.

3 thoughts on “Life, part 16

  1. Pingback: The Morning Brew - Chris Alcock » The Morning Brew #3018

  2. If I am understanding this correctly, when you count the right-most cell in a triplet you do not include the middle cell. In that case, shouldn’t the top-right entry be D-0 before and D-1 after?

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s