Life, part 12

Code for this episode can be found here. Exciting news for the client; I have added a play/pause button. I suppose I could have added that single line of code earlier, but hey, better now than later.

Last time on FAIC I presented a Life algorithm from an early 1990s article in PC Techniques magazine by noted optimization guru Michael Abrash; the basic idea was to observe that we can add redundancy by storing the current neighbour count of every cell in the cell itself. The added cost of having to update that redundant data infrequently more than pays for the cost of having to not recompute it frequently.

The sequel to that article, which is available as chapter 18 of the book I mentioned last time, gives a nigh-impenetrable account of improvements to Abrash’s algorithm made by David Stafford. There are a number of pedagogic problems with the way the algorithm is presented; what I’ve settled on is to do a series of episodes where I gradually mutate Abrash’s algorithm into Stafford’s. Today we’ll start with a “change list” optimization.

I noted at the end of the last episode that the algorithm “knows what changed”, and that should have got your optimization sense tingling. That is information we can use provided that we can keep it around! The key observation to make here is:

If a cell did not change on the previous tick, and also none of its neighbours changed on the previous tick, then the cell will not change on the next tick.

But that is phrased with too many negatives; let’s turn that into a positive statement:

The only cells which might change on the next tick are those that changed on the previous tick, or the neighbour of a cell that changed on a previous tick.

If you are not convinced that first, both statements are equivalent, and second, that they are both correct, make sure you are able to convince yourself before reading on, because this is the key observation that we need to make this optimization.

What we can do then is update our existing implementation of the “store the counts in the cells” algorithm to remember what cells changed on the previous tick. On the next tick, instead of looping over every cell in the grid, we loop over the cells which changed previously, and the neighbors of those cells.

Now before I write the code, I am sure that the attentive readers have immediately noticed a problem. What if two near to each other cells both changed on the previous tick? They might share up to four neighbors, and so it sounds like we might be recomputing those neighbours potentially multiple times. This could have a performance impact, because we are doing unnecessarily duplicated work, and worse, it could have a correctness impact.

What is the correctness impact? Remember that the key to Abrash’s algorithm is that we maintain the invariant that the redundantly stored neighbour count is accurate. Our implementation had the property that every cell was considered once, and so if it changed, it updated the neighbour counts once. What if that is no longer the case? You can imagine a scenario where we say “OK, this cell is becoming alive so increment all the neighbour counts”, and then we do that a second time and now the neighbour counts are disastrously wrong.

There are two ways to deal with this situation:

Somehow detect the “I’ve got two changes that have neighbors in common” situation to deduplicate the “possibly affected neighbours” set. This then involves additional data structures and additional work, all of which is solely in the service of maintaining consistency of a redundant data structure. Redundancy has costs!
Make updates idempotent. A second update becomes a no-op. If your “I’ve done this already” check is fast then the performance impact of “doing the same work twice” becomes minimal.

In modern C# code where I have debugged and fast generic set and dictionary classes available instantly I would be inclined to use the first approach — and indeed, we will make heavy use of such classes in later episodes. But in keeping with the spirit of the original 1990s-era implementations written in simple C or assembler, I’m going to for now stick with the second approach and just use a simple algorithm. As we will see, if we are clever we can get a deduplicated “recent changes” list for free, even if we end up touching some neighbouring cells twice.

In x86 assembler you would not even have a list data type; you’d just grab a block of memory to store cell pointers and keep a “high water mark” to the current top of the list. But to make this somewhat more accessible to the modern C# programmer we’ll make a list of tuples; the algorithm is basically the same regardless of the implementation of the list.

The cell definition will be exactly the same, so I will not repeat that. We keep one extra piece of state, which is “what changed on the last tick?”

        private List<(int, int)> changes;

Oh my goodness how I love finally having tuples in C#. How did we ever live, having to build little structs for these trivial tasks?

The BecomeAlive and BecomeDead helpers are the same except for two small changes. First, they are now idempotent, and second, they record when they got a change:

private void BecomeAlive(int x, int y)
{
  if (cells[x, y].State)
    return; // Already became alive; cheap idempotency!
  changes.Add((x, y));
  // ... and the rest is the same ...

Since the only place we add to the new change list is here, and since these methods are now idempotent, the change list for next time will be deduplicated.

We make some minor changes to the step function to ensure that we are iterating over only valid neighbours of recently-changed points, and we are saving the locations of the newly-changed points:

public void Step()
{
  Cell[,] clone = (Cell[,])cells.Clone();
  var previousChanges = changes;
  changes = new List<(int, int)>();
  foreach((int cx, int cy ) in previousChanges)
  {
    int minx = Max(cx - 1, 1);
    int maxx = Min(cx + 2, width - 1);
    int miny = Max(cy - 1, 1);
    int maxy = Min(cy + 2, height - 1);
    for (int y = miny; y < maxy; y += 1)
    {
      for (int x = minx; x < maxx; x += 1)
      {
        Cell cell = clone[x, y];
        int count = cell.Count;
        if (cell.State)
        {
          if (count != 2 && count != 3)
            BecomeDead(x, y);
        }
        else if (count == 3)
          BecomeAlive(x, y);
      }
    }
  }
}

I’ve also removed the “is this cell and all its neighbours dead?” check. Why? Because now that we are only looking at changed cells and their neighbours, the “dead cell in the middle of eight dead cells” case is now rarely processed in the inner loop. Remember, the whole idea of this optimization was to identify regions of the board that are changing, and isolated dead cells are not changing.

Put another way: to be in that situation in this algorithm, the cell under consideration must have either (1) been a living cell with no living neighbours, which is rare, or (2) been a cell who had living neighbours which all died, which is also rare.

What are the performance characteristics of this algorithm with the change tracking optimization?

First off, it is still O(n) in the number of cells in both time and memory because of that pesky array clone in there. What about the new costs that we’ve added? Maintaining the change list is O(number of changes) in both time and space, and the number of changed cells is always less than the number of cells, so this additional burden cannot be worse than O(n).

What about the raw speed?

Algorithm           time(ms)  ticks  size(quad)    megacells/s
Naïve (Optimized):   4000       5K      8               82
Scholes              3250       5K      8              101  
Frijters SIMD        1000       5M      4             1200
Abrash                550       5K      8              596
Abrash w/ changes     190       5K      8             1725

We have a new winner once again, at almost three times the throughput of our previous version, and beating the (unscalable) SIMD 16×16 implementation in cells per second. This is over 20x faster than our minimally-optimized naive version!

That pesky array allocation sure is irritating, isn’t it? That right now is the only thing keeping us from making this algorithm O(changes) in time instead of O(n). It seems like we could have an algorithm where no matter how big the board was, the time cost of computing a new board would be proportional to the number of cells which changed, and not the number of cells overall.

Next time on FAIC: that seems like an attractive property for an algorithm to have, so let’s make it happen!

4 thoughts on “Life, part 12”

Pingback: The Morning Brew - Chris Alcock » The Morning Brew #3001
Joren on May 27, 2020 at 7:01 am said:

I’m a little curious if it’s a speedup or a regression to sort the list of cells by their coordinates, for cache locality benefits.

And what if we don’t allocate a new list but keep the old one around and add/remove cells we look at? Maybe the list should become a priority queue (with p == y * height + x?) in that attempt.

Reply ↓
- ericlippert on May 27, 2020 at 10:54 am said:
  
  The cache question is a really interesting one that I don’t know the answer to, but it certainly is astonishing the penalty you get if you have frequent cache misses.
  
  As for making a priority queue, and so on, that’s a good idea but for now I’m going to try to emulate the sorts of algorithms that Abrash and Stafford used in their 1990s C / machine language implementations, which did not include fancy-pants data structures. We’ll get into more complex data structures later in this series.
  
  But if you want to try it out and report back I’d be happy to hear the results!
  
  Reply ↓
  - Joren on May 28, 2020 at 11:53 am said:
    
    Well I played with it a bit but wasn’t able to get any performance benefits that outweighed the slowdown from sorting. I wondered if 256 x 256 is just too small to have significant cache misses on my machine, but experimenting with much larger boards makes the timings no longer very reproducible (due to GC noise?) so I wasn’t really able to gather reliable data.
    
    Maybe I’ll try again if we get to a stage where there’s no large allocations happening every tick. 🙂
    
    Reply ↓

Fabulous adventures in coding

Eric Lippert's blog

Life, part 12

4 thoughts on “Life, part 12”

Leave a comment Cancel reply

Share this:

Related

4 thoughts on “Life, part 12”

Leave a comment Cancel reply