Life, part 13

Source code for this episode is here.


Just as a reminder: I am developing my C# version of Stafford’s “QLIFE” algorithm by taking Abrash’s “keep the neighbour counts around” algorithm and gradually adding optimizations. That’s much easier to understand than trying to explain the whole complicated thing at once.

So far, we’ve got the following algorithm:

  • Keep up-to-date neighbour counts of all cells in a byte array
  • Maintain a list of cells which changed on the previous tick
  • On the current tick, examine only the cells which changed on the previous tick and their neighbours. Update the ones that changed, and create a deduplicated change list for the next tick.

However, we are still using a similar technique as we used in our naïve algorithm to ensure that we never write a cell before we need to read its value: we make a complete copy of all the cells on each tick, read from the copy, and write to the original. Making that copy is fast, but it is O(n) in the total number of cells; it seems plausible that we ought to be able to come up with an algorithm that is O(changed cells).
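As a reminder of what that copy looks like, the previous version's Step began with something along these lines. This is a sketch based on earlier episodes, so the exact line may differ slightly:

// The O(n) copy we want to eliminate: the old algorithm read old states
// from the clone and wrote new states into the original array.
Cell[,] clone = (Cell[,])cells.Clone();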

Here’s what we’re going to do.

Recall that we are spending one byte per cell, but are only using five of the eight bits available in a byte: four for the neighbour count, one for the state. We’re going to use one of the extra bits to store the next state of a cell that is about to change.

Why do we need that? Surely if we already know what cells are about to change, then the next state is just the opposite of the current state, right? But we need it because we are trying to solve two problems at once here: compute the new state, and deduplicate the new change list without using a hash table. Since we are looking at all neighbours of previously-changed cells, we will frequently end up looking at the same cells multiple times, but we do not want the change list to become more and more full of duplicates as time goes on. If the “next state” and the “current state” bits are the same, then we already updated the current state, so we can skip adding it to the new change list, and thereby deduplicate it. (As I noted last time, we’ll get to algorithms that use automatically-deduplicated sets in future episodes.)

As always, I want the bit twiddling to be isolated into methods of their own, and I want structs to be immutable. Adding accessors for another bit in my Cell struct is straightforward:

// Bit 5 stores the next state of a cell that is about to change.
private const int next = 5;
private const int nextm = 1 << next;
public bool Next => (cell & nextm) != 0;
public Cell NextAlive() => new Cell((byte)(cell | nextm));
public Cell NextDead() => new Cell((byte)(cell & ~nextm));
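
For context, here is roughly what the whole struct looks like with the new bit in place. This is a sketch assembled from the accessors shown in previous episodes, so the exact member names are assumptions:

// One byte per cell:
//   bits 0-3: count of living neighbours (0 through 8)
//   bit  4:   current state
//   bit  5:   next state (new in this episode)
//   bits 6-7: unused
struct Cell
{
  private readonly byte cell;
  public Cell(byte cell) => this.cell = cell;

  private const int state = 4;
  private const int statem = 1 << state;
  private const int countm = 0xf;
  private const int next = 5;
  private const int nextm = 1 << next;

  public bool State => (cell & statem) != 0;
  public int Count => cell & countm;
  public bool Next => (cell & nextm) != 0;

  public Cell MakeAlive() => new Cell((byte)(cell | statem));
  public Cell MakeDead() => new Cell((byte)(cell & ~statem));
  public Cell Increment() => new Cell((byte)(cell + 1));
  public Cell Decrement() => new Cell((byte)(cell - 1));
  public Cell NextAlive() => new Cell((byte)(cell | nextm));
  public Cell NextDead() => new Cell((byte)(cell & ~nextm));
}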

Easy peasy. I’m going to change the Step algorithm but everything else — BecomeAlive and BecomeDead in particular — stays exactly the same. As does most of Step:

public void Step()
{

We’re going to create a new, temporary list of the cells that are about to change on this tick. This list is allowed to have duplicates.

  var currentChanges = new List<(int, int)>();

There is a line of code conspicuous by its absence here. We are not cloning the array!

Once again we loop over the cells which changed last time and their neighbours; this is all the same:

  foreach ((int cx, int cy) in changes)
  {
    int minx = Max(cx - 1, 1);
    int maxx = Min(cx + 2, width - 1);
    int miny = Max(cy - 1, 1);
    int maxy = Min(cy + 2, height - 1);
    for (int y = miny; y < maxy; y += 1)
    {
      for (int x = minx; x < maxx; x += 1)
      {

Once again we have the Life condition that you are now familiar with:

        Cell cell = cells[x, y];
        int count = cell.Count;
        bool state = cell.State;
        bool newState = count == 3 | count == 2 & state;

Is this cell going to change? If so, record the new state in bit 5 and add it to the current changes list:

        if (state & !newState)
        {
          currentChanges.Add((x, y));
          cells[x, y] = cell.NextDead();
        }
        else if (!state & newState)
        {
          currentChanges.Add((x, y));
          cells[x, y] = cell.NextAlive();
        }
      }
    }
  }

Again, yes, I could do the mutation of the bit in place since we have a collection of variables, but I can’t bring myself to mutate a value type.

We’re done with our first pass; the list of previous changes is now useless to us:

  changes.Clear();

The second pass is much simpler than the first. For all the cells that need changing, use idempotent helper functions from last time to record the new neighbour counts and update the change list for the next tick:

  foreach ((int x, int y) in currentChanges)
  {
    if (cells[x, y].Next)
      BecomeAlive(x, y);
    else
      BecomeDead(x, y);
  }
}

And we’re done.
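
For reference, here is roughly what one of those idempotent helpers from last time looks like; this is a sketch, so the details may not match the real code exactly. The early return when the cell is already in the desired state is what keeps the next tick's change list free of duplicates, even though currentChanges can contain the same coordinates more than once:

private void BecomeAlive(int x, int y)
{
  // Idempotent: if the cell is already alive there is nothing to do,
  // and in particular we do not add a duplicate entry to changes.
  if (cells[x, y].State)
    return;
  changes.Add((x, y));
  // Increment the neighbour counts of the eight surrounding cells
  // and set the state bit of this cell.
  cells[x - 1, y - 1] = cells[x - 1, y - 1].Increment();
  cells[x - 1, y] = cells[x - 1, y].Increment();
  cells[x - 1, y + 1] = cells[x - 1, y + 1].Increment();
  cells[x, y - 1] = cells[x, y - 1].Increment();
  cells[x, y] = cells[x, y].MakeAlive();
  cells[x, y + 1] = cells[x, y + 1].Increment();
  cells[x + 1, y - 1] = cells[x + 1, y - 1].Increment();
  cells[x + 1, y] = cells[x + 1, y].Increment();
  cells[x + 1, y + 1] = cells[x + 1, y + 1].Increment();
}

BecomeDead is the mirror image, with Decrement and MakeDead.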

If you look at the asymptotic performance, plainly it is O(changed cells), and not O(total number of cells), which is great! This means that we could have extremely large boards with lots of empty space or still lifes, and we only pay a per-tick time price for the evolving or oscillating cells that change.

Our “static” memory consumption, for the array, is still O(total cells) of course, but our dynamic burden on the garbage collector is also O(changed cells) per tick, which seems like it ought to be a win.

What about our actual performance of computing acorn for 5000 ticks on an 8-quad?

Algorithm          time(ms)  ticks  size(quad)  megacells/s
Naïve (Optimized)      4000     5K           8           82
Scholes                3250     5K           8          101
Frijters SIMD          1000     5M           4         1200
Abrash                  550     5K           8          596
Abrash w/ changes       190     5K           8         1725
Abrash, O(c)            240     5K           8         1365

I was slightly surprised to discover that it is around 25% slower! As I have pointed out a number of times, copying a 64K array of bytes is astonishingly fast. That’s the thing to always remember about asymptotic performance: it is about how the performance cost changes as the problem becomes large. It would be interesting to do some scaling experiments and discover when the cost of copying the array becomes the dominating cost, but I’m not going to; if anyone wants to do the experiment, please do and report back.
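
If you want to try the scaling experiment yourself, a minimal timing harness along these lines is all it takes; the class and method names here are made up for illustration:

using System;
using System.Diagnostics;

static class ScalingExperiment
{
  // Time a fixed number of ticks of any Life implementation; pass the
  // implementation's Step method as the delegate and vary the board size.
  public static long TimeTicks(Action step, int ticks)
  {
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < ticks; i += 1)
      step();
    sw.Stop();
    return sw.ElapsedMilliseconds;
  }
}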


Update: Reader jaloopa has done the experiment on their machine; when bumping up from an 8-quad to a 10-quad — so, just 16 times bigger — the O(c) algorithm is ten times faster than the O(n) algorithm! So this is actually a big win once the board sizes get significantly larger than 64KB. I was closer to the break-even point than I thought.


Next time on FAIC: I’ve been talking a lot about Life algorithms but very little about Life itself. Let’s digress for an episode or two and explore some interesting basic patterns.

After that: We are now using six bits per cell to store the neighbour count, current state and next state. If we can get that back down to five bits per cell then we can fit three cells into a two-byte short. That’s slightly more memory efficient, but at a cost of greatly increasing the amount of bit twiddling we need to do. This seems like a performance-killing change; will it eventually pay for itself by enabling further optimizations, or is it a big waste of effort? We’ll find out!

15 thoughts on “Life, part 13”

  1. Obviously, different algorithms perform differently under different conditions. What do you think about the idea of swapping out the algorithms based on heuristics? Say, you find that “Abrash w/ changes” performs better until you start copying an array over 1M in size? Would you then be able to change over to “Abrash, O(c)” at that point?

    Taking this further, what if these algorithms that work in O(changed) start to perform worse than O(total) when greater than 50% (or some other point). That’d be something interesting to look at, I think.

    I’m trying to come up with some other heuristics that might be interesting to look at.

    • Absolutely, there are many real-world systems that use heuristics to build composite algorithms. In a recent post I discussed string searching algorithms and how the naive string searching algorithm almost always performs better even though it has bad asymptotic worst-case behaviour. We almost never hit the worst case in real life! But there are some string searching libraries that detect when the naive algorithm is likely to do poorly and switch over to a better-performing algorithm based on some heuristic.

      We could certainly do the same thing with Life. As we’ll see later on in this series, there are some algorithms that are extremely good at large, long-lived patterns but require a lot of “set up” time, so they perform relatively poorly on small, simple patterns. A heuristic that switched between implementations depending on how the board was being used would be pretty smart.

  2. Pingback: The Morning Brew - Chris Alcock » The Morning Brew #3003

  3. >It would be interesting to do some scaling experiments and discover when the cost of copying the array becomes the dominating cost, but I’m not going to; if anyone wants to do the experiment please do and report back.

    Turns out it didn’t take long, at least on my hardware.

    Release build, running directly from the executable.
    29/05/2020 01:16:33 PM:ConwaysLife.Scholes:3614
    29/05/2020 01:16:34 PM:ConwaysLife.Abrash:556
    29/05/2020 01:16:34 PM:ConwaysLife.AbrashChangeList:229
    29/05/2020 01:16:34 PM:ConwaysLife.AbrashOneArray:260

    Changing from sizes of 256 to 512 (514 for the Abrash ones)
    29/05/2020 01:15:21 PM:ConwaysLife.Scholes:19552
    29/05/2020 01:15:23 PM:ConwaysLife.Abrash:2359
    29/05/2020 01:15:23 PM:ConwaysLife.AbrashChangeList:439
    29/05/2020 01:15:24 PM:ConwaysLife.AbrashOneArray:289

    I ran it a few times, and this was the biggest difference between the change list and one-array versions. All the other runs were in the region of a 100 ms difference.

    • As the sizes get bigger, the savings become very obvious very quickly. Excluding Scholes because I didn’t want to be waiting ages, here are the 1026 results:

      29/05/2020 01:28:37 PM:ConwaysLife.Abrash:10035
      29/05/2020 01:28:40 PM:ConwaysLife.AbrashChangeList:2599
      29/05/2020 01:28:40 PM:ConwaysLife.AbrashOneArray:266

  4. Probably you can come back to 5 bits per cell when you store the next state in `currentChanges`, right? So that would be `var currentChanges = new List();`

    • That’s an excellent guess and an interesting idea, but not actually where we’re going next. Where we’re going is a little weirder.

      I may explore ideas related to yours in a later episode if I do one on sparse array approaches.

    • Oh, I forgot that generic arguments (a.k.a. HTML tags) are swallowed in comments, but obviously you knew what I meant. I’ll try it again anyway:
      var currentChanges = new List();
      Or is it
      var currentChanges = new List< (int, int, bool) >();

    • Thanks for the link! There has been some interesting work done on application of continuous math techniques to discrete problems like this one; I had not seen this particular application before though.
