Life, part 23

This series is getting much longer than I expected, but I’m having a great time, so let’s keep it going. I want to look at two more algorithms; for the next few episodes we’ll look at the brilliant and subtle QuickLife by Alan Hensel.

This algorithm is a master class in bit twiddling solutions, and rather difficult to understand compared to the algorithms we’ve seen so far. The source code, unfortunately, does not make it easier. A typical fragment from the original 1990s-era Java source code, linked above, is:

int ix01 = (ix00 & 0xf0f0) | (cN.q[14] & 0x0f0f);
int ix11 = (ix10 & 0xf0f0) | (cN.q[15] & 0x0f0f);
int ix03 = (ix02 & 0xf0f0) | (ix01 & 0x0f00) | (cN.q[7] & 0x000f);

What on Earth is this code doing? You can’t read it in isolation and have any clue. What is “q”? What are these masks for? Why are 14, 15 and 7 important? What do the variables represent?

Long-time readers of this blog know that when I write code, I always try to make the code emphasize ideas in the business domain of the program. The Java implementation of QuickLife strongly emphasizes the mechanism domain; it is a program all about twiddling bits. In the code fragment above the only hint of business domain is that cN is a block of cells to the north of something else. The code is also rather hard to understand because there are practices such as “if” statements with 700 lines between the condition and the “else”.

Now, to be fair, the point of the code was not pedagogic clarity, and Java’s lack of properties and user-defined value types makes it more difficult to capture bit twiddling in efficient unboxed types. Fort the purposes of this blog I do want to be pedagogically clear though, so I’m going to implement the same algorithm in C#, but broken down into small parts that are individually clear.

Also, I’m going to try something a little different this time; I’m not going to describe the overall algorithm or data structures up front. Rather, we’ll work our way up from tiny little boards to big ones, adding features and efficiencies as we go. And we’ll also see why throughout this series I’ve been pushing this notion of Life grids that are “quads” — squares where the side is a power of two.

Let’s start as small as we can. We represent a board with a single cell — a 0-quad — as a single bit. I’m not going to make a data structure for that.

A board with four cells — a 1-quad — is a four-bit unsigned integer. We’ll call the four bits “SE”, “SW”, “NE” and “NW” for obvious reasons. That is, bits 0, 1, 2 and 3 of our four-bit integer are logically:

  3 2
  1 0

I’m also not going to make a data structure for a 1-quad.

A 2-quad is four 1-quads, so that’s 16 bits. We can represent that as a 16 bit unsigned short integer, and this time I am going to make a data structure.

The convention we’ll use is slightly different than the convention we used for 1-quads (and frankly I am not sure why that is; I am just following the conventions of the code as it was originally written!) For a 2-quad, the low four bits are the southeast corner, then the next four are northeast, then southwest, then northwest. That is, the bit numbers in a 2-quad are:

    NW         NE
      f e | 7 6
      d c | 5 4
      b a | 3 2
      9 8 | 1 0
    SW         SE 

I am going to make a data structure for this one! First thing is: I need a way to turn a ushort into one of these, and to turn it back into a ushort:

struct Quad2
  private readonly ushort cells;
  public Quad2(ushort cells) => this.cells = cells;
  public static explicit operator ushort(Quad2 x) => x.cells;

And now I am regretting my decision early on in this series to propose the jargon “n-quad” when I could have proposed “quad-n”. “2Quad” is not a legal identifier in C#. Forgive me if from now on I go back and forth between “n-quad” and “quad-n” for the rest of this series.

A default value of “all dead” and a predicate to check for all-deadness are going to come in handy later, so I’ll just add them now.

  public static Quad2 AllDead = new Quad2(0);
  public bool Dead => cells == 0;

We will need to access the component 1-quads, but rather than returning 4-bit integers, it turns out that in this algorithm we can almost always simply get the 4-quad masked out in place; that is, the NW property of a Quad2 should be a Quad2 that is zero everywhere but the northwest.

This smart insight — that the operations on 1-quads can be done “in place” rather than having to mask-and-bit-shift them out — is one of the keys to the performance of this algorithm, so keep that in mind as we go through the next few episodes.

As we will see in the next episode, we will also need to do the same for the 2×4 and 4×2 edges on each side:

  const uint NWMask = 0xf000;
  const uint SWMask = 0x0f00;
  const uint NEMask = 0x00f0;
  const uint SEMask = 0x000f;
  const uint EastEdgeMask = NEMask | SEMask;
  const uint WestEdgeMask = NWMask | SWMask;
  const uint SouthEdgeMask = SWMask | SEMask;
  const uint NorthEdgeMask = NWMask | NEMask;
  public Quad2 NW => new Quad2((ushort)(cells & NWMask));
  public Quad2 NE => new Quad2((ushort)(cells & NEMask));
  public Quad2 SW => new Quad2((ushort)(cells & SWMask));
  public Quad2 SE => new Quad2((ushort)(cells & SEMask));
  public Quad2 NorthEdge => new Quad2((ushort)(cells & NorthEdgeMask));
  public Quad2 SouthEdge => new Quad2((ushort)(cells & SouthEdgeMask));
  public Quad2 EastEdge => new Quad2((ushort)(cells & EastEdgeMask));
  public Quad2 WestEdge => new Quad2((ushort)(cells & WestEdgeMask));

That does it for the component 1-quads, but we will also need to get at individual cells. We should talk about coordinates. Let’s say that for a 2-quad, the convention is that (0, 0) is the far southwest corner and (3, 3) is the far northeast corner, as usual.

Of course I want structs to be immutable. I’m going to split “set” and “clear” up into two operations, one which sets a cell to alive and one which sets it to dead. I’ll omit the error checking that ensures that the coordinates are (0, 0) through (3, 3):

  private static readonly uint[] masks = new uint[]
    1 << 0x9, 1 << 0x8, 1 << 0x1, 1 << 0x0,
    1 << 0xb, 1 << 0xa, 1 << 0x3, 1 << 0x2,
    1 << 0xd, 1 << 0xc, 1 << 0x5, 1 << 0x4,
    1 << 0xf, 1 << 0xe, 1 << 0x7, 1 << 0x6
  public bool Get(int x, int y) => 
    (cells & masks[x + y * 4]) != 0;
  public Quad2 Set(int x, int y) => 
    new Quad2((ushort)(cells | masks[x + y * 4]));
  public Quad2 Clear(int x, int y) => 
    new Quad2((ushort)(cells & ~masks[x + y * 4]));

In the next episode I’m also going to need to be able to “or” two Quad2s together.

  public static Quad2 operator |(Quad2 x, Quad2 y) => 
    new Quad2((ushort)(x.cells | y.cells));

And for debugging purposes, we can dump out a Quad2 in the “cells” file format:

  public override string ToString()
    string s = "";
    for (int y = 3; y >= 0; y -= 1)
      for (int x = 0; x < 4; x += 1)
        s += this.Get(x, y) ? 'O' : '.';
      s += "\n";
    return s;

We will need just a few more operations in the next episode, but we’ll motivate them then.

I know, this looks like a lot of code to write for what is a 16 bit unsigned short integer, but remember that fragment of bit twiddling code from the original Java source?

int ix01 = (ix00 & 0xf0f0) | (cN.q[14] & 0x0f0f);
int ix11 = (ix10 & 0xf0f0) | (cN.q[15] & 0x0f0f);
int ix03 = (ix02 & 0xf0f0) | (ix01 & 0x0f00) | (cN.q[7] & 0x000f);

I already like it a lot better as:

Quad2 ix01 = ix00.NorthEdge | cN.q[14].SouthEdge;
Quad2 ix11 = ix10.NorthEdge | cN.q[15].SouthEdge;
Quad2 ix03 = ix02.NorthEdge | ix01.SW | cN.q[7].SE;

In this new form you can look at the code and, though it is still not clear what “ix01” and “q” and “14” mean, at least we now know what is happening: we are constructing three new 2-quads from the component 1-quads of six other 2-quads.

Writing the code this way does not make it slower! Value types are not boxed in C#, and the jitter is smart about inlining operations like this. (And if it isn’t, then it is a lot easier to speed up code if it is written so that its business semantics are clear.)

Next time on FAIC: What is the step-forwards-one-tick algorithm for a 2-quad? It’s only 16 cells; how hard could it be?




8 thoughts on “Life, part 23

  1. Pingback: The Morning Brew - Chris Alcock » The Morning Brew #3035

  2. I’m curious why you chose 32-bit uint as the data type for the masks that operate on your 16-bit ushort values. That certainly leads to a lot of explicit narrowing casting as well as implicit widening casting. Each of which is a (slight) performance sapping type conversion under the hood and makes for busier more mechanistic source code.

    If the next episode creates a Quad3 presumably it’ll be built (derived?) from Quad2, and perhaps larger QuadN types will follow. So depending on the details (I haven’t peeked at the Java) I could imagine a coming need for wider masks. Sort of. Because uints are fine for a putative Quad3 but aren’t wide enough for a Quad4. That would need ulongs. Quad5s are right out unless you jump to some bigint implementation. That’s real unlikely for performance reasons.

    So I remain confused. Then again, I confuse easily, especially at the early stages of a large reveal.

    • Good question. The short answer is: I didn’t give it much thought. The masks are constants and the compiler will do whatever math it has to at compile time, so I doubt there will be much of a runtime impact. As for making the source code “busy”, well, there are going to be casts. When I use “ushort” I am doing so explicitly to call out “this is an array of bits, not a number”, but C# does not make it easy to work with ushorts without casting; most operations in C# are done — from the compiler’s perspective — in ints, and then need to be cast back to ushort.

      I do dearly wish that C# or other languages would make an efficient “small set of bits” type that was a byte, ushort, uint or ulong behind the scenes, but gave you a high-level interface; it would make code like this much cleaner and less “busy”. I wish a lot of things.

      But the whole point of capturing this “busy-ness” in the implementation is to keep it out of the call sites, which I think you’ll agree I’ve done. So I’m not going to stress about it too much.

      Your musings about Quad3 and Quad4 are good. A Quad2 is two bytes; a Quad3 is four Quad2s, so that would be 8 bytes, which is a long. I could stuff them into a long and then bit-shift them out, but as we’ll see in a couple episodes, that’s not necessary. We’ll just make a struct with four Quad2s in it, and let the compiler take care of figuring out how to efficiently copy the bits out of it when we need them.

      The implementation of Quad4 is going to be *quite surprising* for a number of reasons; it is not the obvious “four Quad3s”, so you have that confusion to look forward to!

  3. Pingback: Life, part 24 | Fabulous adventures in coding

    • Typo, fixed, thanks!

      In a few episodes I’m going to need to get the 2×8 and 8×2 edges of a 3-quad, hence the mistake. Once again I am typing faster than I am thinking. 🙂

  4. > The convention we’ll use is slightly different than the convention we used for 1-quads (and frankly I am not sure why that is; I am just following the conventions of the code as it was originally written!)

    Is it because the mirroring you do in episode 25 is easier?

    • The mirroring/flipping operation has a nice implementation in the 2-quad but that operation does not depend on the bit ordering *within* each of the component 1-quads. The bits in each 1-quad could have been in the same orientation — low bit in the SE, then NE, SW, NW — but for reasons unknown to me, the 1-quad uses a different convention than the 2-quad, swapping NE and SW. I don’t see a good reason to do so.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s