Approximate results may vary

Posted on August 21, 2020 by ericlippert

Part 33 of my ongoing series is coming but I did not get all the code written that I wanted to this week, so it will be delayed. In the meanwhile:

Living in Canada as a child, of course I grew up learning the metric system (with some familiarity with the imperial and US systems of course). If you want to know how many milliliters of liquid a cubic box holds, you just compute the volume of the box in cubic centimeters and you’re done, because a milliliter is by definition the same volume as a cubic centimeter.

Despite having lived in Seattle for over 25 years, I still sometimes do not have an intuitive sense of conversions of US units; for a particular project I knew that I needed an amount of liquid equal to about the volume of a two-inch cube, but the bottle was measured in fluid ounces. Fortunately we have this operation in every browser:

Thanks Bing for those important eight digits after the decimal place there, manufactured from the zero digits after the decimal place of the input. For context, that extra 0043 on the end represents a volume equivalent to roughly the size of a dozen specs of microscopic dust.

But the punchline that made me LOL was “for an approximate result, divide the volume by 1.8046875”, because of course when I am quickly approximating the volume of eight cubic inches in fluid ounces, the natural operation that immediately comes to mind is divide by 1.8046875.

I have some questions.

That was Bing; what does Google do with the same query?

OK, that’s an improvement in that the amount of precision is still unnecessary, but not outright absurd. But the other problems are all the same.

Why division? Maybe it is just me, but I find division by uneven quantities significantly harder mental arithmetic than multiplication; assuming we want the over-precision, would it not be better to say “for an approximate result, multiply the volume by 0.5541125″?

Why say “to approximate…” and then give an absurd amount of precision in the conversion factor? Approximation is by definition about deliberately not computing an over-precise value.

Surely the right way to say this is “for an approximate result, multiply the volume by 5/9” or even better, “divide by two“. When I saw “divide by 1.8046875” the first thing I thought after “wow that’s so over-precise” was “1.8 is 18/10 is 9/5, so multiply by 5/9“.

I’m going to get there eventually; software can shorten that journey. And I’m going to remember that 5/9ths of a cubic inch is a fluid ounce much more easily than I remember to divide by 1.8046875.

Once you start to see this pattern of over-precision in conversions, it’s like the FedEx arrow: you can’t stop seeing it. Let’s ask the browser how much does an American robin weigh?

64.8 grams. An underweight robin is not 64.5 grams, and not 65.0 grams but 64.8 grams.

To be fair, it looks like this over-precision was the fault of a human author (and their editor) not thinking clearly about how to communicate the fact at hand, rather than bad software this time; if you’re converting “two and a third ounces” to grams it would be perfectly reasonable to round to 65, or even 60. (That third of an ounce is already suspect; surely “two to three ounces” is just fine.) Most odd though is that the computations are not even correct! 2 and 1/3rd ounces is 66.15 grams, and 3 ounces is 85.05 grams, making it rather mysterious where the extra few hundred milligrams went.

I was wondering how many earthworms a robin would have to eat to make up a discrepancy of 0.2 grams. A largish earthworm has got to weigh on the order of a gram, right?

Wow! (For my metric readers out there: 1.5 pounds is 680.388555 grams according to Bing.)

Again: what the heck, Bing? I did not ask for “world’s largest earthworm” or “unusually large earthworms” or even “Australian earthworms”. You know where I live, Bing. (And Google search does no better.)

For some reason I am reminded of Janelle Shane’s “AI Weirdness” tweets; the ones about animal facts are particularly entertaining. These earthworm facts at least have the benefit of being both interesting and correct, but this is hardly the useful result about normal garden-variety annelids that I wanted.

I am also reminded of my favourite animal fact: the hippopotamus can jump higher than a house. It sounds impressive until you remember that houses can’t jump at all.

Obviously all these issues are silly and unimportant, which is why I chose them for my fun-for-Friday blog. And the fact that I can type “8 cubic inches in oz” into my browser or say “how much does a robin weigh?” into a smart speaker and instantly get the answer is already a user interface triumph; I don’t want to minimize that great work. But there is still work to do! Unwarranted extra precision is certainly not the worst sort of fallacious reasoning we see on the internet, but it is one of the most easily mitigated by human-focused software design. I’d love to see improvements to these search functions that show even more attention to what the human user really needs.

Life, part 32

Posted on August 17, 2020 by ericlippert

All right, after that excessively long introduction let’s get into Gosper’s algorithm, also known as “HashLife” for reasons that will become clear quite soon. Since the early days of this series I’ve mostly glossed over the code that does stuff like changing individual cells in order to load in a pattern, or the code that draws the UI, but Gosper’s algorithm is sufficiently different that I’m going to dig into every part of the implementation.

The first key point to understand about Gosper’s algorithm is that it only has one data structure. One way to describe it is:

A quad is an immutable complete tree where every non-leaf node has exactly four children which we call NE, NW, SE, SW, and every leaf is either alive or dead.

However the way I prefer to describe it is via a recursive definition:

A 0-quad is either alive or dead.
An n-quad for n > 0 is four (n-1) quads.

Let’s write the code!

sealed class Quad
{
  public const int MaxLevel = 60;

Because there will be math to do on the width of the quad and I want to keep doing the math in longs instead of BigIntegers, I’m going to limit us to a 60-quad, max. Just because it is a nice round number less than the 64 bits we have in a long.

Now, I know what you’re thinking. “Eric, if we built a monitor with a reasonable pixel density to display an entire 60-quad it would not fit inside the orbit of Pluto and it would have more mass than the sun.” A 60-quad is big enough for most purposes. I feel good about this choice. However I want to be clear that in principle nothing is stopping us from doing math in a larger type and making arbitrarily large quads.

Here are our two 0-quads:

public static readonly Quad Dead = new Quad();
public static readonly Quad Alive = new Quad();

And for the non-0-quads, the child (n-1) quads:

public Quad NW { get; }
public Quad NE { get; }
public Quad SE { get; }
public Quad SW { get; }

We will need to access the level and width of a quad a lot, so I’m going to include the level in every quad. The width we can calculate on demand.

public int Level { get; }
public long Width => 1L << Level;

We’ll need some constructors. There’s never a need to construct 0-quads because we have both of them already available as statics. For reasons which will become apparent later, we have a public static factory for the rest.

public static Quad Make(Quad nw, Quad ne, Quad se, Quad sw)
{
  Debug.Assert(nw.Level == ne.Level);
  Debug.Assert(ne.Level == se.Level);
  Debug.Assert(se.Level == sw.Level);
  return new Quad(nw, ne, se, sw);
}
private Quad() => Level = 0;
private Quad(Quad nw, Quad ne, Quad se, Quad sw)
{
  NW = nw;
  NE = ne;
  SE = se;
  SW = sw;
  Level = nw.Level + 1;
}

We’re going to need to add a bunch more stuff here, but this is the basics.

Again I know what you’re probably thinking: this is bonkers. In QuickLife, a 3-quad was an eight byte value type. In my astonishingly wasteful implementation so far, a single 0-quad is a reference type, so it is already eight bytes for the reference, and then it contains four more eight byte references and a four byte integer that is never more than 60. How is this ever going to work?

Through the power of persistence, that’s how. As I’ve discussed many times before on this blog, a persistent data structure is an immutable data structure where, because every part of it is immutable, you can safely re-use portions of it as you see fit. You therefore save on space.

Let’s look at an example. How could we represent this 2-quad?

Remember, we only have two 0-quads and we cannot make any more. The naïve way would be to make this tree:

But because every one of these objects is immutable, we could instead make this tree, which has one fewer object allocation:

This is the second key insight to understanding how Gosper’s algorithm works: it uses a relatively enormous data structure for each cell, but it achieves compression through deduplication:

There are only two possible 0-quads, and we always re-use them.
There are 16 possible 1-quads. We could just make 16 objects and re-use them.
There are 65536 possible 2-quads, but the vast majority of them we never see in a given Life grid. The ones we do see, we often see over and over again. We could just make the ones we do see, and re-use them.

The same goes for 3-quads and 4-quads and so on. There are exponentially more possibilities, and we see a smaller and smaller fraction of them in any board. Let’s memoize the Make function!

We’re going to make heavy use of memoization in this algorithm and it will have performance implications later, so I’m going to make a relatively fully-featured memoizer whose behaviour can be analyzed:

sealed class Memoizer<A, R>
{
  private Dictionary<A, R> dict; 
  private Dictionary<A, int> hits;
  private readonly Func<A, R> f;
  [Conditional("MEMOIZER_STATS")]
  private void RecordHit(A a)
  {
    if (hits == null)
      hits = new Dictionary<A, int>();
    if (hits.TryGetValue(a, out int c))
      hits[a] = c + 1;
    else
      hits.Add(a, 1);
  }

  public R MemoizedFunc(A a)
  { 
    RecordHit(a); 
    if (!dict.TryGetValue(a, out R r)) 
    { 
      r = f(a); 
      dict.Add(a, r); 
    } 
    return r; 
  }
  public Memoizer(Func<A, R> f)
  {
    this.dict = new Dictionary<A, R>();
    this.f = f;
  }
  public void Clear(Dictionary<A, R> newDict = null)
  {
    dict = newDict ?? new Dictionary<A, R>();
    hits = null;
  }
  public int Count => dict.Count;
  public string Report() => 
    hits == null ? "" :
    string.Join("\n", from v in hits.Values 
                      group v by v into g 
                      select $"{g.Key},{g.Count()}");
}

The core logic of the memoizer is the same thing I’ve presented many times on this blog over the years: when you get a call, check to see if you’ve memoized the result; if you have not, call the original function and memoize the result, otherwise just return the result.

I’ve added a conditionally-compiled hit counter and a performance report that tells me how many items were hit how many times; that will give us some idea of the load that is being put on the memoizer, and we can then tune it.

Later in this series we’re going to need to “reset” a memoizer and optionally we’ll need to provided “pre-memoized” state, so I’ve added a “Clear” method that optionally takes a new dictionary to use.

The memoizer for the “Make” method can be static, global state so I’m going to make a helper class for the cache. (I did not need to; this could have been a static field of Quad. It was just convenient for me while I was developing the algorithm to put the memoizers in one central location.)

static class CacheManager
{
  public static Memoizer<(Quad, Quad, Quad, Quad), Quad> MakeQuadMemoizer { get; set; }
}

We’ll initialize it when we create our first Quad:

static Quad()
{
  CacheManager.MakeQuadMemoizer = 
    new Memoizer<(Quad, Quad, Quad, Quad), Quad>(UnmemoizedMake);
}

And we’ll redo the Make static factory:

private static Quad UnmemoizedMake((Quad nw, Quad ne, Quad se, Quad sw) args)
{
  Debug.Assert(args.nw.Level == args.ne.Level);
  Debug.Assert(args.ne.Level == args.se.Level);
  Debug.Assert(args.se.Level == args.sw.Level);
  return new Quad(args.nw, args.ne, args.se, args.sw);
}

public static Quad Make(Quad nw, Quad ne, Quad se, Quad sw) =>
  CacheManager.MakeQuadMemoizer.MemoizedFunc((nw, ne, se, sw));

All right, we have memoized construction of arbitrarily large quads. A nice consequence of this fact is that all quads can be compared for equality by reference equality. In particular, we are going to do two things a lot:

Create an arbitrarily large empty quad
Ask “is this arbitrarily large quad an empty quad”?

We’re on a memoization roll here, so let’s keep that going and add a couple more methods to Quad, and another memoizer to the cache manager (not shown; you get how it goes.)

private static Quad UnmemoizedEmpty(int level)
{
  Debug.Assert(level >= 0);
  if (level == 0)
    return Dead;
  var q = Empty(level - 1);
  return Make(q, q, q, q);
}

public static Quad Empty(int level) => 
  CacheManager.EmptyMemoizer.MemoizedFunc(level);

public bool IsEmpty => this == Empty(this.Level);

The unmemoized “construct an empty quad” function can be as inefficient as we like; it will only be called once per level. And now we can quickly tell if a quad is empty or not. Well, relatively quickly; we have to do a dictionary lookup and then a reference comparison.

What then is the size in memory of an empty 60-quad? It’s just 61 objects! The empty 1-quad refers to Dead four times, the empty 2-quad refers to the empty 1-quad four times, and so on.

Suppose we made a 3-quad with a single glider in the center; that’s a tiny handful of objects. If you then wanted to make a 53-quad completely filled with those, that only increases the number of objects by 50. Deduplication is super cheap with this data structure — provided that the duplicates are aligned on boundaries that are a power of two of course.

A major theme of this series is: find the characteristics of your problem that admit to optimization and take advantage of those in pursuit of asymptotic efficiency; don’t worry about the small stuff. Gosper’s algorithm is a clear embodiment of that principle. We’ve got a space-inefficient data structure, we’re doing possibly expensive dictionary lookups all over the show; but plainly we can compress down grids where portions frequently reoccur into a relatively small amount of memory at relatively low cost.

Next time on FAIC: There are lots more asymptotic wins to come, but before we get into those I want to explore some mundane concerns:

If portions of the data structure are reused arbitrarily often then no portion of it can have a specific location. How are we going to find anything by its coordinates?
If the data structure is immutable, how do we set a dead cell to alive, if, say, we’re loading in a pattern?
How does the screen drawing algorithm work? Can we take advantage of the simplicity of this data structure to enable better zooming in the UI?

Socially distant abbreviated summer vacation

Posted on August 14, 2020 by ericlippert

Normally this time of year I would be visiting friends and family in Canada, but obviously that’s impossible right now. Instead we took a long weekend at a rental on Bainbridge Island and strolled around some parks in a socially distant manner instead.

The rental had a balcony at tree level, so at least I got to indulge in my vacation pastime of taking pictures of birds. There were a lot of common feeder birds in the neighbourhood — house finches, house sparrows, a few different species of chickadees, that sort of thing — and I got a few shots in. This little female Rufous hummingbirb posed for glamour shots in a birch tree all day long:

(Click on any picture for a larger image)

She was vexed that a female Anna’s hummingbird kept horning in on her territory, hence her ticked-off expression there.

This super fuzzy juvenile black-capped chickadee also spent days posing:

despite being harassed all day long by the noisy gang of teenager Stellar jays on the rooftop next door.

And of course once again I was completely incapable of taking a non-blurry picture of a belted kingfisher despite many attempts.

Sigh. Better luck next time.

Life, part 31

Posted on August 13, 2020 by ericlippert

Today we will finish off our implementation of Hensel’s QuickLife algorithm, rewritten in C#. Code for this episode is here.

Last time we saw that adding change tracking is an enormous win, but we still have not got an O(changes) solution in time or an O(living) solution in space. What we really want to do is (1) avoid doing any work at all on stable Quad4s; only spend processor time on the active ones, and (2) deallocate all-dead Quad4s, and (3) grow the set of active Quad4s when activity reaches the current borders.

We need more state bits, but fortunately we only need a small number; so small that I’m not even going to bit twiddle them. (The original Java implementation did, but it also used some of those bits for concerns such as optimizing the display logic.) Recall that we are tracking 32 regions of the 8 Quad3s in a Quad4 to determine if they are active, stable or dead. It should come as no surprise that we’re going to do the same thing for a Quad4, and that once again we’re going to maintain state for both the even and odd cycles:

enum Quad4State
{
  Active,
  Stable,
  Dead
}

And then in Quad4:

public Quad4State State { get; set; } 
public Quad4State EvenState { get; set; }
public Quad4State OddState { get; set; }
public bool StayActiveNextStep { get; set; }

The idea here is:

All Quad4s are at all times in exactly one of three buckets: active, stable and dead. Which bucket is represented by the State property.
If a Quad4 is active then it is processed on each tick.
If a Quad4 is stable then we draw it but do not process it on each tick.
If a Quad4 is dead then we neither process nor draw it, and eventually we deallocate it. (We do not want to deallocate too aggressively in case it comes back to life in a few ticks.)

How then do we determine when an active Quad4 becomes stable or dead?

If the “stay active” flag is set, it stays active no matter what.
The odd and even cycles each get a vote on whether an active Quad4 should become stable or dead.
If both vote for dead, it becomes dead.
If one votes for stable and the other votes for stable or dead, it becomes stable. (Exercise: under what circumstances is a Quad4 dead on the odd cycles but stable on the even?)
If one or both vote for active, it stays active.

How do we determine if a stable (or dead but not yet deallocated) Quad4 becomes active? That’s straightforward: if an active Quad4 has an active edge that abuts a stable Quad4, the stable one becomes active. Or, if there is no Quad4 there at all, we allocate a new one and make it active; this is how we achieve our goal of a dynamically growing board.

You might be wondering why we included a “stay active” flag. There were lots of things about this algorithm that I found difficult to understand at first, but this took the longest for me to figure out, oddly enough.

This flag means “there was activity on the edge of a neighbouring Quad4 recently”. There are two things that could go wrong in that situation that we need to prevent. First, we could have an active Quad4 that is about to become stable or dead, but it has activity on a border that should cause it to remain active for processing. Second, we could have a stable (or dead) Quad4 that has just been marked as active because there is activity on a bordering Quad4, and we need to ensure that it stays active even though it wants to go back to being stable (or dead).

The easiest task to perform is to keep each Quad4 in one of three buckets. The original implementation did so by ensuring that every Quad4 was on exactly one of three double-linked lists, and I see no reason to change that. I have, however, encapsulated the linking and unlinking into a helper class:

interface IDoubleLink<T> where T : class
{
  T Prev { get; set; }
  T Next { get; set; }
}

sealed class DoubleLinkList<T> : IEnumerable<T> 
  where T : class, IDoubleLink<T>
{
  public int Count { get; private set; }
  public void Add(T item) { ... }
  public void Remove(T item) { ... }
  public void Clear() { ... }
}

I won’t go through the implementation; it is more or less the standard double-linked list implementation you know from studying for boring technical interviews. The only unusual thing about it is that I’ve ensured that you can remove the current item from a list safely even while the list is being enumerated, because we will need to do that.

All the Quad4-level state manipulation will be done by our main class; we’ll add some lists to it:

sealed class QuickLife : ILife, IReport
{
  private readonly DoubleLinkList<Quad4> active = new DoubleLinkList<Quad4>();
  private readonly DoubleLinkList<Quad4> stable = new DoubleLinkList<Quad4>();
  private readonly DoubleLinkList<Quad4> dead = new DoubleLinkList<Quad4>();
  private Dictionary<(short, short), Quad4> quad4s;
  private int generation;

I’ll skip the initialization code and whatnot.

We already have a method which allocates Quad4s and ensures their north/south, east/west, northwest/southeast references are initialized. We will need a few more helper functions to encapsulate some operations such as: make an existing Quad4 active, and if it does not exist, create it. Or, make an active Quad4 into a stable Quad4. They’re for the most part just updating lists and state flags:

private Quad4 EnsureActive(Quad4 q, int x, int y)
{
  if (q == null)
    return AllocateQuad4(x, y);
  MakeActive(q);
  return q;
}
private void MakeActive(Quad4 q)
{
  q.StayActiveNextStep = true;
  if (q.State == Active) 
    return;
  else if (q.State == Dead)
    dead.Remove(q);
  else
    stable.Remove(q);
  active.Add(q);
  q.State = Active;
}
private void MakeDead(Quad4 q)
{
  Debug.Assert(q.State == Active);
  active.Remove(q);
  dead.Add(q);
  q.State = Dead;
}
private void MakeStable(Quad4 q)
{
  Debug.Assert(q.State == Active);
  active.Remove(q);
  stable.Add(q);
  q.State = Stable;
}

Nothing surprising there, except that as I mentioned before, when you force an already-active Quad4 to be active, it sets the “stay active for at least one more tick” flag.

There is one more bit of list management mechanism we should consider before getting into the business logic: when and how do dead Quad4s get removed from the dead list and deallocated? The how is straightforward: run down the dead list, orphan all of the Quad4s by de-linking them from every living object, and the garbage collector will eventually get to them:

private void RemoveDead()
{
  foreach(Quad4 q in dead)
  {
    if (q.S != null) 
      q.S.N = null;
    ... similarly  for N, E, W, SE, NW
    quad4s.Remove((q.X, q.Y));
  }
  dead.Clear();
}

This orphans the entire dead list; the original implementation had a more sophisticated implementation where it would keep around the most recently dead on the assumption that they were the ones most likely to come back.

We have no reason to believe that this algorithm’s performance is going to be gated on spending a lot of time deleting dead nodes, so we’ll make an extremely simple policy; every 128 ticks we’ll check to see if there are more than 100 Quad4s on the dead list.

private bool ShouldRemoveDead => 
  (generation & 0x7f) == 0 && dead.Count > 100;
public void Step()
{
  if (ShouldRemoveDead) 
    RemoveDead();
  if (IsOdd)
    StepOdd();
  else
    StepEven();
  generation += 1;
}

All right, that’s the mechanism stuff. By occasionally pruning away all-dead Quad4s we attain O(living) space usage. But how do we actually make this thing both fast and able to dynamically add new Quad4s on demand?

In keeping with the pattern of practice so far, we’ll write all the code twice, once for the even cycle and once, slightly different, for the odd cycle. I’ll only show the even cycle here.

Remember our original O(cells) prototype stepping algorithm:

private void StepEven()
{
  foreach (Quad4 q in quad4s)
    q.StepEven();
}

We want to (1) make this into an O(changes) algorithm, (2) detect stable or dead Quad4s currently on the active list, and (3) expand the board by adding new Quad4s when the active region reaches the current edge. We also have (4) a small piece of bookkeeping from earlier to deal with:

private void StepEven()
{
  foreach (Quad4 q in active)      // (1) O(changes)
  {
    if (!RemoveStableEvenQuad4(q)) // (2) remove and skip stable/dead
    {
      q.StepEven();
      MakeOddNeighborsActive(q);   // (3) expand
    }
    q.StayActiveNextStep = false;  // (4) clear flag
  }
}

The first one is straightforward; we now only loop over the not-stable, not-dead Quad4s, so this is O(changes), and moreover, remember that we consider a Quad4 to be stable if both its even and odd generations are stable, so we are effectively skipping all computations of Quad4s that contain only still Lifes and period-two oscillators, which is a huge win.

The fourth one is also straightforward: if the “stay active for at least one step” flag was on, well, we made it through one step, so it can go off. If it was already off, it still is.

The interesting work comes in removing stable Quad4s, and expanding the board on the active edge. To do the former, we will need some more helpers that answer questions about the Quad4 and its neighbors; this is a method of Quad4:

public bool EvenQuad4OrNeighborsActive =>
  EvenQuad4Active ||
  (S != null && S.EvenNorthEdgeActive) ||
  (E != null && E.EvenWestEdgeActive) ||
  (SE != null && SE.EvenNorthwestCornerActive);

This is the Quad4 analogue of our earlier method that tells you if any of a 10×10 region is active; this one tells you if an 18×18 region is active, and for the same reason: because that region entirely surrounds the new shifted-to-the-southeast Quad4 we’re computing. We also have the analogous methods to determine if that region is all stable or all dead; I’ll omit showing them.

Let’s look at our “remove a stable/dead Quad4 from the active list” method. There is some slightly subtle stuff going on here. First, if the Quad4 definitely can be skipped for this generation because it is stable or dead, we return true. However, that does not guarantee that the Quad4 was removed from the active list! Second, remember that our strategy is to have both the even and odd cycles “vote” on what should happen:

private bool RemoveStableEvenQuad4(Quad4 q)
{
  if (q.EvenQuad4OrNeighborsActive)
  {
    q.EvenState = Active;
    q.OddState = Active;
    return false;
  }

If anything in the 18×18 region containing this Quad4 or its relevant neighbouring Quad4s are active, we need to stay active. We set the votes of both even and odd cycle back to active if they were different.

If nothing is active then it must be stable or dead. Is this 18×18 region dead? (Remember, we only mark it as dead if it is both dead and stable.)

  if (q.EvenQuad4AndNeighborsAreDead)
  { 
    q.EvenState = Dead;
    q.SetOddQuad4AllRegionsDead();
    if (!q.StayActiveNextStep && q.OddState == Dead)
      MakeDead(q);
  }

Let’s go through each line here in the consequence of the if:

Everything in an 18×18 region is dead and stable. The even cycle votes for death.
We know that an 18×18 region is all dead and stable; that region entirely surrounds the odd Quad4. We therefore know that the odd Quad4 will be all dead on the next tick if it is not already, so we get aggressive and set all its “region dead” bits now.
If we’re forcing this Quad4 to stay active then it stays active; however, even is still voting for death! We’ll get another chance to kill this Quad4 on the next tick.
By that same logic, if the odd cycle voted for death on the previous tick but the Quad4 stayed active for some reason then the odd cycle is still voting for death now. If that’s true then both cycles are voting for death and the Quad4 gets put on the dead list.

We then have nearly identical logic for stability; the only difference is that if one cycle votes for stability, it suffices for the other cycle to vote for stability or death:

  else
  {
    q.EvenState = Stable;
    q.SetOddQuad4AllRegionsStable();
    if (!q.StayActiveNextStep && q.OddState != Active)
      MakeStable(q);
  }
  return true;
}

And finally: how do we expand into new space? That is super easy, barely an inconvenience. If we have just processed an active Quad4 on the even cycle then we’ve created a new odd Quad4 and in doing so we’ve set the activity bits for the 16 regions in the odd Quad4. If the south 16×2 edge of the odd Quad4 is active then the Quad4 to the south must be activated, and so on:

private void MakeOddNeighborsActive(Quad4 q)
{
  if (q.OddSouthEdgeActive)
    EnsureActive(q.S, q.X, q.Y - 1);
  if (q.OddEastEdgeActive)
    EnsureActive(q.E, q.X + 1, q.Y);
  if (c.OddSoutheastCornerActive)
    EnsureActive(q.SE, q.X + 1, q.Y - 1); 
}

Once again we are taking advantage of the fact that the even and odd generations are offset by one cell; we only have to expand the board on two sides of each Quad4 during each tick, instead of all four. When we’re completing an even cycle we check for expansion to the south and east for the upcoming odd cycle; when we’re completing an odd cycle we check for expansion to the north and west for the upcoming even cycle.

It’s a little hard to wrap your head around, but it all works. This “staggered” property looked like it was going to be a pain when we first encountered it, but it is surprisingly useful; that insight is one of the really smart things about this algorithm.

There is a small amount of additional state management code I’ve got to put here and there but we’ve hit all the high points; see the source code for the exact changes if you are curious.

And that is that! Let’s take it for a spin and run our usual 5000 generations of “acorn”.

Algorithm           time(ms) size  Mcells/s bits/cell O-time
Naïve (Optimized):   4000     8      82     8/cell     cells
Abrash (Original)     550     8     596     8/cell     cells
Stafford              180     8    1820     5/cell     change
Sparse array         4000    64      ?   >128/living   change
Proto-QuickLife 1     770     8     426     4/cell     cells
Proto-QuickLife 2     160     8    2050     4/cell     cells
QuickLife              65    20      ?      5/living*  change

WOW!

We have an O(changes) in time and O(living) in space algorithm that maintains a sparse array of Quad4s that gives us a 20-quad to play with, AND it is almost three times faster than Stafford’s algorithm on our benchmark!

Our memory usage is still pretty efficient; we are spending zero memory on “all dead” Quad4s with all dead neighbours. We’ve added more state to each Quad4, so now for active and stable Quad4s we’re spending around five bits per cell; same as Stafford’s algorithm. (Though as I noted in my follow-up, he did find ways to get “maintain a count of living neighbours” algorithms down to far fewer bits per cell.) I added an asterisk to the table above because of course an active or stable Quad4 will contain only 50% or fewer living cells, so this isn’t quite right, but you get the idea.

Again, it is difficult to know how to characterize “cells per second” for sparse array approaches where we have an enormous board that is mostly empty space that costs zero to process, so I’ve omitted that metric.

If you ran the code on your own machine you probably noticed that I added counters to the user interface to give live updates of the size of the active, stable and dead lists. Here’s a graph of the first 6700 generations of acorn (click on the image for a larger version.)

You can really see how the pattern evolves from this peek into the workings of the algorithm!

We start with a small number of active Quad4s; soon small regions of stability start to appear as the pattern spreads out and leaves still Lifes and period-two oscillators in its wake.
The number of dead Quad4s remains very small right until the first glider escapes; from that moment on we have one or more gliders shooting off to infinity. In previous implementations they hit the wall of death, but now they are creating new active Quad4s in their bow shocks, and leaving dead Quad4s in their wakes.
The stable count steadily grows as the active region is a smaller and smaller percentage of the total. Around the 5000th generation everything except the escaped gliders is stable, and we end up with around 100 stable Quad4s and 20 active Quad4s for the gliders.
The action of our trivial little “garbage collector” is apparent here; we throw away the trash only when there are at least 100 dead Quad4s in the list and we are on a generation divisible by 128, so it is unsurprising that we have a sawtooth that throws away a little over 100 dead Quad4s every 128 cycles.
The blue line is proportional to time taken per cycle, because we only process active Quad4s.
The sum of all three lines is proportional to total memory used.

That finishes off our deep dive into Alan Hensel’s QuickLife algorithm. I was quite intimidated by this algorithm when I first read the source code, but once you name every operation and reorganize the code into functional areas it all becomes quite clear. I’m glad I dug into it and I learned a lot.

Coming up on FAIC:

We’ve seen a lot of varied ways to solve the problem of simulating Life, and there are a pile more in my file folder of old articles from computer magazines that we’re not going to get to in this series. Having done this exploration into many of them and skimmed a lot more, two things come to mind.

First, so far they all feel kind of the same to me. There is some sort of regular array of data, and though I might layer object-oriented features on top of it to make the logic easier to follow or better encapsulated, fundamentally we’re doing procedural work here. We can be more or less smart about what work we can avoid or precompute, and thereby eliminate or move around the costs, but the ideas are more or less the same.

Second, every optimization we’ve done increases the amount of redundancy, mutability, bit-twiddliness, and general difficulty of understanding the algorithm.

Gosper’s algorithm stands in marked contrast.

There is no “array of cells” at all
The attack on the problem is purely functional programming; there is very little state mutation.
There is no redundancy. In the Abrash, Stafford and Hensel algorithms we had ever-increasing amounts of redundant state that had to be carefully kept in sync with the board state. In Gosper’s algorithm, there is board state and nothing else.
No attempt whatsoever is made to make individual cells smaller in memory, but it can represent grids a trillion cells on a side with a reasonable amount of memory.
It can compute trillions of ticks per second on quadrillion-cell grids on machines that only do billions of operations per second.
Though there are plenty of tricky details to consider in the actual implementation, the core algorithm is utterly simple and elegant. The gear that does the optimization of the algorithm uses off-the-shelf parts and completely standard, familiar functional programming techniques. There is no mucking around with fiddly region change tracking, or change lists, or bit sets, or any of those mechanisms.
And extra bonus, the algorithm makes “zoom in or out arbitrarily far” in the UI super easy, which is nice for large boards.

This all sounds impossible. It is not an exaggeration to say that learning about this algorithm changed the way I think about the power of functional programming and data abstraction, and I’ve been meaning to write a blog about it for literally over a decade.

It will take us several episodes to get through it:

Next time on FAIC we’ll look at the core data structure. We’ll see how we can compress large boards down to a small amount of space.
Then we’ll closely examine the “update the UI” algorithm and see how we can get a nice new feature.
After that we’ll build the “step forward one generation algorithm” and do some experiments with it.
Finally, we’ll discover the key insight that makes Gosper’s algorithm work: you can enormously compress time if you make an extremely modest addition of extra space.
We will finish off this series with an answer to a question I’ve been posing for some time now: are there patterns that grow quadratically?

Life, part 30

Posted on August 5, 2020 by ericlippert

Holy goodness, we are on part 30; I never expected this to go for so long and we have not even gotten to Gosper’s algorithm yet! We will finish up Hensel’s QuickLife algorithm soon I hope. Code for this episode is here. (UPDATE: There is a small defect in my implementation below which I did not discover for some time; for details see episode 36.)

And holy goodness again: it is August. In a normal year I’d be taking the month off from blogging and traveling to see my family, but thanks to criminally stupid coronavirus response, here I am, working from home. I suppose it could be a lot worse; I am glad to still have my health.

When last we left off we had computed whether each of 32 “regions” of the eight Quad3s in a Quad4 were active (they had changed at least once cell since the previous tick of the same parity), stable (no change), or dead (stable and also all dead cells).

How can we make use of this to save on time?

Let’s suppose we are currently in an even generation, call it K, and we are about to step the northwest Quad3 to get the new Quad3 and state bits for odd generation K+1. Under what circumstances could we skip doing that work? Let’s draw a diagram:

The green square is oddNW, which is what we are about to compute. If any of the light blue shaded “even” regions are marked as active then the green “odd” Quad3 in generation K+1 might be different from how it was in generation K-1. The only way to find out is to execute the step and compare.

But now suppose all the light blue shaded regions are marked as either dead or stable. Remember what this means: in generation K-1 we compared the even Quad3s that we had just computed for generation K to those we had in hand from generation K-2. If all of those cells did not change from generation K-2 to generation K on the even cycle, then generation K+1 will be the same as generation K-1 on the odd cycle, so we can skip doing all work! (Note: the “all work” is a small lie. Do you see why? We’ll come back to this point in a moment.)

What is that light blue shaded region? It is the entirety of evenNW plus the north 8×2 edge of evenSW, the west 2×8 edge of evenNE, and the northwest corner of evenSE. We have a uint that tells us with a single bit operation whether any of those regions are active, but you know what I’m like; I’m going to put it in a descriptive helper:

 private bool EvenNorthwestOrBorderingActive => 
  (evenstate & 0x08020401) != 0x08020401;

And then a method that describes the semantics with respect to the odd generation:

private bool OddNWPossiblyActive() => 
  EvenNorthwestOrBorderingActive;

And only then am I going to add it to our step function:

private void StepEvenNW()
{
  if (OddNWPossiblyActive())
  {
    Quad3 newOddNW = Step9Quad2ToQuad3Even(...);
    OddNWState = oddNW.UpdateOddQuad3State(newOddNW, OddNWState);
    oddNW = newOddNW;
  }

And presto, we just did a small number of bit operations to determine when we can avoid doing a much larger number of bit operations! That’s a win.

I said before that I lied when I said we could avoid all work; we still have some work to do here. (Though in the next episode, we’ll describe how we really, truly can avoid this work!) The problem is: the odd NW quad3 probably still has regions marked as active, and that will then cause unnecessary work to get done on the next tick. If the condition of the if statement is not met then we know that this Quad3 is either stable or dead without having to compute the next generation and compare but we still have to set the Quad3 state bits as though we had.

  else
  {
    OddNWState = oddNW.MakeOddStableOrDead(OddNWState);
  }
}

We do not have that method yet, but fortunately it is not difficult; we need to do only the work to distinguish dead from stable. We add a method to Quad3:

public Quad3State MakeOddStableOrDead(Quad3State s)
{
  s = s.SetAllRegionsStable();
  if (SoutheastCornerDead)
    s = s.SetCornerDead();
  if (SouthEdgeDead)
    s = s.SetHorizontalEdgeDead();
  if (EastEdgeDead)
    s = s.SetVerticalEdgeDead();
  if (AllDead)
    s = s.SetAllRegionsDead();
  return s;
}

Super. All that remains is to work out for each of the remaining seven step functions, what regions do we need to check for activity, then make an efficient bit operation that returns the result. For example, suppose we wish to know if the odd SW Quad3 could possibly be active this round:

private bool OddSWPossiblyActive() =>
  EvenSouthwestOrBorderingActive ||
  S != null && S.EvenNorthEdge10WestActive;

That is: if the evenSW Quad3 is active, or the 2×8 eastern edge of the evenSE is active, or the 10×2 western side of the north edge of the Quad4 to the south is active. Of course those are:

private bool EvenSouthwestOrBorderingActive => 
  (evenstate & 0x00080004) != 0x00080004;
private bool EvenNorthEdge10WestActive => 
  (evenstate & 0x02000100) != 0x02000100;

I know I keep saying this, but it is just so much more pleasant to read the code when it is written in English and the bit operations are encapsulated behind helpers with meaningful names.

Anyways, we have eight helper methods that determine whether a 10×10 region is active, and if not, then we skip stepping and instead mark the regions as stable or dead; I won’t write them all out. Let’s take it for a spin:

Algorithm           time(ms)  ticks  size(quad)    megacells/s
Naïve (Optimized):   4000       5K      8               82
Abrash (Original)     550       5K      8              596
Stafford              180       5K      8             1820
Proto-QuickLife 1     770       5K      8              426
Proto-QuickLife 2     160       5K      8             2050

HOLY GOODNESS; NEW RECORD!

We are now 25 times faster than our original naïve implementation.

But wait, there’s more! We still have not quite implemented the QuickLife algorithm. There are still three problems left to solve:

We do not yet have an O(changes) solution in time. On each tick we examine each of 256 Quad4s; we do very little work for the stable ones, we do a little bit of work for the active ones, but we are still doing work for every Quad4. Can we do no work at all for the stable or dead Quad4s? That would give us a speed win.
We do not yet have an O(living) solution in space. An all-dead Quad4 takes up just as much space as any other. Can we deallocate all-dead Quad4s? That would give us a space win.
We still are not taking advantage of the sparse array of Quad4s to dynamically grow the board into the 20-quad we logically have available to us; we’re trapped in a tiny little 8-quad still. Can we dynamically grow the board to support large patterns?

Next time on FAIC: Yes we can do all those things. But you know what we need? More bits to twiddle, that’s what we need.

Life, part 29

Posted on August 3, 2020 by ericlippert

We’ve built the foundation of the QuickLife algorithm; now let’s make it go fast. This is going to be a longer episode than usual because we have a large volume of code to get through to perform a relatively straightforward task, and we won’t get through all of it today. Code for this episode is here.

Abrash and Stafford’s algorithms both achieved significant wins by maintaining neighbour counts and updating them when something changes. Hensel’s QuickLife algorithm does not count neighbours at all! The neighbour counting has been entirely moved to the construction of the lookup table. So there cannot be any savings there.

Stafford’s algorithm achieved a significant win by tracking what changes from one tick to the next, but there is a flaw in that plan — or, maybe it would be better to say, there is an opportunity for further optimization.

Suppose we have the simplest possible always-changing grid: a single blinker.

Four cells change on every tick: two are born and two die. Stafford’s algorithm works on triples; for every tick we need to consider on the order of a dozen triples that might be changing next.

Now consider the gear we have built so far for QuickLife: we have the state of one even and one odd generation saved. Who cares whether there is a difference between the even generation and the odd generation? In the example above every even generation is the same as every other, and every odd generation is the same as every other.

Period-two oscillators are extremely common in Life, but from QuickLife’s perspective, given that it saves two generations in every Quad4, period-two oscillators could be treated as still Lifes! And remember that of course that still Lifes are also period-two oscillators; they just have the same state in both cycles.

In short: If we know that for some region of a quad4 the next even generation is the same as the previous even generation, then we can skip all computations on that region, and similarly for the odd generations. That works right up until the point where some “active” cells in a neighbouring region get close enough to the stable region to destabilize it.

It seems like there is a powerful optimization available here; how are we going to achieve it?

Let’s start by crisping up the state that we are tracking. We will be considering rectangular “regions” of a Quad3, and classifying them into three buckets:

Active — a cell changed value from the previous generation to the next generation. That is, if we are currently in generation 3, stepping to even generation 4, then we check to see whether the region in question is the same as it was in generation 2.
Stable — a region which is not active; every cell in the region is going to be the same in the next generation as it was in the previous generation.
Dead — a region which is stable, and moreover, every cell in the region is dead

Notice that there is a subtlety here: a region can be entirely dead cells but is still considered active if it just recently became all dead. A region does not become classified as dead until it is both stable and all dead cells.

What regions do we care about? For odd generation Quad3s we will classify the following regions as dead, stable or active:

The entire Quad3

That is, did anything at all in these 64 cells change? and if not, are any of them alive?

The 2×8 eastern edge:

The 8×2 southern edge:

And finally, the 2×2 southeastern corner:

For the even generation Quad3s, we will track the same stuff, except different. For evens, we track the state of:

The entire Quad3
The 2×8 western edge
The 8×2 northern edge
The 2×2 northwest corner

That sounds like a lot of information to deal with: we’ve got three possible states for each of four regions of each of eight Quad3s in a Quad4. Of course, we’re going to deal with it by twiddling bits, and of course I am going to encapsulate every bit operation into a helper method with a sensible name.

Our bit format will be as follows.

Each of the eight Quad3s in a Quad4 will have eight bits of state data.
Those 64 bits will be stored in two uints, one for the evens, one for the odds, and they will be fields of Quad4.
Bits 0-7 will be for the SE Quad3, bits 8-15 the NE, 16-23 the SW and 24-31 the NW

Pretty straightforward so far. What does each of the eight bits in the byte associated with a Quad3 mean? We consider the bits in pairs.

bits 0 and 4 give the state of the SE (odd) or NW (even) corner
bits 1 and 5 give the state of the south/north edge
bits 2 and 6 give the state of the east/west edge
bits 3 and 7 give the state of the Quad3 as a whole
If both bits are off, then the region is active
If both bits are on, then the region is dead
If the lower bit is on and the higher is off, then the region is stable
The higher bit is never turned on if the lower bit is off.

This scheme has some nice properties:

If you want to check for “not active” you just have to look at the low bit
If you want to check for “dead” you just have to look at the high bit
If you want to set a region to “stable” and it is already marked “dead”, that’s OK, it stays dead
And so on

But because I am only going to access these bits via helper functions, it really doesn’t matter what the format is so long as we get all the transitions correct.

I’m going to make a helper struct for the setters, solely to make the code easier to read. (Why just the setters? We will see in the next episode. Foreshadowing!) We will never store one of these, so there is no reason to make it wrap a byte instead of a uint; no reason to make the runtime truncate it on each operation.

struct Quad3State
{
  readonly private uint b;
  public Quad3State(int b) => this.b = (uint)b;
  public Quad3State(uint b) => this.b = b;
  public Quad3State SetAllRegionsActive() => 
    new Quad3State(0x00);
  public Quad3State SetVerticalEdgeAndQuadActive() => 
    new Quad3State(b & 0x33);
  public Quad3State SetHorizontalEdgeAndQuadActive() => 
    new Quad3State(b & 0x55);
  public Quad3State SetQuad3Active() => 
    new Quad3State(b & 0x77);
  public Quad3State SetAllRegionsStable() => 
    new Quad3State(b | 0xf);
  public Quad3State SetVerticalEdgeStable() => 
    new Quad3State(b | 0x04);
  public Quad3State SetHorizontalEdgeStable() => 
    new Quad3State(b | 0x02);
  public Quad3State SetCornerStable() => 
    new Quad3State(b | 0x01);
  public Quad3State SetAllRegionsDead() => 
    new Quad3State(0xff);
  public Quad3State SetVerticalEdgeDead() => 
    new Quad3State(b | 0x44);
  public Quad3State SetHorizontalEdgeDead() => 
    new Quad3State(b | 0x22);
  public Quad3State SetCornerDead() => 
    new Quad3State(b | 0x11);
  public static explicit operator uint(Quad3State s) => s.b;
}

A few things to notice here:

Calling a “set stable” helper keeps the “dead” bit set if it is already set, which is what we want. If we’ve deduced that a region is stable and we already know it is dead, that’s fine.
When we set an edge to active we set the whole Quad3 to active also, since plainly if there is activity in the edge, there is activity in the Quad3.
Contradicting the previous point, when we set an edge to stable or dead, we do not set the corner contained in that edge region to stable or dead, even though obviously it is; we cannot have the entire edge be stable and its corner be unstable. It just so happens that in the original implementation, every code path on which the edge is set to stable, the corner has already been set to stable, and I will maintain this property in my port. It wouldn’t hurt to fix that here, but I wanted to match the behaviour of the original code as much as possible.
There is no “set corner active” because if you’re setting the corner active, you’re setting all the regions active.

All right, we have our helper to set state for a Quad3. We need eight state bytes, one for each of the eight Quad3s in a Quad4. We declare our state words as fields in Quad4:

private uint evenstate;
private uint oddstate;

And make eight helpers to shift the state bits in and out:

private Quad3State EvenSEState
{
  get => new Quad3State(evenstate & 0x000000ff);
  set => evenstate = (evenstate & 0xffffff00) | (uint)value;
}
...

Suppose we’re on an odd generation and we’ve just computed the next (even) generation for the NW Quad3. We have the previous even generation in hand; how are we going to know which setters to call? We need to figure out two things:

Given an old Quad3 and a new one, which regions of the new one are different?
Of the regions that are not different, which of them are all dead?

I mentioned in a previous episode that I was one day going to add more code to the “step a Quad3” methods in Quad4, and that day has come. Remember that the original code was:

private void StepOddNW()
{
  evenNW = Step9Quad2ToQuad3Odd( ... );
}

The new method is:

private void StepOddNW()
{
  Quad3 newEvenNW = Step9Quad2ToQuad3Odd( ... );
  EvenNWState = evenNW.UpdateEvenQuad3State(newEvenNW, EvenNWState);
  evenNW = newEvenNW;
}

That is: compute the next even generation, update the state bits by comparing the previous even generation to the next even generation, and then set the new state bits and new even NW Quad3.

How does the UpdateEvenQuad3State work? Plainly we have to compare the previous generation’s Quad3 to the newly computed Quad3, and in particular we will need to know if any of its regions are dead.

Let’s start with solving the even simpler problem: which regions are composed of all dead cells? I’m going to add these helper functions to Quad3:

bool AllDead => (NW | NE | SW | SE).Dead;
bool NorthwestCornerDead => NW.NW.Dead;
bool SoutheastCornerDead => SE.SE.Dead;
bool NorthEdgeDead => (NW | NE).NorthEdge.Dead;
bool WestEdgeDead => (SW | NW).WestEdge.Dead;
bool SouthEdgeDead => (SW | SE).SouthEdge.Dead;
bool EastEdgeDead => (NE | SE).EastEdge.Dead;

That is, OR together the relevant portions of each component Quad2, mask out the irrelevant portions, and check whatever is left for deadness.

Notice that I’ve made these private; we’ll put all the code that needs these in Quad3. (I told you these helper types were not going to stay simple!)

How will we compute what parts are stable? If we have a previous Quad3 in hand and a new Quad3, we need to know what regions are unchanged. The XOR operation gives us that on bits; XOR is one if the bits are different and zero if they are the same. I’ll add an XOR operation to Quad2, just like we have an OR operation already:

public static Quad2 operator ^(Quad2 x, Quad2 y) => 
  new Quad2((ushort)(x.cells ^ y.cells));

Remember that my goal for this series is to make the code as clear as possible by making it read like “business domain” code rather than “mechanism domain” code. I need to know what regions of a Quad3 have changed, so I’m going to do that by making a new struct called “change report” that gives a pleasant, readable interface overtop of the bit-twidding mechanisms. In Quad3:

private Quad3ChangeReport Compare(Quad3 q) => 
  new Quad3ChangeReport(this, q);

The implementation is basically just a rename of the operations we just added to Quad3!

private struct Quad3ChangeReport
{
  public Quad3ChangeReport(Quad3 x, Quad3 y)
  {
    q3 = new Quad3(x.NW ^ y.NW, x.NE ^ y.NE, x.SW ^ y.SW, x.SE ^ y.SE);
  }
  private readonly Quad3 q3;
  public bool NoChange => q3.AllDead;
  public bool NorthwestCornerNoChange => q3.NorthwestCornerDead;
  public bool SoutheastCornerNoChange => q3.SoutheastCornerDead;
  public bool NorthEdgeNoChange => q3.NorthEdgeDead;
  public bool WestEdgeNoChange => q3.WestEdgeDead;
  public bool SouthEdgeNoChange => q3.SouthEdgeDead;
  public bool EastEdgeNoChange => q3.EastEdgeDead;
}

Adding a few extra lines of code in order to make the call sites readable is very worthwhile in my opinion. Remember, the jitter is pretty smart and will inline these operations, and if it doesn’t, well, it is easier to make a readable program more performant than an unreadable one.

Note also that this struct is nested inside Quad3, so that it can use the private helper methods I just added. I don’t often program with nested types, but this is an ideal use case; it is an implementation detail of a method of the outer type.

We’ve digressed slightly from the question at hand: how do we update state bits given old state bits, an old Quad3 and a new Quad3?

I know the following method looks a little bit hard to read, but imagine how it looks with every one of those bit-twiddling operations written out as a bit operation with a hexadecimal mask! Believe me, it is more understandable when you can read the business logic in English:

public Quad3State UpdateEvenQuad3State(Quad3 newQ3, Quad3State s)
{
  Quad3ChangeReport changes = newQ3.Compare(this);
  if (changes.NorthwestCornerNoChange)
  {
    if (newQ3.NorthwestCornerDead)
      s = s.SetCornerDead();
    else
      s = s.SetCornerStable();
    if (changes.NorthEdgeNoChange)
    {
      if (newQ3.NorthEdgeDead)
        s = s.SetHorizontalEdgeDead();
      else
        s = s.SetHorizontalEdgeStable();
    }
    else
    {
      s = s.SetHorizontalEdgeAndQuadActive();
    }
    if (changes.WestEdgeNoChange)
    {
      if (newQ3.WestEdgeDead)
        s = s.SetVerticalEdgeDead();
      else
        s = s.SetVerticalEdgeStable();
      if (changes.NoChange)
      {
        if (newQ3.AllDead)
          s = s.SetAllRegionsDead();
        else
          s = s.SetAllRegionsStable();
      }
      else
      {
        s = s.SetQuad3Active();
      }
    }
    else
    {
      s = s.SetVerticalEdgeAndQuadActive();
    }
  }
  else
  {
    s = s.SetAllRegionsActive();
  }
  return s;
}

If you trace all the logic through you’ll see that with only a handful of bit operations we manage to correctly update the eight state bits for the even-cycle Quad3.

We then cut-n-paste this code to do the same thing to the odd cycle, just looking at the opposite edges and corners; obviously I’ll omit that.

Once again we’ve managed to write a whole lot of bit-twiddling code; we now know at the end of every tick whether each of thirty-two regions of a Quad4 had any change from how they were two generations previous.

How are we going to use this information to save time?

Next time on FAIC: We’ll write getters to read out the state we’ve just written, and draw a bunch more diagrams.

Life, part 28

Posted on July 30, 2020 by ericlippert

We now have enough gear to make a naïve “proto-QuickLife” implementation as a test to see (1) does it work at all? and (2) what is the performance compared to our other implementations at various levels of sophistication?

Code for this episode is here.

So far I’ve given you code for Quad2, Quad3, Quad4, the stepping algorithm for a Quad4 on even and odd cycles, and a lookup table to step even and odd Quad2s, so I won’t repeat that. What we need now is code to hold the whole thing together. I’ll omit a bunch of the code that does not relate directly to the task at hand, such as how we draw the screen, how we handle creating the initial pattern, and so on; see the source code link above if those algorithms interest you.

sealed class ProtoQuickLife : ILife, IReport
{
  // Number of 4-quads on a side.
  private const int size = 16;
  private int generation;
  private Dictionary<(short, short), Quad4> quad4s;

For our proto-QuickLife I’m just going to go with the same thing we saw in all of our previous fixed-size implementations: I’ll make an 8-quad. We have a 4-quad data structure in hand, and so we’ll need a 16 x 16 grid of 4-quads.

We need to know whether we are on an odd or even generation, so I’ve made a generation counter.

Since we have a fixed-size grid in this prototype version, I could just have an array of 256 Quad4s. However, in later versions of this algorithm we are going to use the sparse array technique I discussed in a previous episode, but it will be a sparse array of Quad4s, not cells! We’ll index our sparse array by a pair of shorts; that gives us a sparse 20-quad to play with, which is plenty of space; that’s a square with just over a million cells on a side. We might as well write the code for the sparse array now, and save having to write it again later.

public void Clear()
{
  generation = 0;
  quad4s = new Dictionary<(short, short), Quad4>();
  for (int y = 0; y < size; y += 1)
    for (int x = 0; x < size; x += 1)
      AllocateQuad4(x, y);
}

Some easy initialization code that allocates 256 Quad4s and puts them in a sparse array. A Quad4, recall, has six references to neighbouring Quad4s and everything else is a struct, so we will need to initialize those references; the structs will have their default values which, fortunately, is “all dead”.

private Quad4 AllocateQuad4(int x, int y)
{
  Quad4 c = new Quad4(x, y);
  c.S = GetQuad4(x, y - 1);
  if (c.S != null)
    c.S.N = c;
  ... and so on for N, E, W, SE, NW ...
  SetQuad4(x, y, c);
  return c;
}
private Quad4 GetQuad4(int x, int y)
{
  quad4s.TryGetValue(((short)x, (short)y), out var q);
  return q;
}
private void SetQuad4(int x, int y, Quad4 q) => 
  quad4s[((short)x, (short)y)] = q;

On all code paths to these private methods we will already know that the x, y coordinates are in bounds, so we have no problem casting them to shorts. I suppose there is some possibility that on the edges we will have “wrap around” behaviour for the additions and subtractions, but I’m not going to worry about it for the purposes of this blog.

And finally, the mainspring that drives the whole thing:

private bool IsOdd => (generation & 0x1) != 0;
public void Step()
{
  if (IsOdd)
    StepOdd();
  else
    StepEven();
  generation += 1;
}
private void StepEven()
{
  foreach (Quad4 q in quad4s.Values)
    q.StepEven();
}
private void StepOdd()
{
  foreach (Quad4 q in quad4s.Values)
    q.StepOdd();
}

Well that was all easy — as it should be, since the complicated work right now is in Quad4. (Don’t worry; this class will get much more complicated in coming episodes as we add optimizations.)

What do you think the time performance of this initial prototype is? Remember, this is a fully naïve implementation in the sense that we are not doing any kind of change tracking, we are not identifying “all dead” Quad4s and skipping them, and so on. We have a 256×256 grid and we are computing the next generation by looking at every one of those cells; the optimization we have with our lookup table is that we compute the results “four at a time” via lookup to get a 1-quad rather than by counting neighbours.

To get an apples-to-apples comparison I ran the proto-QuickLife algorithm using the same test as we’ve done so far in this series: 5000 ticks of “acorn” on an 8-quad. Make a guess, and then scroll down for the results.

Algorithm           time(ms)  ticks  size(quad)    megacells/s
Naïve (Optimized):   4000       5K      8               82
Abrash (Original)     550       5K      8              596
Stafford              180       5K      8             1820
Proto-QuickLife       770       5K      8              426

A considerable improvement over the original naïve algorithm, but unsurprisingly, not as fast as our more optimized solutions.

What about memory? In this prototype implementation we have 65536 cells, and we are storing two generations. How many bits are we using? (We will ignore fixed costs such as the lookup tables.) If we suppose that references are 8 bytes then:

A Quad2 is exactly 2 bytes
A Quad3 is exactly 8 bytes
A Quad4 has eight Quad3s, so 64 bytes for the data, plus 4 bytes for the coordinates, plus 48 bytes for the references, plus another 4 bytes for the object header maintained by the runtime. That’s 120 bytes per Quad4. Oh, and our sparse array has an overhead of at minimum 12 bytes per entry, since an entry is two shorts and a reference, so call it 132 bytes all up.
We’ve got 256 Quad4s in this implementation, so that’s 33792 bytes to represent 65536 cells, or just slightly over four bits per cell.

That’s not bad; recall that Stafford’s algorithm was 5 bits per cell.

We are off to a good start here.

Coming up on FAIC:

We’ve created a solid foundation for adding more optimizations and features. In the next few episodes we will explore these questions:

As we have seen several times already in this series, if we identify what cells are changing then we can spend time on only those cells. Can we use the fact that we are storing two generations worth of cells in each Quad4 to make an even better change-tracking optimization?
Even if we eliminate the time cost, an all-dead Quad4 with all-dead neighbours seems like it is taking up space unnecessarily. Can we efficiently prune all-dead Quad4s from the sparse array and save on that space?
If we have robust change tracking then we can discover changing cells that border a “missing” Quad4. Can we grow the set of Quad4s dynamically as a pattern expands into new space, and thereby achieve Life on an 20-quad?

Life, part 27

Posted on July 27, 2020 by ericlippert

We’re continuing with our deep dive into Alan Hensel’s QuickLife algorithm, rewritten in C# with an emphasis on clarity. When last we left off we had the following established:

A Quad2 is an immutable wrapper around a ushort
A Quad3 is an immutable struct containing four Quad2s
A Quad4 is a mutable class containing eight Quad3’s; four for the even generations and four for the odd, plus references to the Quad4s to the north, south, east, west, northwest and southeast.
The odd generation is “one cell to the southeast” of the even generation.
We have an algorithm that efficiently steps the even generation (blue) to the odd generation (green):

How then are we going to step the odd generation back to the even generation? If stepping the even generation produces an odd generation that is “one to the southeast” then stepping the odd generation in the same way will also go “one to the southeast”.

Long-time reader svick pointed out in a comment to the previous episode that maybe that’s not so bad! If our coordinate system has to shift “one to the southeast” every step then we could do things like adding new Quad4s to the north and west edges every sixteen ticks, and remove old ones from the south and east edges, and on average we would stay “in the same place”.

I would love to experiment with that idea, but I want to stay on target more. The QuickLife algorithm uses the fact that we alternate between “one to the southeast” and “one to the northwest” to attain optimizations that we will see in later episodes.

Plainly then what we must do is write the equivalent algorithm that goes “one to the northwest.” Let’s attack the problem just on the Quad3 I’ve labeled “this.oddSE” to start with.

Two episodes ago we made a method that takes nine Quad2s and returns a Quad3. I’m going to rename that method to Step9Quad2ToQuad3Even because it only works on evens. I won’t go through the same level of exegesis that I did the first time around as we write…

static Quad3 Step9Quad2ToQuad3Odd(
  Quad2 nw, Quad2 n, Quad2 ne,
  Quad2 w, Quad2 c, Quad2 e,
  Quad2 sw, Quad2 s, Quad2 se)
{

What information do we need to get a Quad3 that is one tick ahead and one cell to the northwest?

We can get a quarter of the information we need by looking up the next generation centers of the center, east, south and southeast 2-quads; the cells for the next even generation that we can compute from the green 2-quads is show by blue 1-quads:

The blue 1-quads will be the southeast corners of the four next-generation 2-quads.

We can get another quarter by getting these “mirrored” Quad2s:

  Quad2 c_e = c.HorizontalMiddleMirrored(e);
  Quad2 s_e = s.HorizontalMiddleMirrored(se);
  Quad2 c_w = w.HorizontalMiddleMirrored(c);
  Quad2 s_w = sw.HorizontalMiddleMirrored(s);

and then use a lookup table to get this state:

That gives us the southwest corners of the four next-generation Quad2s.

We can get these “flipped” Quad2s:

  Quad2 c_s = c.VerticalMiddleFlipped(s);
  Quad2 e_s = e.VerticalMiddleFlipped(se);
  Quad2 c_n = n.VerticalMiddleFlipped(c);
  Quad2 e_n = ne.VerticalMiddleFlipped(e);

to get this state:

for the northeast corners.

And finally we can get these “mirrored and flipped” Quad2s:

  Quad2 c_ne = c_n.HorizontalMiddleMirrored(e_n);
  Quad2 c_sw = c_w.VerticalMiddleFlipped(s_w);
  Quad2 c_se = c_s.HorizontalMiddleMirrored(e_s);
  Quad2 c_nw = c_w.NorthEdge | c_n.SW | nw.SE;

to get this state:

for the northwest corners of each new Quad2.

Putting it together, we can get this information:

As desired, the next step is a Quad3 “one to the northwest” from the Quad3 formed by the center, east, south and southeast Quad2s.

We haven’t yet written the code to actually do the lookups and generate the four Quad2s, but we can just use the same lookup tables as last time…

Wait a minute. No, we cannot!

Remember how we arranged our lookup table? We noticed the following facts for the even-to-odd cycle, which we then made use of in designing the lookup table:

The next-tick 1-quads that go in the NW corner of a Quad2 are generated by stepping a “normal” Quad2.
The 1-quads that go in the SW corner are generated by stepping a “flipped” Quad2.
The 1-quads that go in the NE corner are generated by stepping a “mirrored” Quad2.
The 1-quads that go in the SE corner are generated by stepping a flipped and mirrored Quad2.

Are any of those conditions still met? Sadly no. If you look carefully you’ll see that we are now in the opposite situation. On the odd-to-even cycle:

The next-tick 1-quads that go in the SE corner of a Quad2 are generated by stepping a “normal” Quad2.
The 1-quads that go in the NE corner are generated by stepping a “flipped” Quad2.
The 1-quads that go in the SW corner are generated by stepping a “mirrored” Quad2.
The 1-quads that go in the NW corner are generated by stepping a flipped and mirrored Quad2.

What are we going to do about this? We could solve the problem by doing a bunch of bit shifting, but remember that the whole point of our technique of storing the “next generation” 1-quads in a lookup table at the bit positions where they are going to be needed is to avoid that bit shifting by moving it back in time to the generation of the table.

I’m sure you see the obvious solution: build a second lookup table that has the properties that we want! It’s only an additional 128K of memory.

Something we will see as we continue porting the QuickLife algorithm to C# is that the whole thing is based on the even and odd cycles being almost the same but different enough that you end up writing all the code twice. There will be a lot of duplicated code in my implementation, and my implementation will have quite a bit less duplication than the original.

So we’ll make two lookup tables, one for even and one for odds; I’ll omit the generation of the odd lookup table because it is just like the even lookup table, just with the results put in different bit positions. Our function for today then ends just as you’d expect:

  Quad2 newNW = OddLookup(c).SE | OddLookup(c_n).NE | OddLookup(c_w).SW | OddLookup(c_nw).NW;
  Quad2 newNE = OddLookup(e).SE | OddLookup(e_n).NE | OddLookup(c_e).SW | OddLookup(c_ne).NW;
  Quad2 newSW = OddLookup(s).SE | OddLookup(c_s).NE | OddLookup(s_w).SW | OddLookup(c_sw).NW;
  Quad2 newSE = OddLookup(se).SE | OddLookup(e_s).NE | OddLookup(s_e).SW | OddLookup(c_se).NW;
  return new Quad3(newNW, newNE, newSW, newSE);
}

That takes care of the problem of “get the next Quad3 given nine Quad2s”. Last time we described how to use that helper function to step an even-to-odd Quad4, and again it should come as no surprise that odd-to-even is just the same except slightly different:

public void StepOdd()
{
  StepOddSE();
  StepOddNW();
  StepOddSW();
  StepOddNE();
}
private void StepOddSE()
{
  evenSE = Step9Quad2ToQuad3Odd(
    oddNW.SE, oddNE.SW, oddNE.SE,
    oddSW.NE, oddSE.NW, oddSE.NE,
    oddSW.SE, oddSE.SW, oddSE.SE);
}
private void StepOddNW()
{
  evenNW = Step9Quad2ToQuad3Odd(
    NW == null ? AllDead : NW.oddSE.SE,
    N == null ? AllDead : N.oddSW.SW,
    N == null ? AllDead : N.oddSW.SE,
    W == null ? AllDead : W.oddNE.NE,
    oddNW.NW,
    oddNW.NE,
    W == null ? AllDead : W.oddNE.SE,
    oddNW.SW,
    oddNW.SE);
}
...

And so on; I won’t write them all out here.

Once again I have four methods that have just one method call in them each because in future episodes we will be adding more stuff here.

Next time on FAIC: We now have enough gear that we can write a simple working “proto-QuickLife” algorithm that steps a fixed set of Quad4s. We’ll do that, take a look at its performance, and then think about ways to build a much faster, much more memory-efficient, and much more dynamic implementation on top of the foundation we’ve built so far.

For today’s extra content:

We’ve already seen the four smallest spaceships (glider, and the three variants of light/middle/heavy spaceship). “Loafer” is the fifth smallest spaceship, and moves at a relatively leisurely c/7. It’s called loafer because of this slow pace and because it appears to be pushing a loaf still Life ahead of it. (Once again for reasons unknown WordPress is not serving up the animation, so see the wiki for the animation or click here.)

Life, part 26

Posted on July 23, 2020 by ericlippert

Last time on FAIC we saw how we could start with nine 2-quads — a 12×12 grid of cells — and with only a handful of ands, ors and table lookups we could get an 8×8 grid of cells of the next state out. How is that useful?

In order to explain, we need two more data structures, and it should come as no surprise that they are Quad3 and Quad4. Quad3, as I mentioned last time, is trivial; it is just a struct wrapping four Quad2s. The implementation is simpler than Quad2:

struct Quad3
{
  public Quad3(Quad2 nw, Quad2 ne, Quad2 sw, Quad2 se)
  {
    NW = nw;
    NE = ne;
    SW = sw;
    SE = se;
  }
  public Quad2 NW { get; }
  public Quad2 NE { get; }
  public Quad2 SW { get; }
  public Quad2 SE { get; }
  public bool Get(int x, int y) { ... }
  public Quad3 Set(int x, int y) { ... }
  public Quad3 Clear(int x, int y) { ... }
  public override string ToString() { ... }
}

We have yet another immutable struct. It’s small — only 64 bits. The Get, Set, Clear and ToString implementations are the obvious and straightforward extensions of the similar methods I already showed for Quad2, so I won’t bother showing them here. In a few episodes we will need to add some helper methods to Quad3, but for now, this will do us just fine.

The interesting data structure is Quad4, which by rights should represent a 16×16 grid of cells. The Quad4 has some surprises:

A Quad4 is a mutable class, not an immutable struct.
A Quad4 is not four Quad3s. It is eight! A Quad4 represents two ticks worth of a 16×16 portion of the grid. We have four Quad3s for the even-numbered generations, and four for the odds.
Each Quad4 is numbered with a unique (x, y) pair of signed short integers. That gives us a square of Quad4s 65536 on a side; since each Quad4 is 16 cells wide and tall, we have a total grid size of just over one million cells on a side; that’s a 20-quad. That’s pretty big. Of course, we will typically allocate nowhere even close to that many Quad4s.
Each Quad4 has a reference to six neighbouring Quad4s; specifically, it knows what Quad4s are to its north, south, west, east, northwest and southeast. A null reference is to be treated as though those cells are all dead. (Why not northeast and southwest? We shall see.)

There are more surprises to come later. Let’s write some code. It’s pretty straightforward:

sealed class Quad4
{
  public Quad4(int x, int y)
  {
    X = (short)x;
    Y = (short)y;
  }

  // A unique pair of shorts for each 4-quad in a 20-quad.
  public short X { get; }
  public short Y { get; }

  // References to neighbouring Quad4s:
  public Quad4 N { get; set; }
  public Quad4 S { get; set; }
  public Quad4 E { get; set; }
  public Quad4 W { get; set; }
  public Quad4 NW { get; set; }
  public Quad4 SW { get; set; }

  // Cells
  private Quad3 evenNW;
  private Quad3 evenSW;
  private Quad3 evenNE;
  private Quad3 evenSE;
  private Quad3 oddNW;
  private Quad3 oddSW;
  private Quad3 oddNE;
  private Quad3 oddSE;

  public bool GetEven(int x, int y) { ... }
  public void SetEven(int x, int y) { ... }
  public void ClearEven(int x, int y) { ... }

  public bool GetOdd(int x, int y) { ... }
  public void SetOdd(int x, int y) { ... }
  public void ClearOdd(int x, int y) { ... }

  public string override ToString() { ... }
}

Once again those last seven methods are (for now!) just trivial rewrites of the equivalent methods on a Quad2, so I’ll omit them here.

Obviously some controlling code representing the entire grid needs to create some number of Quad4s, assign numbers to them, set up the references, and so on. We’ll write that code in a later episode; right now I want to continue concentrating on the key problem of stepping to the next tick.

Suppose we are currently on an even generation and we want to step all the Quad4s in the system to the next generation, which will of course be odd. We add this method to Quad4 and have the controlling code call it for every Quad4:

public void StepEven()
{
  StepEvenNW();
  StepEvenSW();
  StepEvenNE();
  StepEvenSE();
}

Pretty straightforward; we step every even Quad3 forward one tick, and presumably that fills in the odd Quad3s. How are we going to do that though?

Time for more diagrams.

Again I’ve just randomly chosen a fragment of “acorn”, this time from generation 416. Let’s suppose the 4-quad I’ve marked as “this” is the one we are attempting to step forward one generation. (Click on any image for a higher-res version if one is available.)

Let’s zoom in a bit on “this” and give expressions that refer to some interesting Quad2s:

Well would you look at what we have here: nine Quad2s. If we step those forward one step with our handy super-efficient method from last episode, we get the next state of the green Quad3:

The code could not be easier:

private void StepEvenNW()
{
  oddNW = Step9Quad2ToQuad3(
    evenNW.NW, evenNW.NE, evenNE.NW,
    evenNW.SW, evenNW.SE, evenNE.SW,
    evenSW.NW, evenSW.NE, evenSE.NW);
}

I’m going to keep this in its own method though, as I’ll be adding more gear to it in an upcoming episode.

That’s the easy one, and I don’t think I need to labour the point with more diagrams. In order to step forward the other Quad3s we need to be careful about the fact that this.E, this.SE and this.S could be null, but if they are, we just substitute in an all-dead Quad2, which we helpfully made a static field for last time:

private void StepEvenSW()
{
  oddSW = Step9Quad2ToQuad3(
    evenSW.NW, evenSW.NE, evenSE.NW,
    evenSW.SW, evenSW.SE, evenSE.SW,
    S == null ? AllDead : S.evenNW.NW,
    S == null ? AllDead : S.evenNW.NE,
    S == null ? AllDead : S.evenNE.NW);
}
private void StepEvenNE()
{
  oddNE = Step9Quad2ToQuad3(
    evenNE.NW, evenNE.NE, E == null ? AllDead : E.evenNW.NW,
    evenNE.SW, evenNE.SE, E == null ? AllDead : E.evenNW.SW,
    evenSE.NW, evenSE.NE, E == null ? AllDead : E.evenSW.NW);
}
private void StepEvenSE()
{
  oddSE = Step9Quad2ToQuad3(
    evenSE.NW, evenSE.NE, E == null ? AllDead : E.evenSW.NW,
    evenSE.SW, evenSE.SE, E == null ? AllDead : E.evenSW.SW,
    S == null ? AllDead : S.evenNE.NW,
    S == null ? AllDead : S.evenNE.NE,
    SE == null ? AllDead : SE.evenNW.NW);
}

And when StepEven returns, we have the next state for the odd generation:

Wait a minute. We have committed the classic programmer blunder of being consistently off by one! We haven’t computed the next generation of the even 4-quad; we’ve computed the next generation of the 4-quad that is the even 4-quad shifted one cell to the southeast.

Well golly.

What are we going to do about this?

Give that some thought and then scroll down.

In my youth I was in a group of friends that watched terrible 80s horror movies every week, and we noticed patterns. In particular, in Killer Mutant Critter movies and zombie movies there is often a sheriff, and there are often concerned townsfolk, and the dialog seems to go

“Sheriff! Giant mutant toads are attacking the village! What are you going to do about it?!”

“Nuthin’.”

That’s what we’re going to do about it. Nuthin’.

This is our final surprise for this episode. A Quad4 represents a 16×16 section of the grid, for two generations, where the odd numbered generation is offset by one cell to the southeast from the even generation. Though this will make life very slightly difficult for the display code, this “staggered generations” property turns out to be surprisingly useful; we will see why in an upcoming episode.

Next time on FAIC: The next problem we need to solve is: how are we going to do the odd generation steps? We can’t just do the same thing because then we’ll be two cells off. Somehow we need to have the odd step get back to the northwest.

Life, part 25

Posted on July 20, 2020 by ericlippert

Let’s crisp up the problem from last episode. The problem for today’s episode is to write a method that takes these nine 2-quads:

Computes these sixteen 1-quads:

And returns these four 2-quads:

Those four 2-quads form a 3-quad; I haven’t written a 3-quad data structure yet, but I’m sure you can imagine how it goes. We’ll write it out later; it’s just a struct containing four 2-quads. I like to start by writing the signature of the method:

static Quad3 Step9Quad2ToQuad3(
  Quad2 nw, Quad2 n, Quad2 ne,
  Quad2 w, Quad2 c, Quad2 e,
  Quad2 sw, Quad2 s, Quad2 se)
{

The first step is to get those twelve additional 2-quads. The first four we need are these:

Start at the beginning: how are we going to get the western 2-quad on the northern edge? Plainly it is made up of parts of the 2-quads I’ve labeled NW and N. A couple episodes ago I added an “or” operator to 2-quads and masks that get the eastern and western edges, so how about this?

Quad2 n_w = nw.EastEdge | n.WestEdge;

Do you see why that’s not quite right? That is this 2-quad:

We’ve pasted the east edge of nw and the west edge of n together, but we also need to swap eastness with westness. The result we’ve got is mirror-imaged-at-the-level-of-1-quads the 2-quad that we want. Make sure that this step is clear in your head because correctly understanding how this is “mirrored” is important in this algorithm.

We’re going to be doing these operations a lot in this episode, so in keeping with my general attitude of “there’s nothing too small to go in a method”, I’m going to make two helper methods on Quad2 right now; one creates a problem and one fixes it:

public Quad2 HorizontalMiddleMirrored(Quad2 right) =>
  EastEdge | right.WestEdge;
public Quad2 Mirror =>
  new Quad2((ushort)((cells << 8) | (cells >> 8)));

Now we can write correct code for extracting the n_w 2-quad:

Quad2 n_w = nw.HorizontalMiddleMirrored(n).Mirror;

Why not combine these into one helper function HorizontalMiddle? Normally I would, but… we will come back to this point in a bit.

So far we’ve done two ands, two ors and two shifts. We can do three times that work to similarly get:

Quad2 n_e = n.HorizontalMiddleMirrored(ne).Mirror;
Quad2 c_w = w.HorizontalMiddleMirrored(c).Mirror;
Quad2 c_e = c.HorizontalMiddleMirrored(e).Mirror;

We now have eight of the sixteen 2-quads that we need. I’m sure you see how we can get these four:

We need to do the same thing again but this time for the vertical direction. And this time the 2-quads are going to come out flipped north-for-south instead of mirrored east-for-west. Two more helper methods!

public Quad2 VerticalMiddleFlipped(Quad2 bottom) =>
  SouthEdge | bottom.NorthEdge;
public Quad2 Flip => 
  new Quad2((ushort)(((cells & SouthEdgeMask) << 4) | 
    ((cells & NorthEdgeMask) >> 4)));

and now we can compute those four:

Quad2 w_n = nw.VerticalMiddleFlipped(w).Flip;
Quad2 w_s = w.VerticalMiddleFlipped(sw).Flip;
Quad2 c_n = n.VerticalMiddleFlipped(center).Flip;
Quad2 c_s = center.VerticalMiddleFlipped(s).Flip;

We’re 75% of the way to getting our sixteen needed 2-quads. How are we going to get these?

Three of them are pretty easy; they can be constructed from what we already have in hand.

Quad2 c_nw = w_n.HorizontalMiddleMirrored(c_n).Mirror;
Quad2 c_sw = w_s_flip.HorizontalMiddleMirrored(c_s).Mirror;
Quad2 c_ne = n_e.VerticalMiddleFlipped(c_e).Flip;

That last one is a little tricky but we can get it:

Quad2 s_e = s.HorizontalMiddleMirrored(se).Mirror;
Quad2 c_se = c_e.VerticalMiddleFlipped(s_e).Flip;

So far we’ve done 26 each of and, or, shift, to get the 16 2-quads we need.

We supposed that we had a fast lookup table that took a 2-quad and returned a 1-quad; let’s actually build that. The 2-quad key is a ushort and the returned 1-quad is an integer where the low four bits are the bits of the 1-quad:

static int[] lookup = new int[65536];
... initialized as ...
for (int i = 0; i <= 0xffff; i += 1)
  lookup[i] = StepQuad2(new Quad2((ushort)i));

We can therefore solve our problem with this mess:

Quad2 newNW = new Quad2((ushort)(
  (lookup[(ushort)nw] << 12) | // NW is bits 12-15
  (lookup[(ushort)w_n] << 8) | // SW is 8-11
  (lookup[(ushort)n_w] << 4) | // NE is 4-7
  (lookup[(ushort)c_nw])));    // SE is 0-3
Quad2 newNE = new Quad2((ushort)(
  (lookup[(ushort)n] << 12) |
  (lookup[(ushort)c_n] << 8) |
  (lookup[(ushort)n_e] << 4) |
  (lookup[(ushort)c_ne])));
Quad2 newSW = new Quad2((ushort)(
  (lookup[(ushort)w] << 12) |
  (lookup[(ushort)w_s] << 8) |
  (lookup[(ushort)c_w] << 4) |
  (lookup[(ushort)c_sw])));
Quad2 newSE = new Quad2((ushort)(
  (lookup[(ushort)c] << 12) |
  (lookup[(ushort)c_s] << 8) |
  (lookup[(ushort)c_e] << 4) |
  (lookup[(ushort)c_se])));
return new Quad3(newNW, newNE, newSW, newSE);

What I like about this: it would work.

What I dislike about this: we’re back in the mechanism domain. We are treating 1-quads as ints and constantly going back and forth between ushorts and Quad2s. The implementation detail of how a Quad2 is laid out in memory is once again front and center in this algorithm. Also, it feels like we are doing more work than we need to throughout this algorithm.

And we are wasting a bunch of space in the lookup table. We have 65536 32-bit integers but we are only using 4 of those bits. That seems like a small problem; memory is cheap after all but… wait a minute… what if we could fix several of these problems at once?

The first step in fixing this mess is: we’re going to make the lookup table an array of Quad2s, where each Quad2 contains four copies of the 1-quad:

static Quad2[] lookup = new Quad2[65536];
... initialized as ...
for (int i = 0; i <= 0xffff; i += 1)
{
  int n = StepQuad2(new Quad2((ushort)i));
  lookup[i] = new Quad2((ushort)(
    (n << 12) | (n << 8) | (n << 4) | n));
}
...
static Quad2 Lookup(Quad2 q) => lookup[(ushort)q];

I mean, why not? We have plenty of bits to spare, so use them.

Now the call site looks like this:

Quad2 newNW =  Lookup(nw).NW | Lookup(w_n).SW | Lookup(n_w).NE | Lookup(c_nw).SE;
Quad2 newNE =  Lookup(n).NW  | Lookup(c_n).SW | Lookup(n_e).NE | Lookup(c_ne).SE;
Quad2 newSW =  Lookup(w).NW  | Lookup(w_s).SW | Lookup(c_w).NE | Lookup(c_sw).SE;
Quad2 newSE =  Lookup(c).NW  | Lookup(c_s).SW | Lookup(c_e).NE | Lookup[c_se].SE;
return new Quad3(newNW, newNE, newSW, newSW);

Look at that improvement in the readability of the code! We plainly see that we are constructing new Quad2s out of the parts of old ones. We’re keeping all the typing consistent. This is great.

Could it be even better?

Take a look at those sixteen computations of a quad1 on the next tick. You notice something?

All the ones that are going in the NW of a new 2-quad are lookups on an argument that was passed in to this method: nw, n, w and c.
All the ones that are going to the SW were computed by “flipping” something: w_n, c_n, w_s and c_s.
All the ones that are going to the NE were computed by “mirroring” something: n_w, n_e, c_w and c_e
All the ones going to the SE were both flipped and mirrored: c_nw, c_ne, c_sw, c_se.

What does this mean?

It means we can move the flipping and mirroring operations back in time into the lookup table as well. Read this next bit carefully because it is tricky. We’re going to initialize the lookup table like this:

for (int i = 0; i <= 0xffff; i += 1)
{
  Quad2 normal = new Quad2((ushort)i);
  Quad2 mirror = normal.Mirror;
  Quad2 flip = normal.Flip;
  Quad2 both = mirror.Flip;
  int result = StepQuad2(normal);
  lookup[(ushort)both] |= new Quad2((ushort)result);           // SE
  lookup[(ushort)mirror] |= new Quad2((ushort)(result << 4));  // NE
  lookup[(ushort)flip] |= new Quad2((ushort)(result << 8));    // SW
  lookup[(ushort)normal] |= new Quad2((ushort)(result << 12)); // NW
}

And now we can remove all the flips and mirrors when computing the new Quad2s! Put it all together:

static Quad3 Step9Quad2ToQuad3(
  Quad2 nw, Quad2 n, Quad2 ne,
  Quad2 w, Quad2 c, Quad2 e,
  Quad2 sw, Quad2 s, Quad2 se)
{
  // These are mirrored
  Quad2 n_w = nw.HorizontalMiddleMirrored(n);
  Quad2 n_e = n.HorizontalMiddleMirrored(ne);
  Quad2 c_w = w.HorizontalMiddleMirrored(c);
  Quad2 c_e = c.HorizontalMiddleMirrored(e);
  // These are flipped
  Quad2 w_n = nw.VerticalMiddleFlipped(w);
  Quad2 c_n = n.VerticalMiddleFlipped(c);
  Quad2 w_s = w.VerticalMiddleFlipped(sw);
  Quad2 c_s = c.VerticalMiddleFlipped(s);
  // These are mirrored and flipped
  Quad2 c_nw = w_n.HorizontalMiddleMirrored(c_n);
  Quad2 c_ne = n_e.VerticalMiddleFlipped(c_e);
  Quad2 c_sw = w_s.HorizontalMiddleMirrored(c_s);
  Quad2 c_se = c_e.SouthEdge | c_s.NE | se.NW;
  Quad2 newNW =  Lookup(nw).NW | Lookup(w_n).SW | Lookup(n_w).NE | Lookup(c_nw).SE;
  Quad2 newNE =  Lookup(n).NW  | Lookup(c_n).SW | Lookup(n_e).NE | Lookup(c_ne).SE;
  Quad2 newSW =  Lookup(w).NW  | Lookup(w_s).SW | Lookup(c_w).NE | Lookup(c_sw).SE;
  Quad2 newSE =  Lookup(c).NW  | Lookup(c_s).SW | Lookup(c_e).NE | Lookup[c_se].SE;
  return new Quad3(newNW, newNE, newSW, newSW);
}

Computing c_se actually gets easier, which is a nice bonus.

What’s our final tally? To move ahead an 8×8 portion of a 12×12 grid we do 41 ands, 25 ors and 16 array lookups. And zero shifts. All the other bit operations have been moved backwards in time to run during the computation of the lookup table.

Could we do even less work?

Yes; we could eliminate 16 of those ands if we had four lookup tables each with four bits in the right place and zeros everywhere else; we then would not have to do any masking. It would just be:

  Quad2 newNW =  LookupNW(nw) | LookupSW(w_n) | LookupNE(n_w) | LookupSE(c_nw);
  ...

But that’s not how the original implementation did it, and so that’s not what I’m doing here. It would be an interesting experiment to find out of that makes a difference; maybe we will come back to that point later in the series, but probably not.

Oh, and incidentally, you remember that fragment of bit-twiddling code that I posted from the original implementation?

int ix01 = (ix00 & 0xf0f0) | (cN.q[14] & 0x0f0f);
int ix11 = (ix10 & 0xf0f0) | (cN.q[15] & 0x0f0f);
int ix03 = (ix02 & 0xf0f0) | (ix01 & 0x0f00) | (cN.q[7] & 0x000f);

I bet you can figure out what it does now!

Well that was quite the journey to get a fast method that can step an 8×8 fragment of a 12×12 grid forwards one tick by memoizing the “4×4 step to 2×2” problem. Why is this method useful?

Next time on FAIC: Yeah, why?

Fabulous adventures in coding

Eric Lippert's blog