Code for this episode can be found here. I have not added any more code to the engine, but the client now has two features of great use to me; pressing space toggles whether the simulation is running or paused, and pressing “s” takes a screenshot and saves it to the desktop.

Last time on FAIC I got totally off on tangents about asymptotic performance without answering the question “what is the asymptotic performance of our naive Life algorithm?”

Asymptotic performance, as we’ve discussed, is about finding a function that relates problem size to cost, so the first thing we have to do is measure problem size.

In the episodes so far I’ve been saying a lot of “consider a 32 by 32 grid” or “consider a 256 by 256 grid” and so on. The reason why I want square grids whose sizes are powers of two will become apparent later in this series, but even without the good reason that is coming up, considering grids in these sizes enables a concise way to describe them.

I’m going to refer to a square grid where the side is a power of two as a “quad”. In episode 2 I defined the “scale” of a quad as the power of two that is the length of the side. That is, a quad of “zero scale” is a 1×1 grid, a quad of “one scale” is a 2×2 grid, and so on. In fact, we can probably just omit the “scale” much of the time and just say a “six quad” is a 64×64 grid. Or, put another way, the number of cells in an n-quad is 4^{n}.

Through this series I’m going to use two different ways to consider the size of a grid: either the total number of cells in a grid, or the “quad scale”, which is the base-4 log of the cell count. This might seem a little silly now when we’re considering 256×256 grids — 8-quads! — but it will make more sense later, I promise.

What then are the costs we should consider when evaluating the asymptotic performance of an algorithm? Typically there are two: memory usage, and time to compute the new state after one tick. (Later on in this series we’ll consider a more sophisticated way to represent number of elapsed ticks that is analogous to the way we represent grid size by quad scale, but we won’t need that for a while yet.)

We are specifically *not* looking to measure the time spent drawing to the screen. Different algorithms have different raw performance when drawing to the screen, of course, but for the most part, because of the way I’ve structured the interface between the client and the engine, the cost is *linear in the number of live cells in the portion of the board being drawn*. That said, I will point out performance issues that affect screen drawing time.

Without further ado, let’s look at the performance of our algorithm. Let’s suppose the grid is square and has a total of n cells in it; that is, n is width times height.

bool[,] newCells = new bool[width, height]; // O(n) for (int y = 0; y < height; y += 1) // O(h) iterations for (int x = 0; x < width; x += 1) // O(n) iterations total { // O(n) total calls of constant cost each: int count = LivingNeighbors(x, y); // O(n) total operations of constant cost each: newCells[x, y] = count == 3 || (cells[x, y] && count == 2); } cells = newCells; // O(1)

- Constructing a new array is O(1) but initializing the n bytes in the array to zero is O(n).
- In the outer loop we initialize y once. We compare y against height and increment y “height” times.
- In the inner loop we initialize x “height” times, and we compare and increment it n times.
- Each call to LivingNeighbors calls IsValidPoint eight times, which does some comparisons, and then we do some array accesses. Each one of these operations is O(1) but we do them once per cell, so the total cost is O(n).
- The comparisons and assignment are each O(1) but we do them n times, so the total cost is O(n)
- Assigning an array reference to a field is O(1) no matter the size of the array.

Add it all up. We have a bunch of O(1) costs, we have a bunch of O(n) costs, and we have an O(height) cost. But we’re assuming that height is the square root of n, so what we have is:

O(1) + O(√n) + O(n)

There are two important conventions in asymptotic analysis — and are the reason it is “asymptotic”. First, we ignore constant multipliers — two O(1) costs are treated as a single O(1) cost. And second, we can ignore “small costs” when there are large costs.

Here we see that cost is growing linearly with problem size, and the smaller components due to the constant-cost factors and the square-root-cost factors will be dominated by the linear factors as the program size increases. We therefore summarize this as an O(n) algorithm, where again, n is the number of cells.

If we were giving the cost of this algorithm as a function of quad scale, then it would be an O(4^{s}) algorithm, which should make sense; the number of cells grows extremely rapidly as we scale up a quad, and our cost is linear in the number of cells.

What about the drawing algorithm?

long xmin = Max(0, rect.X); long xmax = Min(width, rect.X + rect.Width); long ymin = Max(0, rect.Y - rect.Height + 1); long ymax = Min(height, rect.Y + 1); for (long y = ymin; y < ymax; y += 1) for (long x = xmin; x < xmax; x += 1) if (cells[x, y]) setPixel(new LifePoint(x, y));

The best case is that the rectangle we are asked to draw is outside the fixed size of our grid; in that case it is O(1). But the typical case is that we are asked to draw the entire grid; we do one array lookup per drawn grid cell, and so if we have n grid cells drawn, that again O(n). But be careful! Here I’m using n to mean “number of cells drawn” and not “number of cells computed by our algorithm”. Context matters in big-O analysis.

Is having time spent per tick be O(n) in the number of cells to compute one generation *good*?

Yes, this is a pretty good start. We know that even with a very naïve algorithm, we can assume that making the grid four times bigger will make it take about four times longer to compute, and computing ten times as many ticks will take about ten times longer to compute.

That’s good to know, because now we can do measurements to see how long it takes to compute one grid of known size, and get a sense of how big we can make the grid before the slowdown becomes unacceptable. Or, conversely, if we’re interested in the question “how many ticks can I process in my time budget?” we now have the ability to estimate what the grid scale should be to achieve that goal given a measurement.

So far I’ve only talked about consumption of time; what about memory? It seems easy; a bool is one byte, we spend one byte per cell, and so we have O(n) bytes in the original cell array, and we allocate O(n) bytes in the new array. At the end of the algorithm we throw away the old one and keep the new one, so we’ve got a total memory consumption of O(n).

Unfortunately though, that’s not the whole story. Memory allocation is not free; the more times you allocate memory, the more work the garbage collector has to do in the future, and that work does not come for free. Allocation of memory can affect time in unusual ways in GC’d languages, and can cause some linear algorithms to become quadratic as a result; I may discuss this in more detail in a later episode.

A better approach here would be to allocate two cell buffers once, and then rotate between them, switching which one is “current cells” and which one is “next step cells”. Sure, that doubles the amount of memory used by the program for the cells, but it lowers the burden on the garbage collector. And we will see later in this series that memory burden can get hard to analyze particularly when you care about GC issues.

But remember: asymptotic performance only tells us how performance will likely change as problem size changes. It does not tell us whether our algorithm actually meets our performance budget at any reasonable size!

**Next time on FAIC:** How fast is this algorithm? Where are we doing unnecessary work that takes time?