Life, part 4

Code for this episode can be found here. I have not updated the Life algorithm, but I have added a new feature to the client, namely, you can now resize the form and the display box will resize along with it.

On that subject — as I noted earlier, the client is very “bare bones” right now, and I intend to gradually add more features to it as we go, but I’ll mostly concentrate on the algorithms. I’ve started getting pull requests on GitHub adding more features to the client, and though I certainly appreciate the effort and would like to thank everyone for their interest, it’s not actually my intention here to create a fully-featured Life simulator. There are plenty available already if that’s what you want.

Please do clone the repo and play around with it! But I write this blog in my spare time and have many other demands on my time even in the pandemic, so doing extra work to review PRs and figure out how to make the code I’ve already written for future episodes work with them is work I might not get to for some time. Thanks!

Last time I presented the naïve implementation of the Life algorithm, where we restricted the board to 256×256 cells and implemented it as a two-dimensional array of bools. What are the relevant performance characteristics of this solution? There are a lot of different ways we can characterize the performance of an algorithm, and I’d like to look at several of them in this series as we discuss the different algorithms.

Today let’s start with asymptotic performance, also known as “big-O” performance. That is, how does some performance factor of interest change as the size of the problem we’re solving changes? But before we get into the details, I want to digress for a moment — three times.

First digression: what do we mean by “big-O” performance? If you’re unfamiliar with this term, the basic idea is that the cost of solving a problem is some function of the size of the problem. By “cost” I mean that there is something that is in short supply that you care about, like time, or memory, or storage space, or less commonly, actual dollars. (Though if you are buying compute power from a cloud provider, time and storage really are measured in dollars!)

The question is: what is that function?

If the cost of solving the problem is the same no matter what the size of the problem is, that is said to be a “constant order” problem, and we notate that O(1). That is, if we spend one unit of time to solve a problem of size 100, then we also spend one unit of time to solve a problem of size 100000.

Example: “I have a jar that contains zero or more pennies and nothing else; are there any pennies in this jar?” You can answer that question in the same amount of time whether there are zero pennies, one penny, or a million pennies in the jar. The cost does not scale at all with the problem size; it is always the same.

Of course most problems are not of constant order!

Example: “I have a deck of cards; each card has a number on it; the deck is unsorted. Is there a card with the number 12 on it?” The only way to find out is to look at the cards until you find a 12, in which case you’re done, or you’ve looked at every card. If you double the number of cards, you double the average search time. If you triple the number of cards, you triple the search time. We say that the problem scales linearly, and notate it as O(n). That is, if we spend one unit of time to solve a problem of size 20, then we spend ten units of time to solve a problem of size 200.

Similarly we have problems that are “quadratic”, or O(n2), where making the problem ten times bigger makes it 100 times more expensive to solve, or the really bad “exponential”, O(2n), where making the problem one bigger makes it twice as hard to solve.

Asymptotic analysis is in particular not concerned with the problem of “could we make this algorithm 10% faster for this particular case?” (We’ll discuss that kind of analysis soon.) Rather, it is solely concerned with the question of “how does performance change as a function of problem size?”

That’s a brief overview; there is much, much more to be said on this topic, including why it is called “asymptotic performance”, which is not at all obvious. I’ll leave it there; this should be enough to get us going. If this subject is new to you, there are plenty of resources to learn more on the internet.

Second digression: Why do we care about asymptotic performance in general?

There is a great deal to criticize about the way software companies conduct interviews; a critique I frequently hear is “interviews quiz you about big-O performance analysis but that’s not a skill you use on the job”.

A lot of companies blindly follow the interviewing practices of industry giants like Microsoft or Google without thinking hard enough about whether those practices make sense for their line of business. In companies where this genuinely is not an on-the-job skill, quizzing interview candidates about algorithmic performance analysis is not just a pointless and expensive acquisition of signal which is then either irrelevant or ignored, but worse, it is is gatekeeping.

Gatekeeping how? What I mean is: those sorts of questions can effectively be mere proxies for the question “did you pay attention in your second year computer science classes?” Asking that question unnecessarily introduces a strong bias against hiring people who never took second year computer science but who are otherwise capable of doing the job. That’s a totally valid reason to criticize this kind of question; it’s unwise to ask questions that are both genuinely irrelevant to the job and exclude potentially valuable talent!

When then is it appropriate to ask these sorts of questions in interviews?

I ask questions about asymptotic performance when I interview candidates, and that is because you will not be successful working in my codebase unless you can do this kind of analysis quickly and accurately. I do not expect candidates to know the Master Theorem by heart — I sure don’t! — but estimating the big-O performance of my work is a skill I have used every single day for over two decades.

Why’s that?

Because asymptotic performance tells you how the performance of your small test cases will scale to real problems. I work on problems where the problem sizes can realistically grow large, and the algorithms are not always obviously linear.

Let’s take an abstract example. Suppose we provide a software service to some customers who are throwing problems at our servers. Let’s say that we want our test case to take about a minute, and in that minute we can solve a problem of size 100.

If the time taken to solve real customer problems instead of little test case problems scales linearly then:

  • A problem of size 1 000 takes ten minutes.
  • A problem of size 10 000 takes about two hours.
  • A problem of size 100 000 takes about seventeen hours.
  • A problem of size 1 000 000 takes about seven days.

Now suppose the problem scales quadratically:

  • A problem of size 1 000 takes about two hours.
  • A problem of size 10 000 takes about seven days.
  • A problem of size 100 000 takes about two years.
  • A problem of size 1 000 000 takes about 200 years.

If the cost of solving a problem grows quadratically then the largest real-world problem you can realistically solve is only about a thousand times bigger than your typical couple-minutes test case.

Or, put another way:

If the asymptotic cost of an algorithm grows at any rate more than O(n log n) — which is the rate at which “sort this data set” problems grow — then there will very likely be a point early on where a realistically-sized problem is too expensive to solve with this algorithm.

Right now I am doing research on analyzing probabilistic models represented as graphs. My test cases typically have a few dozen nodes and take a couple seconds to run. What is the asymptotic performance of my analysis? If the answer is O(n2) then I have a big problem that needs to be solved before I make any promises to a customer, because I have apparently built an analyzer that can only analyze “toy” problems! I know that customers are going to want to analyze graphs that have millions of nodes in them, which means that I need to scale at O(n log n) or better.

That’s why we care so much about big-O analysis; we want to make software that scales to solve real customer problems without costing too much.

Third digression: Why do we care in this particular scenario?

We care because we’re trying to simulate an infinite grid. Plainly we are not going to actually implement anything of infinite size, so it makes sense to ask “if we know how much it costs in time or memory to simulate a 256 by 256 grid, how much does it cost to simulate a four-times-larger 512 by 512 grid? What about a 16-times-larger 1024 by 1024 grid?” The number of cells in a grid grows as the square, so it is very difficult to find Life algorithms that scale well when the factor that is increasing is the width of a square grid.

By knowing the asymptotic order of the growth of the costs we can very quickly get a sense of what our “budget” is. If, say, we want to compute a new tick fast enough that it looks like a smooth animation, that means we have only about 30 milliseconds to do the computation. If we know that we can do, say, 256 by 256 in two milliseconds, and we know that our algorithm is linear in the number of cells, then we can very quickly know that we will be beyond our performance budget once the graph is sixteen times larger, which is only 1024 by 1024.

Look at that, I’ve used up an entire episode in digressions! Whoops!

Next time on FAIC: How should we measure asymptotic order for our naive algorithm? And what other performance concerns might we consider?

10 thoughts on “Life, part 4

  1. I believe you have a typo in your second (quadratic) table. It reads:

    – A problem of size 1 000 takes about two hours.
    – A problem of size 100 000 takes about seventeen hours.
    – A problem of size 100 000 takes about two years.
    – A problem of size 1 000 000 takes about 200 years.

    I believe the second bullet point is off and should be:

    – A problem of size 10 000 takes about seventeen hours.

  2. Yes, doing light quizzing about asymptotic analysis is appropriate for a job interview – it’s good to know that the candidate can describe the problem and can show that he/she’s done the analysis before. However, I’m with you that it can become a stupid gatekeeping mechanism. Before the first interview for a job I applied for a few years ago, I was given a timed test administered by one of those on-line candidate-sorting/testing companies.

    The trivial solution to the problem was O(n2). I got that working in no time, but was told that the solution was “too slow”. So I sat back and realized that their was a solution that was O(n) if the input data was sorted, so the problem became O(n log(n)) + O(n) == O(n log(n)). So, I implemented that and was told that it was still too slow.

    At that point, there were 5 minutes left in a 20 or 30 minute test, so I gave up and looked on the web. It turns out that there is an O(n) algorithm that solved the problem – but it was some weird, non-obvious, back-solving thing. There’s no way anyone could come up with that unless they’d done it before. They never interviewed me; but that’s OK, I had decided I didn’t want to work for them.

  3. Pingback: The Morning Brew - Chris Alcock » The Morning Brew #2982

    • Yeah it turns out that you have to both understand the concept *and* take care to consistently apply that knowledge in practice! is a good source of these as well, and unfortunately the current top post on there is a quadratic algorithm in my code that I knew about when I was at Microsoft and did not ever get around to removing. I’m glad that Neal eventually did!

  4. I’ve read about and sometimes even implemented many very clever optimisations over the years, from insane low level stuff to maximise CPU throughput, to different sorting algorithms, spatial partitioning, all the crazy things you can speed up with Fourier transforms like convolution and the Radon transform blah blah…and *nothing* blows my mind quite like the optimisation I think you’re about to implement.

    I’m really looking forward to seeing where this series goes, as I’ve never really understood the technique.

    • Gosper’s algorithm does blow the mind somewhat, yes. I’m going to approach it slowly and carefully and justify each step, I hope.

      If you have examples of particularly clever optimizations you think I should describe in this series, I am very open to suggestions. My plan is to start with the naive implementation, then do a bunch of explorations of different optimizations:

      * Scholes’ famous one-liner APL algorithm
      * maintain an updated neighbour count rather than recomputing it every time
      * maybe do “compress three cells and their neighbour counts into 16 bit words, use those words as keys to a jump table of actions. This one is a LOT of boring code and I might skip implementing it.
      * either “maintain a recent-changes list, only recompute portions that are changing” or similarly, sparse array, recompute only in the neighbourhood of living cells. Maybe both, though those would be pretty similar articles.

      That’s all I’ve got so far; like I said, I am very open to suggestions.

  5. The two examples for O(1) and O(n) are brilliant and simple. I could explain algorithmic complexity to my mother with those two. That answer would pull a lot of fake Reddit EILI5 gold for “Explain big-O Notation to me. 🙂

    Speaking of Reddit, you should consider doing an Ask Me Anything. I’d be interested in seeing how you respond in a couple of sentences or paragraphs to a large volume of questions rather than the carefully composed blog posts. (Love the blog though … don’t stop writing again.)

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s