The VSTO startup sequence

Earlier this week I was looking for an old photo, and while browsing I came across a photo I took of my whiteboard in my office at Microsoft in 2004. Or rather, it was two photos; I’ve crudely stitched them together. Click on the image for a larger version.

OMG. What. The. Heck. Is. All. That. Nonsense?

Let me start with a little history.

Before I was on the C# team and after I was on the scripting languages team, I spent a couple years at Microsoft working on the plumbing for Visual Studio Tools for Office.

The idea of VSTO was to bring the ability to write truly rich, client-server aware, data-driven applications in Office documents using C# or Visual Basic; we wanted to go beyond the scripts and productivity tools typical of VBA customizations.

This was a project with many, many difficulties, both technical and political. On the political side of things, I think it is fair to say that the Office team has historically had (with good reason!) a great deal of resistance to taking any compatibility burden that would possibly slow down their ability to innovate in future releases; “platformizing” Word and Excel into true application hosts by a team external to Office was just such a burden.

The technical difficulties were considerable, in large part due to concerns about the security model. We were deeply, painfully aware of how Office customizations and scripting languages had been misused in the past as vectors for malware, and we did not want to create new vulnerabilities. As mitigation, we designed a mechanism that would isolate any customization code to its own appdomain with a highly restrictive default security policy.

Office, however, was not at the time designed to host the CLR. They were only willing to give us a single callback to our loader code that kicked off the whole process when a customized spreadsheet or document was loaded.

By 2004 we were on the fourth revision to our loading algorithm and I was tasked with coming up with the fifth; to facilitate discussion of options I drew a diagram on my whiteboards which I helpfully titled “HIGHLY SIMPLIFIED STARTUP SEQUENCE v4”.

A few things strike me about this diagram now, over 16 years later.

First: though it looks like a mess, I did actually put some thought into the design elements.

  • The diagram is divided into three sections, separated by heavy blue vertical lines. On the left are components running entirely outside of the CLR; in the middle are components that run in the CLR’s default appdomain, and on the right are components that run in the customization’s restricted appdomain. (And of course on the extreme far left is the edge of my “THE MATRIX” poster. A lot of the code names of the different parts of the project were references to The Matrix, including the team cover band that I played keyboards for. I am sad that I can no longer wear my “The Red Pills” polo shirt in public due to the co-opting of that movie reference by misogynist jerks.)
  • The purple boxes that run along the top are components and the lollipops give the interfaces they implement.
  • The purple boxes and arrows below give the exact sequence of twenty different method calls showing what component is calling what other component with what data, and why. In particular the diagram allows us to easily see when a component running in a more restricted security environment is calling into a less restricted environment; those calls need to be allowed because we need them to happen, but that then means that maybe hostile user code could call them, which could be bad.
  • Design problems, questions, annotations and proposed changes are shown in blue.
  • Red is used to identify one key component and an important question about it.
  • I have no idea what that yellow code fragment is or why it was written over top of everything else. It looks irrelevant.

The purpose of the diagram was originally to clarify in my own mind what the sequence was and what the problems were, but by the time it was in this form it was also for giving context to my coworkers when we were discussing options, so it had to be readable. I probably redrew this diagram a half a dozen times before it got to this state.

Second: we can see that there were a good half dozen or more design problems that I was trying to solve here but the big problems involved dirty documents and application manifests.

When you close Word or Excel, you have I am sure noticed that sometimes you get a “save your work?” dialog and sometimes the app just closes. The app is keeping track of whether the document is dirty — changed since it was last loaded or saved — or clean.

Suppose we load a customized spreadsheet, and initializing the customization causes the installer to notice that there is a newer version that it should be downloading. That might change the manifest information about the customization, so the spreadsheet is now “dirty”. But we do not want to ever unnecessarily dirty a document, because that is confusing and irritating to the user.

In step nine the fake application activator obtains an IAppInfo reference from the appdomain manager, updates the manifest from the customization’s server, and parses the manifest. My comments say:

  • Do not write back at this point; need to maintain dirty state
  • No, don’t do this at all. Host must provide updated manifest. This is not a VSTA feature, it is VSTO. (Meaning here that something here is unique to Visual Studio Tools for Office, and not the generalization of it we were working on, VST for Applications.)
  • Must do both. Don’t write back. AIState object must ensure dirtyness.

Apparently I was deeply conflicted on this point. I don’t recall how it was resolved.

My favourite comment though is the one in red:

Can we take manifest out of doc? Peter: “It would be awesome. If assembly is available offline, so is manifest”.

The scenario here had something to do with both the dirty bit problem, and more generally dealing with locally cached customizations. We did a bunch of work on the security model for “what happens if you’re trying to run a customization while your laptop is in airplane mode and we can’t get to the server to check for updates”. Peter is of course legendary Microsoft PM Peter Torr with whom I worked for many years.

My second favourite was where I said “OFFICE12?” Yeah, what’s going to happen when Office revs? Can we guarantee that all this stuff keeps working?

Third: It’s funny how the mind works. Though I’ve described the organization of the diagram and the major problems, today I remember almost none of what is going on here, what the specific issues were, or how we resolved them. But that whole sequence was intensely important to me for several months of my life; it was the foundational plumbing to the entire endeavor and so I knew it literally forwards and backwards. Those memories are 90% gone now. And yet if someone were to yell the numbers “nine six seven eleven eleven” at me from across the street I would be unable to not think “call Pizza Pizza, right away“. Thanks, 1980s jingle writers.

Fourth: I often think about this sort of thing in the context of those “tell me about a time you solved a design problem” interview questions. This “highly simplified” startup sequence with its twenty method calls has to balance:

  • security
  • performance
  • privacy
  • debuggability
  • code maintainability
  • versioning
  • robustness
  • graceful failure
  • user irritation

and numerous other design criteria. But can you imagine trying to explain any detail of this diagram to someone with no prior knowledge in a 45 minute interview? Real-world design problems are hard precisely because there are so many conflicting goals and messy politics. And worse, too often this is the institutional knowledge that is never written down and then lost.

Coming up on FAIC: Not sure!

  • I want to embark upon a more detailed dive into Bean Machine
  • We have just open-sourced a tool we use for benchmarking PPLs internally; I’d like to talk about that a bit
  • I’ve developed a little AST rewriting library in Python that is kinda fun; I could delve into the ideas behind that.

Let me know in the comments what you think.

Life, part 37

All right, let’s finish this thing off and finally answer the question that I’ve been asking for a while: are there any Life patterns that have unbounded quadratic growth? (Code for this final episode is here.)

The answer is yes; there are several different kinds of patterns that exhibit quadratic growth. Today we’ll look at one of them.

We know that guns, puffers and rakes all produce linear growth. But if we had a puffer whose debris was glider guns, that thing would have linear growth of glider guns, each of which has linear growth of gliders, and so that would be quadratic growth! Such a pattern is called a “breeder”.

ASIDE: Quadratic growth is the upper limit; do you see why? Making an argument for why no finite Life pattern can grow the total number of living cells more than quadratically is left as an exercise for the reader.

Let’s see if we can make this happen.

Back in episode 22 I posted a puffer/rake that produces loafs, blocks and two streams of gliders, and said it was a simplified version of “backrake 3”.

There is rather a lot we can do to modify this pattern to change the stuff left in its wake. For example, if we put another pair of spaceships slightly behind the rake, the little “spark” that the spaceship makes interacts with the loaves-and-blocks debris trail but not the gliders. The resulting chaos has the effect of destroying every other block, and all the loaves:

If we put a second pair of spaceships even further behind, they clean up the remaining blocks. This is the pattern “backrake 3”:

What’s more, if we leave out one of those trailing spaceships, that’s fine. We end up destroying the blocks on just one side and the other block remains.

Rounding out our collection of modifications, we can also put a spaceship off to either side such that when a glider collides with it, the glider is destroyed but the spaceship lives:

Summing up: by modifying our original puffer/rake we can produce any combination of:

  • Gliders above, below, or neither
  • Blocks and loaves, pairs of blocks, single blocks, or nothing.

That’s interesting, but how does it get us to a quadratic-growth pattern? Patience! We will get there.

The next thing to consider is: we’ve seen that we can turn blocks-and-loaves into no loaves and half as many blocks. And we’ve seen that we can go from gliders to no gliders on either side. Could we make the glider stream half as dense? For example, could we destroy every other glider? Yes, by combining two of our patterns into a larger one. We’ll combine:

  • Single block, single glider
  • No still Lifes debris, single glider

There is a way to cause a glider to hit a block such that both of them are destroyed, but the single blocks left behind by the puffer are spaced such that every other glider survives! Here the upper puffer is producing its usual spaceships and blocks, but the lower puffer has half of its gliders destroyed:

OK, so we can thin out a stream of gliders; that is another tool in our toolbox.

The next thing to know is that it is possible to create a huge number of interesting Life patterns by running gliders into each other; in fact, finding the “glider synthesis” of a pattern is a major area of focus for Life enthusiasts. For example, if you collide two gliders together like this:

Then six generations later, there will be a “pond” still Life:

If you then hit that pond with a second glider like this:

Then four generations later it becomes a ship still Life:

So we can create ponds with two gliders and ships with three. If we then hit that ship with a glider like this:

Then it turns into the unstable “queen bee” pattern:

But we know from episode 17 that two queen bees stabilized by two blocks makes a glider gun! We have rakes that produce streams of gliders, we have puffers that produce streams of blocks; put it all together and we get:

Breeder 1! Click on the image for a larger picture, load it into the executable for this episode, or take a look at it running on the wiki; it is quite hypnotic to watch all its intricate parts work together. With this pattern we add a new glider gun every 64 ticks, and then each glider gun produces a new glider every 30 ticks. It is obvious just from looking at the triangle of gliders that the population of gliders is growing as the square; it is making half a square.

It should come as no surprise that this pattern was created by Bill Gosper in the 1970s; just as he was the first to build a glider gun, so too was he the first to build a breeder.

What is the performance of our implementation of Hensel’s QuickLife compared to Gosper’s HashLife? Let’s start by thinking about what we should expect with QuickLife.  The time cost of one tick in QuickLife is proportional to the number of active Quad3s, and therefore the cost of computing tick number n should be proportional to n2. That means the total cost of computing ticks one through n should be proportional to n3.

I ran a similar test to the performance test from episode 35: I ran 8 generations, then restarted the engine and ran 16 generations, then restarted and ran 32 generations, and so on, up to 214 generations. This log-log scale chart is ticks on the x axis, milliseconds on the y axis:

Fascinating! We see that in the early part of the growth, we are basically linear; doubling the number of ticks doubles the amount of time it takes to compute. But by the time we are up to computing the first 8192 ticks or more, the total time is now growing as the cube, as we would expect. (Remember, the slope of the line on the log-log graph indicates the exponent of the geometric growth.)

What about Gosper’s HashLife algorithm? Same test: start from the original breeder pattern, step forward some number of ticks. Of course we reset the caches after every test so as to fairly assign work done by a test to that test.

The astonishing result: 76 milliseconds, no matter how many generations we are stepping ahead! Notice that I’m not saying 76 milliseconds per generation; 76 milliseconds, period. Want to see what the pattern does after 214 generations? QuickLife took almost three minutes to compute that; HashLife does it in 76 milliseconds. 224 generations? 234? Same cost! It looks like magic.

What is happening here is of course Gosper’s algorithm is exploiting the extraordinary amount of similarity in both space and time. The spaceship part of the breeder moves itself 32 cells and repeats its internal configuration every 64 ticks which is perfect for HashLife’s quad-based memoizing. The glider guns are all the same, and the half-square of gliders is easily deduplicated no matter how big it gets.

Let’s wrap it up here; I hope you enjoyed that exploration of Life algorithms as much as I did; I have been wanting to write this series for over a decade.

There is enormously more to Life; I feel like I have just scratched the surface in this series. There are many more patterns to explore and open problems to solve. And more generally there is active and interesting work being done by enthusiasts on cellular automata with different rules, different dimensions, different grid connectivity, and so on.

If you’re interested in learning more, I highly recommend two things; first, the latest news in Life discoveries is available on the wiki that I have linked to many times in this series. Second, try playing around with Golly, a much more fully-featured and high-performance engine for exploring cellular automata than the little toy I’ve made for pedagogic purposes in this series. It implements both Gosper’s HashLife and Hensel’s QuickLife, along with variants for exploring other rule sets.

Coming up on FAIC: I’m going to take a short break from blogging for a bit; that last series was rather longer than I anticipated! We will probably then continue looking at Bean Machine, the probabilistic programming language I work on these days.

Life, part 36

This was supposed to be the last episode of this series, but while writing it I discovered a bug in my implementation of Hensel’s QuickLife algorithm. It’s a small defect, easily fixed, but I thought it might be interesting to dig into it. (The bug is in the code from episodes 30 through 35; I’ve fixed it in the episode 35 branch.)

When I’m investigating a defect, there are a lot of stages I go through. Today we’ll look at:

  • Make a minimal reproducer
  • Determine the exact code path that causes the defect
  • Understand what went wrong at the level of the code
  • Fix it
  • Do a root cause analysis
  • Sum up the lessons learned

Of course if this were a product there would be more steps in there, such as code review of the fix, writing regression tests, getting the regressions code reviewed, adding assertions that verify the violated invariant elsewhere, reviewing related code for similar defects, determining if customers are affected by the bug or the fix, and so on. Obviously we won’t go through those here!

It took me some time to find a minimal repro; I won’t go through how I did so because it was tedious trial and error; suffice to say that I found a very complicated reproducer by accident and then made it smaller and smaller until I could go no further.

The minimal repro is very straightforward; we load this pattern into an empty QuickLife instance and step it two ticks. There is only one active Quad4. The even generation of the active Quad4 looks like this:

The first step we are going from even generation zero to odd generation one; recall that in QuickLife, the odd generation is offset by one cell in each direction, so odd generation one should look like this:

That’s a still Life, so if we’ve done things correctly, even generation two should look like:

But we have not done things correctly, and it does not look like this! Instead we observe that generation two is displayed as being the same as generation zero and that moreover, this pattern continues on subsequent even generations. We have a little isolated one-cell blinker, which should be impossible.

(ASIDE: This is the simplest version of this defect; while investigating I also found more complex versions where the “spark” part was in an adjacent Quad4 and the “block” part was not a still Life. Going through the more complex scenarios would make this long post much longer but both the root cause and the fix are the same, so I will skip that part of the analysis.)

What code path leads to this defect?

First, a quick refresher. Recall that QuickLife gets its speed in several ways. The core logic that we are concerned about for this bug is:

  • A Quad4 represents two generations of a 16×16 fragment of the board; one even, one odd.
  • Each of the eight Quad3s in a Quad4 keeps track of four “regions” and whether those regions are active, stable or dead. By “stable” we mean “exactly the same as two ticks previous”, and by “dead” we mean “stable, and every cell is dead”.
  • If a Quad3 in an active Quad4 is stable or dead then we can skip computing its next generation because it will be the same as the previous generation.

The code I wrote to implement this earlier in the series is usually correct, but it is subtly wrong on generations zero and one, which of course can then cause the computation to be wrong on every subsequent generation. Let’s trace through it carefully and see where things go pear shaped.

Initial state:

  • The even half of the Quad4 cell state is as given in the first diagram.
  • The odd half of the Quad4 cell state is all dead.
  • All even and odd regions of all Quad3s are set to “active”.

To step from generation zero to generation one, remember we execute this code:

private void StepEven()
  foreach (Quad4 c in active)
    if (!RemoveStableEvenQuad4(c))
    c.StayActiveNextStep = false;

The even half of the Quad4 is marked active so we do not remove it, and enter the body of the condition:

public void StepEven()

Things are going to go wrong in the NE Quad3, so let’s take a look at that.

private void StepEvenNE()
  if (OddNEPossiblyActive())
    Quad3 newOddNE = Step9Quad2ToQuad3Even(...);
    OddNEState = oddNE.UpdateOddQuad3State(newOddNE, OddNEState);
    oddNE = newOddNE;
    OddNEState = oddNE.MakeOddStableOrDead(OddNEState);

The first thing we do is check if the odd cycle of the NE Quad3 could possibly change from its current state; since the even cycle is marked as active, it could. We enter the consequence of the condition and correctly compute that the odd NE Quad3 is all dead. There is only one isolated living cell there, and that’s not enough to sustain life.

Here’s the first thing that goes terribly wrong. UpdateOddQuad3State compares the new odd NE Quad3 to the previous one and discovers both of them are “all dead”. Remember our definition of the state bits: in order for a Quad3 region to be marked as “dead” it must be (1) all dead, and (2) stable; all dead twice. We therefore mark the odd NE Quad3 as “dead” in all four regions.

It seems like everything is working correctly so far. I won’t go through the rest of the even-to-odd step in detail; summing up:

  • All four Quad3s on the even cycle are still marked as active in all regions
  • The odd SW Quad3 is marked as active overall and on its east edge, because it is different from how it was in the “previous” generation.
  • The odd NE, NW and SE Quad3s are marked as “dead” in all regions.

Now let’s look at what happens on the generation-one-to-generation-two step, just in the northeast Quad3 of this Quad4:

private void StepOddNE()
  if (EvenNEPossiblyActive())
    Quad3 newEvenNE = Step9Quad2ToQuad3Odd(...);
    EvenNEState = evenNE.UpdateEvenQuad3State(newEvenNE, EvenNEState);
    evenNE = newEvenNE;
    EvenNEState = evenNE.MakeEvenStableOrDead(EvenNEState);

Once again, the first thing we check is whether there is any point in recomputing the even state; if we know it is going to be the same this time as last, then we can skip it… and… wait a minute:

private bool EvenNEPossiblyActive() =>
  OddNortheastOrBorderingActive ||
  N != null && N.OddSouthEdge10EastActive;

We just marked the odd NE, NW and SE Quad3s as “stably dead in all regions” so OddNortheastOrBorderingActive is false; there is no Quad4 to the north, but if there were, it would be all dead also. The wrong conclusion we reach is: on the next tick, the northeast corner of this Quad3 must be the same on generation 2 as it was on generation 0 so we can skip computing it.

We therefore enter the alternative (else) branch, call MakeEvenStableOrDead, and mark the even NE Quad3 as “stable”. This is obviously wrong, and worse, that wrongness then persists forever because the whole Quad4 will soon be marked as “stable” and we will have a period-two oscillator between the states illustrated in the first two diagrams above.

The appropriate fix is determined by understanding what has really gone wrong at the level of the code. What invariant did we depend upon that was violated?

If we’re stepping from an even generation to an odd generation, the current generation is stored in the even Quad3s, and the previous generation is stored in the odd Quad3s. But there is an important assumption made by these optimizations: we assume that the current generation was itself computed by stepping the previous generation. The reasoning is:

  • Generation -1 was all dead in the NE Quad3
  • Generation 0 was the result of stepping generation -1 — uh oh
  • Stepping generation 0 gives us generation 1 all dead in the NE Quad3
  • Generations -1 and 1 are the same in the NE Quad3; it is stable
  • Therefore generation 2 is also stable
  • Generation 2 is the same as generation 0 in the NE Quad3 so we can skip computing it and mark it as stable.

The second premise is false, so the conclusion does not follow.

There are a lot of ways to fix this. What we’re going to do here is keep track of one more piece of board-global state: was the current generation actually computed from the previous generation? Or put another way, is the stored cell state of the previous generation correct? The vast majority of the time it will be, but it is not if we have just loaded a pattern in.

If the previous generation is correct then our algorithm is already correct (I hope!) and we do not need to do anything. What should we do if the previous generation is not correct?

The tempting thing to do is to make that flag a parameter to the step functions. Where did things first go wrong? When we marked the odd NE Quad3 as “stably dead”. We could pass a flag in that says “when stepping even to odd, if the previous odd generation is incorrect then do not compare the new odd cells to the previous odd cells; instead mark them all active.” On the next step we would then not skip computing the new even state, and all would be well.

However, there are now eight code paths that we would have to plumb this flag through. An easier, hackier solution — the solution Hensel applied in the original QuickLife source code — is to allow the step to compute the “wrong” stability flags and then fix them later:

private void StepEven()
  foreach (Quad4 c in active)
    if (!RemoveStableEvenQuad4(c))
    c.StayActiveNextStep = false;
    if (!previousCorrect)
  previousCorrect = true;

And similarly on the odd cycle. That’s our fix.

Whenever I fix a defect, I try to spend some time asking myself how this defect came to be in the code in the first place. In this particular case, that’s easy. I remember very clearly the faulty thought process.

As I’ve mentioned before, Hensel’s explanation of the basic ideas underlying the original QuickLife implementation is very clear, but the code that implements it is arcane in many ways. There are loop bodies and conditional bodies with hundreds of lines of dense, bit-twiddling code. The bit manipulation is all raw hex numbers. The variable naming conventions are obscure; p means even, q means odd, for instance. It took me a long time to understand the details of the algorithm, and I wanted to simplify my task by ignoring anything that wasn’t directly related to the actual optimizations.

One of the nice-to-have features of QuickLife (and several other algorithms we’ve looked at) that I did not spend any time in this series addressing is: because we have the previous and current states stored, you could easily add a handy “step backwards one tick” feature to the UI. And in fact the original Java implementation of QuickLife has this feature.

Now, you have to be careful about this, for two reasons. First, obviously once you have stepped backwards once, you cannot do so again. You need to keep track of whether you’ve already stepped back once or not and disallow it. But there is a bigger problem which, given the preceding material in this ever-longer post, you have undoubtedly already deduced. If we were on an even generation, then the current state is in the even state and the previous state is in the odd state. If we step back one tick, then the current state is in the odd state, but the even state is not the previous state; it is the next state. We need to make sure that when we do the odd-to-even step, we do not come to the erroneous conclusion that the even state is all stable!

The original code has a very clearly-named state variable; it has a field backcorrect which indicates whether the previous state is correct or not. If set to false, then the algorithm does exactly what I’ve done in my bug fix: upon stepping, it marks every region of every Quad3 to “active”, and then finally sets the correctness flag to true.

My mistake was that I wrongly believed upon initially reading the code that I could ignore backcorrect because it was only for implementing the step-backwards feature, which I was not going to add to my UI. I completely missed the fact that Hensel sets this flag to false whenever the user forces a change to cell state, and that setting that flag to false is crucial for ensuring the correctness of the algorithm in that scenario.

I feel a keen sense of irony that after figuring out all the obscure and arcane parts of the algorithm, I missed the importance of this state bit that was more clearly named than all the other state bits.

That was the root cause of the defect, but there were other contributory factors:

  • I made the classic high school math mistake of reasoning inductively without establishing a base case. In this blog series I made arguments for the correctness of this algorithm based on the assumption that the previous generation was the real previous generation of the current generation. But that assumption breaks down on the base case of generation zero.
  • My test regimen was inadequate. I have no unit tests or end-to-end tests; all my testing for this entire series has been completely ad-hoc-try-it-out-in-the-UI-and-see-if-it-looks-right.
  • I have very few assertions in the code.
  • When implementing a complex and easily misunderstood algorithm like QuickLife, I probably should have built a “check mode” implementation of the ILife interface that takes two implementations, does all the same operations to both, and verifies that the new board states are the same in both implementations. I could then have written a “random soup” test generator and verified my implementations against each other.

I (re)learned some lessons from this bug. From the specific to the general:

  • When you’re porting code you didn’t write and there is a part you think is unnecessary to port, really dig in and understand whether it can be safely removed.
  • I made good arguments about the correctness of the “steady state”, but I did not establish the correctness of the code on “boundary” conditions such as “it’s the very first step”. Edge cases are not necessarily rare cases; every program starts at an edge case.
  • A lesson I have learned many times in my life as a compiler developer is strongly reinforced by this bug: every time you cache the result of an analysis of a mutable data structure for performance reasons you create an opportunity for a bug should a mutation cause the cached analysis to become incorrect. I’ve fixed so many compiler bugs like that.
  • The whole point of this series is that you can sometimes find specific aspects of the “business domain” of your computation and use those to drive performance optimizations. If those optimizations depend on invariants that can be forced to be violated, the optimization will lead to an incorrect computation!

We took advantage of the rules of Life to get wins; when we force those rules to be violated, computations that depend on those rules become incorrect.

Next time on FAIC: Let’s finish this thing up! Are there patterns that grow quadratically?