The VSTO startup sequence

Earlier this week I was looking for an old photo, and while browsing I came across a photo I took of my whiteboard in my office at Microsoft in 2004. Or rather, it was two photos; I’ve crudely stitched them together. Click on the image for a larger version.

OMG. What. The. Heck. Is. All. That. Nonsense?

Let me start with a little history.

Before I was on the C# team and after I was on the scripting languages team, I spent a couple years at Microsoft working on the plumbing for Visual Studio Tools for Office.

The idea of VSTO was to bring the ability to write truly rich, client-server aware, data-driven applications in Office documents using C# or Visual Basic; we wanted to go beyond the scripts and productivity tools typical of VBA customizations.

This was a project with many, many difficulties, both technical and political. On the political side of things, I think it is fair to say that the Office team has historically had (with good reason!) a great deal of resistance to taking any compatibility burden that would possibly slow down their ability to innovate in future releases; “platformizing” Word and Excel into true application hosts by a team external to Office was just such a burden.

The technical difficulties were considerable, in large part due to concerns about the security model. We were deeply, painfully aware of how Office customizations and scripting languages had been misused in the past as vectors for malware, and we did not want to create new vulnerabilities. As mitigation, we designed a mechanism that would isolate any customization code to its own appdomain with a highly restrictive default security policy.

Office, however, was not at the time designed to host the CLR; the Office team was only willing to give us a single callback to our loader code that kicked off the whole process when a customized spreadsheet or document was loaded.

By 2004 we were on the fourth revision to our loading algorithm and I was tasked with coming up with the fifth; to facilitate discussion of options I drew a diagram on my whiteboards which I helpfully titled “HIGHLY SIMPLIFIED STARTUP SEQUENCE v4”.

A few things strike me about this diagram now, over 16 years later.


First: though it looks like a mess, I did actually put some thought into the design elements.

  • The diagram is divided into three sections, separated by heavy blue vertical lines. On the left are components running entirely outside of the CLR; in the middle are components that run in the CLR’s default appdomain, and on the right are components that run in the customization’s restricted appdomain. (And of course on the extreme far left is the edge of my “THE MATRIX” poster. A lot of the code names of the different parts of the project were references to The Matrix, including the team cover band that I played keyboards for. I am sad that I can no longer wear my “The Red Pills” polo shirt in public due to the co-opting of that movie reference by misogynist jerks.)
  • The purple boxes that run along the top are components and the lollipops give the interfaces they implement.
  • The purple boxes and arrows below give the exact sequence of twenty different method calls showing what component is calling what other component with what data, and why. In particular the diagram allows us to easily see when a component running in a more restricted security environment is calling into a less restricted environment; those calls need to be allowed because we need them to happen, but that then means that maybe hostile user code could call them, which could be bad.
  • Design problems, questions, annotations and proposed changes are shown in blue.
  • Red is used to identify one key component and an important question about it.
  • I have no idea what that yellow code fragment is or why it was written over top of everything else. It looks irrelevant.

The purpose of the diagram was originally to clarify in my own mind what the sequence was and what the problems were, but by the time it was in this form it was also for giving context to my coworkers when we were discussing options, so it had to be readable. I probably redrew this diagram a half a dozen times before it got to this state.


Second: we can see that there were a good half dozen or more design problems that I was trying to solve here but the big problems involved dirty documents and application manifests.

When you close Word or Excel, you have I am sure noticed that sometimes you get a “save your work?” dialog and sometimes the app just closes. The app is keeping track of whether the document is dirty — changed since it was last loaded or saved — or clean.

Suppose we load a customized spreadsheet, and initializing the customization causes the installer to notice that there is a newer version that it should be downloading. That might change the manifest information about the customization, so the spreadsheet is now “dirty”. But we do not want to ever unnecessarily dirty a document, because that is confusing and irritating to the user.

In step nine the fake application activator obtains an IAppInfo reference from the appdomain manager, updates the manifest from the customization’s server, and parses the manifest. My comments say:

  • Do not write back at this point; need to maintain dirty state
  • No, don’t do this at all. Host must provide updated manifest. This is not a VSTA feature, it is VSTO. (Meaning that something here is unique to Visual Studio Tools for Office, and not the generalization of it we were working on, VST for Applications.)
  • Must do both. Don’t write back. AIState object must ensure dirtyness.

Apparently I was deeply conflicted on this point. I don’t recall how it was resolved.

My favourite comment though is the one in red:

Can we take manifest out of doc? Peter: “It would be awesome. If assembly is available offline, so is manifest”.

The scenario here had something to do with both the dirty bit problem, and more generally dealing with locally cached customizations. We did a bunch of work on the security model for “what happens if you’re trying to run a customization while your laptop is in airplane mode and we can’t get to the server to check for updates”. Peter is of course legendary Microsoft PM Peter Torr with whom I worked for many years.

My second favourite was where I said “OFFICE12?” Yeah, what’s going to happen when Office revs? Can we guarantee that all this stuff keeps working?


Third: It’s funny how the mind works. Though I’ve described the organization of the diagram and the major problems, today I remember almost none of what is going on here, what the specific issues were, or how we resolved them. But that whole sequence was intensely important to me for several months of my life; it was the foundational plumbing to the entire endeavor and so I knew it literally forwards and backwards. Those memories are 90% gone now. And yet if someone were to yell the numbers “nine six seven eleven eleven” at me from across the street I would be unable to not think “call Pizza Pizza, right away“. Thanks, 1980s jingle writers.


Fourth: I often think about this sort of thing in the context of those “tell me about a time you solved a design problem” interview questions. This “highly simplified” startup sequence with its twenty method calls has to balance:

  • security
  • performance
  • privacy
  • debuggability
  • code maintainability
  • versioning
  • robustness
  • graceful failure
  • user irritation

and numerous other design criteria. But can you imagine trying to explain any detail of this diagram to someone with no prior knowledge in a 45 minute interview? Real-world design problems are hard precisely because there are so many conflicting goals and messy politics. And worse, too often this is the institutional knowledge that is never written down and then lost.


Coming up on FAIC: Not sure!

  • I want to embark upon a more detailed dive into Bean Machine
  • We have just open-sourced a tool we use for benchmarking PPLs internally; I’d like to talk about that a bit
  • I’ve developed a little AST rewriting library in Python that is kinda fun; I could delve into the ideas behind that.

Let me know in the comments what you think.

Introducing Bean Machine

The final part of my Life series is still in the works but I need to interrupt that series with some exciting news. The new programming language I have been working on for the last year or so has just been announced by the publication of our paper Bean Machine: A Declarative Probabilistic Programming Language For Efficient Programmable Inference.


Before I get into the details, a few notes on attributing credit where it is due and the like:

  • Though my name appears on the paper as a courtesy, I did not write this paper. Thanks and congratulations in particular to Naz Tehrani and Nim Arora who did a huge amount of work getting this paper together.
  • The actual piece of the language infrastructure that I work on every day is a research project involving extraction, type analysis and optimization of the Bayesian network underlying a Bean Machine program. We have not yet announced the details of that project, but I hope to be able to discuss it here soon.
  • Right now we’ve only got the paper; more information about the language and how to take it out for a spin yourself will come later. It will ship when it’s ready, and that’s all the scheduling information I’ve got.
  • The name of the language comes from a physical device for visualizing probability distributions (the Galton board, also known as a bean machine), because that’s what it does.


I will likely do a whole series on Bean Machine later on this autumn, but for today let me just give you the brief overview should you not want to go through the paper. As the paper’s title says, Bean Machine is a Probabilistic Programming Language (PPL).

For a detailed introduction to PPLs you should read my “Fixing Random” series, where I show how we could greatly improve support for analysis of randomness in .NET by both adding types to the base class library and by adding language features to a language like C#.

If you don’t want to read that 40+ post introduction, here’s the TLDR.

We are all used to two basic kinds of programming: produce an effect and compute a result. The important thing to understand is that Bean Machine is firmly in the “compute a result” camp. In our PPL the goal of the programmer is to declaratively describe a model of how the world works, then input some observations of the real world in the context of the model, and have the program produce posterior distributions of what the real world is probably like, given those observations. It is a language for writing statistical model simulations.

A “hello world” example will probably help. Let’s revisit a scenario I first discussed in part 30 of Fixing Random: flipping a coin that comes from an unfair mint. That is, when you flip a coin from this mint, you do not necessarily have a 50-50 chance of getting heads vs tails. However, we do know that when we mint a coin, the distribution of fairness looks like this:

Fairness is along the x axis; 0.0 means “always tails”, 1.0 means “always heads”. The probability of getting a coin of a particular fairness is proportional to the area under the graph. In the graph above I highlighted the area between 0.6 and 0.8; the blue area is about 25% of the total area under the curve, so we have a 25% chance that a coin will be between 0.6 and 0.8 fair.

Similarly, the area between 0.4 and 0.6 is about 30% of the total area, so we have a 30% chance of getting a coin whose fairness is between 0.4 and 0.6. You see how this goes I’m sure.
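
Those two areas are easy to verify for yourself. Here is a minimal sketch using scipy (any statistics library with a beta CDF would do equally well); the ~25% and ~30% figures above fall straight out of the beta(2, 2) cumulative distribution:

from scipy.stats import beta

# The mint's fairness distribution graphed above: beta(2, 2).
mint = beta(2, 2)

print(mint.cdf(0.8) - mint.cdf(0.6))  # ~0.248: about a 25% chance of a coin between 0.6 and 0.8 fair
print(mint.cdf(0.6) - mint.cdf(0.4))  # ~0.296: about a 30% chance of a coin between 0.4 and 0.6 fair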

Suppose we mint a coin; we do not know its true fairness, just the distribution of fairness above. We flip the coin 100 times, and we get 72 heads, 28 tails. What is the most probable fairness of the coin?

Well, obviously the most probable fairness of a coin that comes up heads 72 times out of 100 is 0.72, right?

Well, no, not necessarily right. Why? Because the prior probability that we got a coin that is between 0.0 and 0.6 is rather a lot higher than the prior probability that we got a coin between 0.6 and 1.0. It is possible by sheer luck to get 72 heads out of 100 with a coin between 0.0 and 0.6 fairness, and those coins are more likely overall.


Aside: If that is not clear, try thinking about an easier problem that I discussed in my earlier series. You have 999 fair coins and one double-headed coin. You pick a coin at random, flip it ten times and get ten heads in a row. What is the most likely fairness, 0.5 or 1.0? Put another way: what is the probability that you got the double-headed coin? Obviously it is not 0.1%, the prior, but nor is it 100%; you could have gotten ten heads in a row just by luck with a fair coin. What is the true posterior probability of having chosen the double-headed coin given these observations?
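
For the curious, that aside works out to just over fifty-fifty; here is the Bayes’ rule arithmetic as a minimal Python sketch:

# The 999-fair-coins puzzle: a prior of 1/1000 that we picked the
# double-headed coin, and ten heads in a row as the observation.
p_double = 0.001
p_ten_heads_given_double = 1.0
p_ten_heads_given_fair = 0.5 ** 10

posterior = (p_double * p_ten_heads_given_double) / (
    p_double * p_ten_heads_given_double + (1 - p_double) * p_ten_heads_given_fair)
print(posterior)  # ~0.506: only slightly better than a coin flip that we have the trick coin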


What we have to do here is balance between two competing facts. First, the fact that we’ve observed some coin flips that are most consistent with 0.72 fairness, and second, the fact that the coin could easily have a smaller (or larger!) fairness and we just got 72 heads by luck. The math to do that balancing act to work out the true distribution of possible fairness is by no means obvious.

What we want to do is use a PPL like Bean Machine to answer this question for us, so let’s build a model!

The code will probably look very familiar, and that’s because Bean Machine is a declarative language based on Python; all Bean Machine programs are also legal Python programs. We begin by saying what our “random variables” are.


Aside: Statisticians use “variable” in a way very different than computer programmers, so do not be fooled here by your intuition. By “random variable” we mean that we have a distribution of possible random values; a representation of any single one of those values drawn from a distribution is a “random variable”. 


To represent random variables we declare a function that returns a pytorch distribution object for the distribution from which the random variable has been drawn. The curve above is represented by the function beta(2, 2), and we have a constructor for an object that represents that distribution in the pytorch library that we’re using, so:

@random_variable
def coin():
  return Beta(2.0, 2.0)

Easy as that. Every usage in the program of coin() is logically a single random variable; that random variable is a coin fairness that was generated by sampling it from the beta(2, 2) distribution graphed above.


Aside: The code might seem a little weird, but remember we do these sorts of shenanigans all the time in C#. In C# we might have a method that looks like it returns an int, but the return type is Task<int>; we might have a method that yield returns a double, but the return type is IEnumerable<double>. This is very similar; the method looks like it is returning a distribution of fairnesses, but logically we treat it like a specific fairness drawn from that distribution.


What do we then do? We flip a coin 100 times. We therefore need a random variable for each of those coin flips:

@random_variable
def flip(i):
  return Bernoulli(coin())

Let’s break that down. The calls flip(0), flip(1), and so on are distinct random variables; they are outcomes of a Bernoulli process — the “flip a coin” process — where the fairness of the coin is given by the single random variable coin(). But every call to flip(0) is logically the same specific coin flip, no matter how many times it appears in the program.

For the purposes of this exercise I generated a coin and simulated 100 coin tosses to simulate our observations of the real world. I got 72 heads. Because I can peek behind the curtain for the purposes of this test, I can tell you that the coin’s true fairness was 0.75, but of course in a real-world scenario we would not know that. (And of course it is perfectly plausible to get 72 heads on 100 coin flips with a 0.75 fair coin.)

We need to say what our observations are.  The Bernoulli distribution in pytorch produces a 1.0 tensor for “heads” and a 0.0 tensor for “tails”. Our observations are represented as a dictionary mapping from random variables to observed values.

heads = tensor(1.0)
tails = tensor(0.0)
observations = {
  flip(0) : heads,
  flip(1) : tails,
  ...  and so on, 100 times with 72 heads, 28 tails.
}
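
Writing out a hundred entries by hand would be tedious; in practice one would build the dictionary in a loop. A minimal sketch, assuming for simplicity that the 72 heads come first; since the flips are exchangeable in this model, only the counts matter to the posterior, not the order:

# Hypothetical construction of the full observation dictionary: 72 heads, then 28 tails.
observations = {flip(i): heads if i < 72 else tails for i in range(100)}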

Finally, we have to tell Bean Machine what to infer. We want to know the posterior probability of fairness of the coin, so we make a list of the random variables we care to infer posteriors on; there is only one in this case.

inferences = [ coin() ]
posteriors = infer(observations, inferences)
fairness = posteriors[coin()]

and we get an object representing samples from the posterior fairness of the coin given these observations. (I’ve simplified the call site to the inference method slightly here for clarity; it takes more arguments to control the details of the inference process.)

The “fairness” object that is handed back is the result of efficiently simulating the possible worlds that get you to the observed heads and tails; we then have methods that allow you to graph the results of those simulations using standard graphing packages:

The orange marker is our original guess of observed fairness: 0.72. The red marker is the actual fairness of the coin used to generate the observations, 0.75. The blue histogram shows the results of 1000 simulations; the vast majority of simulations that produced those 72 heads had a fairness between 0.6 and 0.8, even though only 25% of the coins produced by the mint are in that range.  As we would hope, both the orange and red markers are near the peak of the histogram.

So yes, 0.72 is close to the most likely fairness, but we also see here that a great many other fairnesses are possible, and moreover, we clearly see how likely they are compared to 0.72. For example, 0.65 is also pretty likely, and it is much more likely than, say, 0.85. This should make sense, since the prior distribution was that fairnesses closer to 0.5 are more likely than those farther away; there’s more “bulk” to the histogram to the left than the right: that is the influence of the prior on the posterior!

Of course because we only did 1000 simulations there is some noise; if we did more simulations we would get a smoother result and a clear, single peak. But this is a pretty good estimate for a Python program with six lines of model code that only takes a few seconds to run.
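
Aside: This particular model is simple enough that we can check the simulation against the exact answer: the beta prior is conjugate to the Bernoulli likelihood, so observing 72 heads and 28 tails turns the beta(2, 2) prior into a beta(2 + 72, 2 + 28) posterior. A quick sanity check with scipy, just to confirm the histogram is peaked in the right place:

from scipy.stats import beta

# Exact posterior for this model: beta(74, 30).
posterior = beta(74, 30)
print(posterior.mean())                          # ~0.71
print(73 / 102)                                  # the posterior mode, ~0.72
print(posterior.cdf(0.8) - posterior.cdf(0.6))   # ~0.97: almost all the mass is between 0.6 and 0.8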


Why do we care about coin flips? Obviously we don’t care about solving coin flip problems for their own sake. Rather, there are a huge number of real-world problems that can be modeled as coin flips where the “mint” produces unfair coins and we know the distribution of coins that come from that mint:

  • A factory produces routers that have some “reliability”; each packet that passes through each router in a network “flips a coin” with that reliability; heads, the packet gets delivered correctly, tails it does not. Given some observations from a real data center, which is the router that is most likely to be the broken one? I described this model in my Fixing Random series.
  • A human reviewer classifies photos as either “a funny cat picture” or “not a funny cat picture”. We have a source of photos — our “mint” — that produces pictures with some probability of them being a funny cat photo, and we have human reviewers each with some individual probability of making a mistake in classification. Given a photo and ten classifications from ten reviewers, what is the probability that it is a funny cat photo? Again, each of these actions can be modeled as a coin flip.
  • A new user is either a real person or a hostile robot, with some probability. The new user sends a friend request to you; you either accept it or reject it based on your personal likelihood of accepting friend requests. Each one of these actions can be modeled as a coin flip; given some observations of all those “flips”, what is the posterior probability that the account is a hostile robot?

And so on; there are a huge number of real-world problems we can solve just with modeling coin flips, and Bean Machine does a lot more than just coin flip models!


I know that was rather a lot to absorb, but it is not every day you get a whole new programming language to explain! In future episodes I’ll talk more about how Bean Machine works behind the scenes, how we traded off between declarative and imperative style, and that sort of thing. It’s been a fascinating journey so far and I can hardly wait to share it.

 

Installing windows

Episode 34 will be delayed again — sorry! — because once again the time I had set aside for writing this weekend got consumed by a real-world task that could not wait. (I will try for Thursday of this week.)

Some friends who are moving had a handyman failure; as is often the case when renovating a house to be sold, they have a set of build dependencies that required this window to be replaced in a hurry in order to not slip the schedule for other renovations, so I volunteered to take care of it.

Yuck.

Living in a 112-year-old house myself, I am used to doing archaeological investigations of the strange decisions made by previous owners. This window, though obviously an old single-paned window, did not look like it was original to the 120-year-old house. It was the wrong size for the rough opening; the hinges looked more modern than turn-of-the-century, and so on.

Sure enough, when disassembled there was a gap behind the trim that was insulated with crumpled newspapers from 1957. Oddly enough they were Pittsburgh newspapers from different days; perhaps the owners of the house in 1957 moved from Pittsburgh, replaced a window, and insulated the gaps with the packing paper they moved with? It’s a mystery.

Having zero haircuts since quarantine began has done wonders for my hair.

New window in and trimmed — obviously the paint will need to be redone but that’s why the window had to go in before the painters arrived.

And the interior needs a little more drywalling and priming before it is ready for painting, but it is 1000000x better than before at least.

The neighbours in the blue house apparently asked my friends for my contact information as they also have a window that needs replacing. I am quite chuffed. I had my friends pass along that I only do windows as a favour, but I would be happy to design them a programming language for hire should they need one of those.

Next time: Gosper’s algorithm, finally!

 

Implementing a full fence

Episode 34 of my ongoing series will be slightly delayed because I spent the time on the weekend I normally spend writing instead rebuilding one of my backyard fences.

I forgot to take a before picture, but believe me, it was already ruinous when I bought the place in 1997, and by 2020 it had gotten to the point where it was actually falling to pieces in a stiff breeze.

I replaced it with an identical design and materials:

…and divided and re-homed some sixteen dozen irises in the process.

It’s nice implementing something that requires no typing every now and then.

 

Approximate results may vary

Part 33 of my ongoing series is coming but I did not get all the code written that I wanted to this week, so it will be delayed. In the meanwhile:


Living in Canada as a child, of course I grew up learning the metric system (with some familiarity with the imperial and US systems of course). If you want to know how many milliliters of liquid a cubic box holds, you just compute the volume of the box in cubic centimeters and you’re done, because a milliliter is by definition the same volume as a cubic centimeter.

Despite having lived in Seattle for over 25 years, I still sometimes do not have an intuitive sense of conversions of US units; for a particular project I knew that I needed an amount of liquid equal to about the volume of a two-inch cube, but the bottle was measured in fluid ounces. Fortunately we have this operation in every browser:

Thanks Bing for those important eight digits after the decimal place there, manufactured from the zero digits after the decimal place of the input. For context, that extra 0043 on the end represents a volume equivalent to roughly the size of a dozen specks of microscopic dust.

But the punchline that made me LOL was “for an approximate result, divide the volume by 1.8046875”, because of course when I am quickly approximating the volume of eight cubic inches in fluid ounces, the natural operation that immediately comes to mind is divide by 1.8046875.

I have some questions.

That was Bing; what does Google do with the same query?

OK, that’s an improvement in that the amount of precision is still unnecessary, but not outright absurd. But the other problems are all the same.

Why division? Maybe it is just me, but I find division by uneven quantities significantly harder mental arithmetic than multiplication; assuming we want the over-precision, would it not be better to say “for an approximate result, multiply the volume by 0.5541125”?

Why say “to approximate…” and then give an absurd amount of precision in the conversion factor? Approximation is by definition about deliberately not computing an over-precise value.

Surely the right way to say this is “for an approximate result, multiply the volume by 5/9” or even better, “divide by two”. When I saw “divide by 1.8046875” the first thing I thought after “wow that’s so over-precise” was “1.8 is 18/10 is 9/5, so multiply by 5/9”.
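
For what it’s worth, here is the arithmetic for my two-inch cube as a quick Python sketch; the 5/9 shortcut is well under one percent off the exact answer, and even “divide by two” is only about ten percent off:

# One US fluid ounce is exactly 1.8046875 cubic inches (231/128).
CUBIC_INCHES_PER_FLUID_OUNCE = 1.8046875

volume = 2 ** 3  # a two-inch cube, in cubic inches
print(volume / CUBIC_INCHES_PER_FLUID_OUNCE)  # ~4.43 fluid ounces, the exact answer
print(volume * 5 / 9)                         # ~4.44, the "multiply by 5/9" approximation
print(volume / 2)                             # 4.0, the "divide by two" approximation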

I’m going to get to the right answer eventually either way; better software could shorten that journey. And I’m going to remember that a cubic inch is roughly 5/9 of a fluid ounce much more easily than I will remember to divide by 1.8046875.

Once you start to see this pattern of over-precision in conversions, it’s like the FedEx arrow: you can’t stop seeing it. Let’s ask the browser: how much does an American robin weigh?


64.8 grams. An underweight robin weighs not 64.5 grams, not 65.0 grams, but precisely 64.8 grams.

To be fair, it looks like this over-precision was the fault of a human author (and their editor) not thinking clearly about how to communicate the fact at hand, rather than bad software this time; if you’re converting “two and a third ounces” to grams it would be perfectly reasonable to round to 65, or even 60. (That third of an ounce is already suspect; surely “two to three ounces” is just fine.) Most odd though is that the computations are not even correct! 2 and 1/3rd ounces is 66.15 grams, and 3 ounces is 85.05 grams, making it rather mysterious where the extra few hundred milligrams went.

I was wondering how many earthworms a robin would have to eat to make up a discrepancy of 0.2 grams. A largish earthworm has got to weigh on the order of a gram, right?

Wow! (For my metric readers out there: 1.5 pounds is 680.388555 grams according to Bing.)

Again: what the heck, Bing? I did not ask for “world’s largest earthworm” or “unusually large earthworms” or even “Australian earthworms”. You know where I live, Bing. (And Google search does no better.)

For some reason I am reminded of Janelle Shane’s “AI Weirdness” tweets; the ones about animal facts are particularly entertaining. These earthworm facts at least have the benefit of being both interesting and correct, but this is hardly the useful result about normal garden-variety annelids that I wanted.

I am also reminded of my favourite animal fact: the hippopotamus can jump higher than a house. It sounds impressive until you remember that houses can’t jump at all.

Obviously all these issues are silly and unimportant, which is why I chose them for my fun-for-Friday blog. And the fact that I can type “8 cubic inches in oz” into my browser or say “how much does a robin weigh?” into a smart speaker and instantly get the answer is already a user interface triumph; I don’t want to minimize that great work. But there is still work to do! Unwarranted extra precision is certainly not the worst sort of fallacious reasoning we see on the internet, but it is one of the most easily mitigated by human-focused software design. I’d love to see improvements to these search functions that show even more attention to what the human user really needs.

Socially distant abbreviated summer vacation

Normally this time of year I would be visiting friends and family in Canada, but obviously that’s impossible right now. Instead we took a long weekend at a rental on Bainbridge Island and strolled around some parks in a socially distant manner instead.

The rental had a balcony at tree level, so at least I got to indulge in my vacation pastime of taking pictures of birds. There were a lot of common feeder birds in the neighbourhood — house finches, house sparrows, a few different species of chickadees, that sort of thing — and I got a few shots in. This little female Rufous hummingbird posed for glamour shots in a birch tree all day long:

(Click on any picture for a larger image)

She was vexed that a female Anna’s hummingbird kept horning in on her territory, hence her ticked-off expression there.

This super fuzzy juvenile black-capped chickadee also spent days posing:

despite being harassed all day long by the noisy gang of teenage Steller’s jays on the rooftop next door.

And of course once again I was completely incapable of taking a non-blurry picture of a belted kingfisher despite many attempts.

Sigh. Better luck next time.

Comet NEOWISE

I went out to Shilshole Bay Marina Tuesday night to get a few photos of the comet; it is quite spectacular! If you’re going stargazing this week, bring binoculars, look to the northwest about an hour after sunset, below the Big Dipper.

And don’t forget to turn around and look the other way because Jupiter and Saturn are in opposition and very bright, just left of Sagittarius.

Click on photos for a higher resolution version.

Report thy feat unto Lord British

Welcome to yet another working-from-home pandemic episode of Fun For Friday Fabulous Adventures. Over the past while I’ve gradually been looking for music, movies and games I enjoyed as a teenager and seeing how they hold up. So I am pleased to announce that it took me only 35 years to finally complete a game I started on my Commodore 64 in 1985:


If you are unfamiliar with Ultima IV: it was an incredibly influential game that moved the state of the art forwards in “swords and sorcery” style games. The world was, for its time, enormously detailed, full of characters you could interact with who gave you clues as to the various puzzles you had to solve to win the game. Those puzzles largely took the form of “fetch quests” — you need eight magic stones hidden in six dungeons, and the mantras of eight shrines and the Bell of Courage, and so on; there are quests built upon quests upon quests.

But the most interesting thing about Ultima IV was that it was the first game I played, and maybe the first game ever, to not just build in a moral system, but to make it the object of the game. The central design element is to not just reward, but require the player to be just, honest, compassionate, honorable and humble within the context of the game, and the quest is ultimately for wisdom. It was an astonishing achievement in a world where the object of all previous fantasy games was “kill the big bad guy”.

Other aspects of the game design are interesting by contrasting them to game design today. The game assumes, for example, that you have read the detailed (and well-produced) manuals that came in the box. It assumes that you’re reading the (cloth!) map and deciphering the runic lettering on it. This would not typically fly today; we expect games to introduce players gradually, to have a tutorial mode built into the game itself, and to not rely on anything exogenous to the software.

It’s a hard game to win without help. I remembered a great many details from my teenage years, and I knew that I could just look up all the mantras and locations of secret items and whatnot on the internet, but I decided to play it more or less like I did not have any of that information ahead of time. I did read some walkthroughs of the harder dungeon levels, because figuring out where all the secret door triggers are is tedious.

Ultima IV is available for free from GOG, and the rest of the series is cheap, so I encourage you to give it a shot if you want some of the retro gaming experience that informed modern fantasy game tropes.


Spoiler ahead:

If you do, there’s a bit of a cheat that I wish I’d remembered: once you are an Avatar and have the Mystic Robes, you can sell them, use the proceeds to buy magic wands and bows, and then find the robes again in the same place you found them the first time. That takes out a lot of the tedium of trying to get enough gold to afford magic ranged weapons. And don’t waste your money on magic armor; just get the Mystic Robes sooner rather than later!


The game ends with an exhortation to contact game author Lord British, also known as Richard Garriott, who amazingly is still making fantasy games that come with cloth maps in the box. I’ll have to order a copy and see how the series has evolved in the last 35 years.

Back in the 1980s if you did contact Lord British after winning an Ultima game, he would send you an autographed certificate of completion, but sadly I suspect I will not be getting one.


I also had Ultima V for my Amiga but I never got very far in it. Perhaps I’ll give it a shot next.

 

New grad vs senior dev

A student who I used to tutor in CS occasionally sent me a meme yesterday which showed “NEW GRAD vs SENIOR DEVELOPER”; the new grad is all caps yelling

NO! YOU CAN’T JUST USE BRUTE FORCE HERE! WE NEED TO USE SEGMENT TREES TO GET UPDATE TIME COMPLEXITY DOWN TO O(LOG N)! BREAK THE DATA INTO CHUNKS AT LEAST! OH THE INEFFICIENCY!!!

and the senior developer responds

Ha ha, nested for loops go brrrrrrrrrrrr…

OK, that’s silly and juvenile, but… oh no, I feel a flashback coming on.

It is 1994 and I am a second-year CS student at my first internship at Microsoft on the Visual Basic compiler team, reading the source code for InStr for the first time. InStr is the function in Visual Basic that takes two strings, call them source and query, and tells you the index at which query first appears as a substring of source, and the implementation is naive-brute-force.

I am shocked to learn this! Shocked, I tell you!

Let me digress slightly here and say what the naive brute force algorithm is for this problem.


Aside: To keep it simple we’ll ignore all the difficulties inherent in this problem entailed by the fact that VB was the first Microsoft product where one version worked everywhere in the world on every version of Windows no matter how Windows was localized; systems that used Chinese DBCS character encodings ran the same VB binary as systems that used European code pages, and we had to support all these encodings plus Unicode UTF-16. As you might imagine, the string code was a bit of a mess. (And cleaning it up in VBScript was one of my first jobs as an FTE in 1996!)

Today for simplicity we’ll just assume we have a flat, zero-terminated array of chars, one character per char as was originally intended.


The extremely naive algorithm for finding a string in another goes something like this pseudo-C algorithm:

bool starts(char *source, char *query)
{
  int i = 0;
  while (query[i] != '\0')
  {
    if (source[i] != query[i])
      return false;
    i = i + 1;
  }
  return true;
}
int find(char *source, char *query)
{
  int i = 0;
  while (source[i] != '\0')
  {
    if (starts(source + i, query))
      return i;
    i = i + 1;
  }
  return -1;  
}

The attentive reader will note that this is the aforementioned nested for loop; I’ve just extracted the nested loop into its own helper method. The extremely attentive reader will have already noticed that I wrote a few bugs into the algorithm above; what are they?

Of course there are many nano-optimizations one can perform on this algorithm if you know a few C tips and tricks; again, we’ll ignore those. It’s the algorithmic complexity I’m interested in here.

The action of the algorithm is straightforward. If we want to know if query “banana” is inside source “apple banana orange” then we ask:

  • does “apple banana orange” start with “banana”? No.
  • does “pple banana orange” start with “banana”? No.
  • does “ple banana orange” start with “banana”? No.
  • … and so on, until:
  • does “banana orange” start with “banana”? Yes! We’re done.

It might not be clear why the naive algorithm is bad. The key is to think about what the worst case is. The worst case would have to be one where there is no match, because that means we have to check the most possible substrings. Of the no-match cases, what are the worst ones? The ones where starts does the most work to return false.  For example, suppose source is “aaaaaaaaaaaaaaaaaaaa” — twenty characters — and query is “aaaab”. What does the naive algorithm do?

  • Does “aaaaaaaaaaaaaaaaaaaa” start with “aaaab”? No, but it takes five comparisons to determine that.
  • Does “aaaaaaaaaaaaaaaaaaa” start with “aaaab”? No, but it takes five comparisons to determine that.
  • … and so on.

In the majority of attempts it takes us the maximum number of comparisons to determine that the source substring does not start with the query. The naive algorithm’s worst case is O(n*m) where n is the length of source and m is the length of the query.

There are a lot of obvious ways to make minor improvements to the extremely naive version above, and in fact the implementation in VB was slightly better. The implementation in VB was basically this:

char* skipto(char *source, char c)
{
  char *result = source;
  while (*result != '\0' && *result != c)
    result = result + 1;
  return result;
}
int find(char *source, char *query)
{
  char *current = skipto(source, query[0]);
  while (*current != '\0;)
  {
    if (starts(current, query))
      return current - source;
    current = skipto(current + 1, query[0]);
  }
  return -1;
}

(WOW, EVEN MORE BUGS! Can you spot them? It’s maybe easier this time.)

This is more complicated but not actually better algorithmically; all we’ve done is moved the initial check in starts that checks for equality of the first letters into its own helper method. In fact, what the heck, this code looks worse. It does more work and is more complicated. What’s going on here? We’ll come back to this.

As I said, I was a second year CS student and (no surprise) a bit of a keener; I had read ahead and knew that there were string finding algorithms that are considerably better than O(n*m). The basic strategy of these better algorithms is to do some preprocessing of the strings to look for interesting features that allow you to “skip over” regions of the source string that you know cannot possibly contain the query string.

This is a heavily-studied problem because, first off, obviously it is a “foundational” problem; finding substrings is useful in many other algorithms, and second, because we genuinely do have extremely difficult problems to solve in this space. “Find this DNA fragment inside this genome”, for example, involves strings that may be billions of characters long with lots of partial matches.

I’m not going to go into the various different algorithms that are available to solve this problem and their many pros and cons; you can read about them on Wikipedia if you’re interested.
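
Just to make the “skip over regions you know can’t match” idea concrete, though, here is a minimal sketch of one of the simpler such algorithms, the Horspool variant of Boyer-Moore, written in Python rather than pseudo-C for brevity. This is purely illustrative; it is not what VB did, just a taste of the preprocessing-and-skipping strategy:

def find_horspool(source, query):
    # Precompute, for each character that appears in the query (except its
    # last character), how far the search window can safely slide when the
    # character at the end of the window does not give us a match.
    m, n = len(query), len(source)
    if m == 0:
        return 0
    skip = {c: m - i - 1 for i, c in enumerate(query[:-1])}
    pos = 0
    while pos <= n - m:
        if source[pos:pos + m] == query:
            return pos
        # Characters that never appear in the query let us skip m positions at once.
        pos += skip.get(source[pos + m - 1], m)
    return -1

print(find_horspool("apple banana orange", "banana"))  # 6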

Anyways, where was I, oh yes, CS student summer intern vs Senior Developer.

I read this code and was outraged that it was not the most asymptotically efficient possible code, so I got a meeting with Tim Paterson, who had written much of the string library and had the office next to me.

Let me repeat that for those youngsters in the audience here, TIM FREAKIN’ PATERSON. Tim “QDOS” Paterson, who one fine day wrote an operating system, sold it to BillG, and that became MS-DOS, the most popular operating system in the world. As I’ve mentioned before, Tim was very intimidating to young me and did not always suffer foolish questions gladly, but it turned out that in this case he was very patient with all-caps THIS IS INEFFICIENT Eric. More patient than I likely deserved.

As Tim explained to me, first off, the reason why VB does this seemingly bizarre “find the first character match, then check if query is a prefix of source” logic is because the skipto method is not written in the naive fashion that I showed here. The skipto method is a single x86 machine instruction. (REPNE SCASB, maybe? My x86 machine code knowledge was never very good. It was something in the REP family at least.) It is blazingly fast. It harnesses the power of purpose-built hardware to solve the problem of “where’s that first character at?”

That explains that; it genuinely is a big perf win to let the hardware do the heavy lifting here. But what about the asymptotic problem? Well, as Tim patiently explained to me, guess what? Most VB developers are NOT asking if “aaaab” can be found in “aaaaaaa…”. The vast majority of VB developers are asking is “London” anywhere in this address, or similar problems where the strings are normal human-language strings without a lot of repetitions, and both the source and query strings are short.  Like, very short. Less than 100 characters short. Fits into a cache line short.

Think about it this way; most source strings that VB developers are searching have any given character in them maybe 2% of the time, and so for whatever the start character is of the query string, the skipto step is going to find those 2% of possible matches very quickly. And then the starts step is the vast majority of the time going to very quickly identify false matches. In practice the naive brute force algorithm is almost always O(n + m). 

Moreover, Tim explained to me, any solution that involves allocating a table, preprocessing strings, and so on, is going to take longer to do all that stuff than the blazingly-fast-99.9999%-of-the-time brute force algorithm takes to just give you the answer. The additional complexity is simply not worth it in scenarios that are relevant to VB developers. VB developers are developing line-of-business solutions, and their line of business is not typically genomics; if it is, they have special-purpose libraries for those problems; they’re not using InStr.
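
To see Tim’s point in action, here is a tiny counting sketch — mine, written for this post in Python rather than C, and not anything from the VB sources — that tallies the character comparisons made by the naive nested loop. On an address-like string it does barely more work than the length of the string; only the pathological repeated-character input gets anywhere near n times m:

def comparisons(source, query):
    # Count the character comparisons made by the naive nested-loop search.
    count = 0
    for i in range(len(source)):
        matched = True
        for j in range(len(query)):
            count += 1
            if i + j >= len(source) or source[i + j] != query[j]:
                matched = False
                break
        if matched:
            break  # found a match starting at position i
    return count

print(comparisons("10 Downing Street, London SW1A 2AA", "London"))  # 25: about one comparison per character
print(comparisons("a" * 20, "aaaab"))                               # 94: essentially n * m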

And we’re back in 2020. I hope you enjoyed that trip down memory lane.

It turns out that yes, fresh grads and keener interns do complain to senior developers about asymptotic efficiency, and senior developers do say “but nested for loops go brrrrrrr” — yes, they go brrrrrr extremely quickly much of the time, and senior developers know that!

And now I am the senior developer, and I try to be patient with the fresh grads as my mentors were patient with me.


UPDATE: Welcome, Hacker News readers. I always know when I’m linked from Hacker News because of the huge but short-lived spike in traffic. The original author of the meme that inspired this post has weighed in. Thanks for inspiring this trip back to a simpler time!

Bandits, victims and idiots

I don’t enjoy politics, I don’t know enough about it, and my privilege greatly insulates me from its negative effects, and so I don’t talk about it much on this blog. My intention in creating the blog lo these decades ago was to make a friendly, human, competent public face to my team at Microsoft, and to get information out about languages that was not in the official documentation; it was not ever intended to be a soapbox. Only a couple times have I commented on political situations, and today will be the third apparently.

I have been thinking many times these last four years, and much more these last few days about the late Italian economic historian Carlo Cipolla. Not because of his economic theories, of which I know very little, but rather because of his theory of stupidity. You can read the principles in brief for yourself at the link above, or the original paper here, but I can summarize thus:

  • Powerful smart people take actions that benefit both themselves and others.
  • Victims lack the power to protect themselves. They are unable to find actions that benefit themselves, and are victimized to the benefit of others.
  • Bandits take actions that benefit themselves at the expense of victims.
  • Stupid idiots take actions that benefit neither themselves nor others.

These are value-laden terms so let’s be clear here that neither I nor Cipolla is suggesting that victims, bandits or idiots are not intelligent:

  • No matter how intelligent you are and how many precautions you take, you can be victimized by a bandit or an idiot. Victims are not to blame for their victimization. We’ll come back to this in a moment.
  • Bandits are often very intelligent; they just use their skills to victimize others. Whether that’s because they are genuinely not intelligent enough to make a living helping others, or because they are that intelligent but psychologically enjoy being a bandit, or are bandits for other reasons, it doesn’t matter for our purposes. Assume that bandits are extremely intelligent and devious, but motivated by gain.
  • Idiots, ironically, are often very intelligent; a great many idiots have fancy degrees from excellent colleges. As Cipolla points out in his paper, there is no characteristic that identifies idiots other than their inability to act in a way that benefits anyone including themselves. That includes intelligence or lack thereof.

Some key consequences of this model have been on my mind these last few days:

  • Bandits, even the psychopaths, are motivated by self-interest and recognize actions that benefit themselves. You can reason with a bandit, but more importantly, you can reason about a bandit, and therefore you can make use of a bandit. You can make an offer to a powerful bandit and count on them to take it up if it maximizes their gain.
  • You cannot reason with an idiot. You can’t negotiate with them to anyone’s advantage because they will take positions that harm themselves at the same time as they harm others. There are no “useful idiots”; any attempt to use an idiot to benefit yourself will backfire horribly as they manage to find a way for everyone to lose.
  • When the idiots are in power, there is no bright line separating the smart from the victims; rather, there is just a spectrum of more or less power and privilege. Victims by definition lack the power to defend themselves, and the more privileged have no lever to pull to change the course of the idiot, who will act with such brazen disregard for the well-being of everyone including themself that it is hard to devise a strategy.

All this is by way of introduction to say: the position that I am seeing on Twitter and in the media that “soon” is a good time to “re-start the economy” is without question the stupidest, most idiotic position I have ever heard of in my life and that includes “let’s invade Afghanistan for no strategic purpose with no plan on how to ever leave”. There is no way that ends well for anyone, and that includes the billionaires who are temporarily inconvenienced by a slight dip in the flow of cash into their coffers.

I’ll leave you with how Cipolla finishes his essay, because it sums up exactly how I feel at this moment in history.

In a country which is moving downhill […] one notices among those in power an alarming proliferation of the bandits with overtones of stupidity and among those not in power an equally alarming growth in the number of helpless individuals. Such change in the composition of the non-stupid population inevitably strengthens the destructive power of the stupid fraction and makes decline a certainty. And the country goes to Hell.