The VSTO startup sequence

Earlier this week I was looking for an old photo, and while browsing I came across a photo I took of my whiteboard in my office at Microsoft in 2004. Or rather, it was two photos; I’ve crudely stitched them together. Click on the image for a larger version.

OMG. What. The. Heck. Is. All. That. Nonsense?

Let me start with a little history.

Before I was on the C# team and after I was on the scripting languages team, I spent a couple years at Microsoft working on the plumbing for Visual Studio Tools for Office.

The idea of VSTO was to bring the ability to write truly rich, client-server aware, data-driven applications in Office documents using C# or Visual Basic; we wanted to go beyond the scripts and productivity tools typical of VBA customizations.

This was a project with many, many difficulties, both technical and political. On the political side of things, I think it is fair to say that the Office team has historically had (with good reason!) a great deal of resistance to taking any compatibility burden that would possibly slow down their ability to innovate in future releases; “platformizing” Word and Excel into true application hosts by a team external to Office was just such a burden.

The technical difficulties were considerable, in large part due to concerns about the security model. We were deeply, painfully aware of how Office customizations and scripting languages had been misused in the past as vectors for malware, and we did not want to create new vulnerabilities. As mitigation, we designed a mechanism that would isolate any customization code to its own appdomain with a highly restrictive default security policy.

Office, however, was not at the time designed to host the CLR. They were only willing to give us a single callback to our loader code that kicked off the whole process when a customized spreadsheet or document was loaded.

By 2004 we were on the fourth revision to our loading algorithm and I was tasked with coming up with the fifth; to facilitate discussion of options I drew a diagram on my whiteboards which I helpfully titled “HIGHLY SIMPLIFIED STARTUP SEQUENCE v4”.

A few things strike me about this diagram now, over 16 years later.

First: though it looks like a mess, I did actually put some thought into the design elements.

The diagram is divided into three sections, separated by heavy blue vertical lines. On the left are components running entirely outside of the CLR; in the middle are components that run in the CLR’s default appdomain, and on the right are components that run in the customization’s restricted appdomain. (And of course on the extreme far left is the edge of my “THE MATRIX” poster. A lot of the code names of the different parts of the project were references to The Matrix, including the team cover band that I played keyboards for. I am sad that I can no longer wear my “The Red Pills” polo shirt in public due to the co-opting of that movie reference by misogynist jerks.)
The purple boxes that run along the top are components and the lollipops give the interfaces they implement.
The purple boxes and arrows below give the exact sequence of twenty different method calls showing what component is calling what other component with what data, and why. In particular the diagram allows us to easily see when a component running in a more restricted security environment is calling into a less restricted environment; those calls need to be allowed because we need them to happen, but that then means that maybe hostile user code could call them, which could be bad.
Design problems, questions, annotations and proposed changes are shown in blue.
Red is used to identify one key component and an important question about it.
I have no idea what that yellow code fragment is or why it was written over top of everything else. It looks irrelevant.

The purpose of the diagram was originally to clarify in my own mind what the sequence was and what the problems were, but by the time it was in this form it was also for giving context to my coworkers when we were discussing options, so it had to be readable. I probably redrew this diagram a half a dozen times before it got to this state.

Second: we can see that there were a good half dozen or more design problems that I was trying to solve here but the big problems involved dirty documents and application manifests.

When you close Word or Excel, you have I am sure noticed that sometimes you get a “save your work?” dialog and sometimes the app just closes. The app is keeping track of whether the document is dirty — changed since it was last loaded or saved — or clean.

Suppose we load a customized spreadsheet, and initializing the customization causes the installer to notice that there is a newer version that it should be downloading. That might change the manifest information about the customization, so the spreadsheet is now “dirty”. But we do not want to ever unnecessarily dirty a document, because that is confusing and irritating to the user.

In step nine the fake application activator obtains an IAppInfo reference from the appdomain manager, updates the manifest from the customization’s server, and parses the manifest. My comments say:

Do not write back at this point; need to maintain dirty state
No, don’t do this at all. Host must provide updated manifest. This is not a VSTA feature, it is VSTO. (Meaning here that something here is unique to Visual Studio Tools for Office, and not the generalization of it we were working on, VST for Applications.)
Must do both. Don’t write back. AIState object must ensure dirtyness.

Apparently I was deeply conflicted on this point. I don’t recall how it was resolved.

My favourite comment though is the one in red:

Can we take manifest out of doc? Peter: “It would be awesome. If assembly is available offline, so is manifest”.

The scenario here had something to do with both the dirty bit problem, and more generally dealing with locally cached customizations. We did a bunch of work on the security model for “what happens if you’re trying to run a customization while your laptop is in airplane mode and we can’t get to the server to check for updates”. Peter is of course legendary Microsoft PM Peter Torr with whom I worked for many years.

My second favourite was where I said “OFFICE12?” Yeah, what’s going to happen when Office revs? Can we guarantee that all this stuff keeps working?

Third: It’s funny how the mind works. Though I’ve described the organization of the diagram and the major problems, today I remember almost none of what is going on here, what the specific issues were, or how we resolved them. But that whole sequence was intensely important to me for several months of my life; it was the foundational plumbing to the entire endeavor and so I knew it literally forwards and backwards. Those memories are 90% gone now. And yet if someone were to yell the numbers “nine six seven eleven eleven” at me from across the street I would be unable to not think “call Pizza Pizza, right away“. Thanks, 1980s jingle writers.

Fourth: I often think about this sort of thing in the context of those “tell me about a time you solved a design problem” interview questions. This “highly simplified” startup sequence with its twenty method calls has to balance:

security
performance
privacy
debuggability
code maintainability
versioning
robustness
graceful failure
user irritation

and numerous other design criteria. But can you imagine trying to explain any detail of this diagram to someone with no prior knowledge in a 45 minute interview? Real-world design problems are hard precisely because there are so many conflicting goals and messy politics. And worse, too often this is the institutional knowledge that is never written down and then lost.

Coming up on FAIC: Not sure!

I want to embark upon a more detailed dive into Bean Machine
We have just open-sourced a tool we use for benchmarking PPLs internally; I’d like to talk about that a bit
I’ve developed a little AST rewriting library in Python that is kinda fun; I could delve into the ideas behind that.

Let me know in the comments what you think.

7 thoughts on “The VSTO startup sequence”

petercooperjr on October 29, 2020 at 11:31 am said:

Maybe if you’re really looking for filling in extra time you have with more blogging, finish some of those unfinished series? I know at least Z-Machine was left hanging; I think some of the other ones might have been too. 🙂

Thanks for all you write!

Reply ↓
- ericlippert on October 29, 2020 at 4:39 pm said:
  
  Yeah that’s on my long list of stuff to do. 🙂
  
  Reply ↓
zarenor on October 29, 2020 at 12:05 pm said:

I’m always into benchmarks! I find it interesting trying to find good measurements and control for the variability from machine to machine and moment to moment.

AST rewriting also piques my interest, I’m curious what kind of transformations you’re doing there.

Reply ↓
- Grendel on October 30, 2020 at 6:48 am said:
  
  Same, I’m really interested in the world of ASTs and what you can do with them but never really had a reason to actively do anything other than read wikipedia entries when I’m bored at work. Eric doing a series on them would be amazing.
  
  Reply ↓
Pingback: Dew Drop – October 30, 2020 (#3308) | Morning Dew
reillypwood on October 30, 2020 at 9:00 am said:

This was fascinating, thanks. VSTO is still alive and kicking in some niches – I just ordered your 2009 book on it this week!

The new web-based Office add-ins aren’t good enough for many use cases, and so even though Microsoft is strongly discouraging people from creating new VSTO addins, people are still doing it.

Reply ↓
- ericlippert on October 30, 2020 at 2:40 pm said:
  
  That’s awesome. I look forward to spending the 30 cents in royalties I’ll make from that sale. 🙂
  
  Reply ↓