How Do The Script Garbage Collectors Work?

NOTE: This article was written in 2003. Since that time the JavaScript garbage collector has been completely rewritten multiple times, so as to be more performant in general, to handle the larger working sets entailed by modern web applications that we had absolutely no idea were coming when we designed the JScript GC back in 1995, to be better at predicting when there is garbage that needs collecting, and to be better at handling circular references involving browser objects.

I did not do any of that work; I stopped working on scripting in 2001. I do not know how the modern JavaScript GC works. This article should be considered “for historical purposes only”; it does not reflect how JavaScript works today.


JScript and VBScript both are automatic storage languages. Unlike, say, C++, the script developer does not have to worry about explicitly allocating and freeing each chunk of memory used by the program. The internal device in the engine which takes care of this task for the developer is called the garbage collector.

Interestingly enough though, JScript and VBScript have completely different garbage collectors. Occasionally people ask me how the garbage collectors work and what the differences are.

JScript uses a nongenerational mark-and-sweep garbage collector. It works like this:

  • Every variable which is “in scope” is called a “scavenger”. A scavenger may refer to a number, an object, a string, whatever.
  • We maintain a list of scavengers — variables are moved on to the scavenger list when they come into scope and off the list when they go out of scope.
  • Every now and then the garbage collector runs.
  • First it puts a “mark” on every object, variable, string, etc – all the memory tracked by the GC. JScript uses the VARIANT data structure internally and there are plenty of extra unused bits in that structure, so we just set one of them.
  • Second, it clears the mark on the scavengers and the transitive closure of scavenger references. So if a scavenger object references a non-scavenger object then we clear the bits on the non-scavenger, and on everything that it refers to. (I am using the word “closure” in a different sense than in my earlier post.)
  • At this point we know that all the memory still marked is allocated memory which cannot be reached by any path from any in-scope variable.
  • All of those objects are instructed to tear themselves down, which destroys any circular references.
  • Actually it is a little more complex than that, as we must worry about details like “what if freeing an item causes a message loop to run, which handles an event, which calls back into the script, which runs code, which triggers another garbage collection?” But those are just implementation details.
  • Incidentally, every JScript engine running on the same thread shares a GC, which complicates the story even further.
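Here is a toy simulation of that scheme, just to make the idea concrete. This is an illustration only, not the engine’s actual code: objects are plain records, the “scavengers” are the in-scope roots, and the mark bit is an ordinary property rather than a spare bit in a VARIANT.

function collect(heap, scavengers) {
  var i, j;
  // First, mark everything the collector tracks.
  for (i = 0; i < heap.length; i++)
    heap[i].marked = true;
  // Second, clear the mark on the scavengers and on everything reachable from them.
  var pending = scavengers.slice(0);
  while (pending.length > 0) {
    var item = pending.pop();
    if (!item.marked) continue;              // already visited
    item.marked = false;
    for (j = 0; j < item.refs.length; j++)
      pending.push(item.refs[j]);
  }
  // Anything still marked is unreachable from any in-scope variable,
  // circular references included; tear it down and keep the rest.
  var survivors = [];
  for (i = 0; i < heap.length; i++) {
    if (heap[i].marked) heap[i].refs = [];   // "tear down": break its references
    else survivors.push(heap[i]);
  }
  return survivors;
}

// Two objects referring to each other, with only a third one in scope:
var a = { refs: [] }, b = { refs: [] }, c = { refs: [] };
a.refs.push(b); b.refs.push(a);              // an unreachable circular pair
print(collect([a, b, c], [c]).length);       // 1 -- only c survives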

You’ll note that I hand-waved a bit there when I said “every now and then…” What we do is keep track of the number of strings, objects and array slots allocated. We check the current tallies at the beginning of each statement, and when the numbers exceed certain thresholds we trigger a collection.

The benefits of this approach are numerous, but the principal benefit is that circular references are not leaked unless the circular reference involves an object not owned by JScript.

However, there are some down sides as well.

Performance is potentially not good on large-working-set applications — if you have an app where there are lots of long-term things in memory and lots of short-term objects being created and destroyed then the GC will run often and will have to walk the same network of long-term objects over and over again. That’s not fast.

The opposite problem is that perhaps a GC will not run when you want one to. If you say blah = null then the memory owned by blah will not be released until the next time the GC runs. If blah is the sole remaining reference to a huge array or network of objects, you might want it to go away as soon as possible.

You can force the JScript garbage collector to run with the CollectGarbage() method, but I don’t recommend it. The whole point of JScript having a GC is that you don’t need to worry about object lifetime. If you do worry about it then you’re probably using the wrong tool for the job.
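Concretely, the situation in the last two paragraphs looks something like this. (This is a sketch; as I said above, forcing a collection is rarely the right thing to do.)

var blah = [];
for (var i = 0; i < 100000; i++)
  blah[i] = "item number " + i;   // a large pile of strings and array slots
blah = null;       // the whole array is now unreachable...
CollectGarbage();  // ...and this makes the JScript collector reclaim it right away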

VBScript, on the other hand, has a much simpler stack-based garbage collector. Scavengers are added to a stack when they come into scope, removed when they go out of scope, and any time an object is discarded it is immediately freed.

You might wonder why we didn’t put a mark-and-sweep GC into VBScript. There are two reasons. First, VBScript did not have classes until version 5, but JScript had objects from day one; VBScript did not need a complex GC because there was no way to get circular references in the first place! Second, VBScript is supposed to be like VB6 where possible, and VB6 does not have a mark-n-sweep collector either.

The VBScript approach pretty much has the opposite pros and cons. It is fast, simple and predictable, but circular references of VBScript objects are not broken until the engine itself is shut down.

The CLR GC is also mark-n-sweep but it is generational – the more collections an object survives, the less often it is checked for life. This dramatically improves performance for large-working-set applications. Of course, the CLR GC was designed for industrial-grade applications; the JScript GC was designed for simple little web pages.

What happens when you have a web page, ASP page or WSH script with both VBScript and JScript? JScript and VBScript know nothing about each other’s garbage collection semantics. A VBScript program which gets a reference to a JScript object just sees another COM object; the same goes for a VBScript object passed to JScript. A circular reference between VBScript and JScript objects would not be broken and the memory would leak (until the engines were shut down). A noncircular reference will be freed when the object in question goes out of scope in both languages (and the JScript GC runs).


This article created a large amount of feedback when it was first published, much of it, oddly enough, reacting negatively to the phrase “the JScript GC was designed for simple little web pages”.

I have no idea why this fact would elicit such strong reactions. The JScript GC, and everything else about JScript, was designed for simple little web pages. We designed it in 1995! As I have said many times: in 1995 the by-design purpose for JavaScript on a web page was “make the monkey dance when you move the mouse”. The idea that there would be a hundred thousand lines of JavaScript frameworks downloaded for typical web pages was absurd. We literally designed the script engines assuming that one-line event handlers would be typical, thirty lines would be plausible, and a thousand lines would be crazy, but we should test it.

In a similar vein, Brendan Eich (the original designer of JavaScript who stepped down as CEO of Mozilla after it came to light that he donated to anti-equality causes) weighed in to note that the algorithm that predicted when to collect was deeply flawed and could produce poor behaviour in pages that allocate a lot of memory. That was completely correct. The collection trigger was designed for simple little web pages, not for scenarios in which the collector could exhibit quadratic behaviour under load.

You can certainly make the argument that all of this should have been improved as it became more obvious that large-scale programs were being written in JavaScript, a language which was then very unsuitable for large-scale programs. I made that argument myself. I did not stop improving the script engines by my choice; my team was de-funded. Take it up with Bill Gates, not me.

What are closures?

NOTE: This article was written in 2003; the circular-reference memory leak bug described here was fixed in IE shortly after this article was written. This blog archive is for historical purposes; go ahead and use closures today.


JavaScript, as I noted yesterday, is a functional language. That doesn’t mean that it works particularly well (though I hope it does) but rather that it treats functions as first-class objects. Functions can be passed around and assigned to variables just as strings or integers can be.

A reader commented yesterday that “closures are your friends”. Unfortunately there are important situations where closures are not your friends! Let’s talk a bit about those. First off, what’s a closure? Consider the following (contrived and silly, but pedagogically clear) code:

function AddFive(x) {
  return x + 5;
}
function AddTen(x) {
  return x + 10;
}
var MyFunc;
if (whatever)
  MyFunc = AddFive;
else
  MyFunc = AddTen;
print(MyFunc(123)); // Either 133 or 128.

Here we have a typical functional scenario. We’re deciding which function to call based on some runtime test. Now, one could imagine that you’d want to generalize this notion of an “adder function”, and you would not want to have to write dozens and dozens of adders. What we can do is create an adder factory:

function AdderFactory(y) {
  return function(x){return x + y;}
}
var MyFunc;
if (whatever)
  MyFunc = AdderFactory(5);
else
  MyFunc = AdderFactory(10);
print(MyFunc(123)); // Either 133 or 128.

The anonymous inner function remembers what the value of y was when it was returned, even though y has gone away by the time the inner function is called! We say that the inner function is closed over the containing scope, or for short, that the inner function is a closure.

This is an extremely powerful functional language feature, but it is important to not misuse it. There are ways to cause memory-leak-like situations using closures. Here’s an example:

<div id="myMenu" class="menu-bar"></div>
var menu = document.getElementById('myMenu');
AttachEvent(menu);
function AttachEvent(element) {
  element.attachEvent("onmouseover", mouseHandler);
  function mouseHandler(){ /* whatever */ }
}

Someone has, for whatever reason, nested the handler inside the attacher. This means that the handler is closed over the scope of the caller; the handler keeps around a reference to element which is equal to menu, which is that div. But the div has a reference to the handler.

That’s a circular reference.

The garbage collector is a mark-and-sweep collector so you’d think that it would be immune to circular references. But the div isn’t a JavaScript object; it is not in the JavaScript collector, so the circular reference between the div and the handler will not be broken until the browser completely tears down the div. Which never happens.

Doesn’t IE tear down the div when the page is navigated away? Though IE did briefly do that, the application compatibility lab discovered that there were actually web pages that broke when those semantics were implemented. (No, I don’t know the details.) The IE team considers breaking existing web pages that used to work to be worse than leaking a little memory here and there, so they’ve decided to take the hit and leak the memory in this case.

Don’t use closures unless you really need closure semantics. In most cases, non-nested functions are the right way to go.
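For instance, the leaky example above can be restructured so that the handler is a top-level function. It then closes over nothing, so no cycle between the script engine and the div is created. (Same hypothetical names as above; this is just a sketch of the pattern.)

function mouseHandler() { /* whatever */ }   // top level: closes over nothing
function AttachEvent(element) {
  element.attachEvent("onmouseover", mouseHandler);
}
var menu = document.getElementById('myMenu');
AttachEvent(menu);
// The div still references the handler, but the handler holds no reference
// back to the div, so there is no circular reference to leak.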


Reader responses:

How can you say that JavaScript is a functional language?

I take an expansive view of what makes a functional language: it is a language that supports programming in a functional style, not a language that forces you to program in a functional style. Enforcing side-effect free programming like Haskell does is not a requirement of functional languages, and indeed if we made that requirement then many languages that people consider to be clearly functional, like Scheme and OCaml, would have to be considered non-functional.

Are JScript strings passed by reference?

Yesterday I asked “are JScript strings passed by reference (like objects) or by value (like numbers)?”

Trick question! It doesn’t matter, because you can’t change a string. Suppose they were passed by reference — how would you know? You can’t have two variables refer to the “same” string and then change that string. Strings are like numbers — immutable primitive values. (Note that JavaScript, unlike VBScript, does not support passing variables by reference at all. The question here is about whether the value passed as an argument is a reference to the string or a copy of the string.)
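Here is a small illustration of why the question is unobservable. Since you can only ever rebind a variable to a new string, never change the string itself, the caller cannot tell how the argument travelled:

function shout(s) {
  s = s + "!";        // binds the local s to a brand new string
  return s;
}
var greeting = "hello";
var loud = shout(greeting);
print(greeting);      // still "hello" -- the original string is untouched
print(loud);          // "hello!"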

Of course, “under the covers” we actually have to pass the strings somehow. Generally speaking, strings are passed by reference where possible, as it is much cheaper in both time and memory to pass a pointer to a string than to make a copy, pass the value, and then destroy the copy.

That said, unfortunately there are scenarios in which strings are passed by reference, and then the callee immediately makes a copy of the string. Strings are represented internally as BSTRs which are not reference counted, so you have to make a copy if you want to express ownership.

Why do the script engines not cache dispatch identifiers?

You COM programmers out there are intimately familiar with the IDispatch interface, I’m sure. To briefly sum up for the rest of you, the point of IDispatch is that it allows a caller to call a function without actually knowing at compile time any details about the name or signature of that function. The caller passes a method name to IDispatch::GetIdsOfNames to get a dispatch identifier — an integer — which identifies the method, and then calls IDispatch::Invoke with the dispid and the arguments. The implementation of Invoke is then responsible for analyzing the arguments and calling the actual function on the object’s virtual function table.
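To make that two-step dance concrete, here is a toy model of it in JScript. The names makeDispatchObject, getIdOfName and invoke are invented for illustration; the real thing is a COM interface implemented in native code, not a script object.

function makeDispatchObject(methods) {
  var names = [];
  for (var name in methods)
    names.push(name);                        // each name gets an integer dispid
  return {
    getIdOfName: function (s) {              // analogous to GetIdsOfNames
      for (var i = 0; i < names.length; i++)
        if (names[i] == s) return i;
      return -1;
    },
    invoke: function (dispid, args) {        // analogous to Invoke
      return methods[names[dispid]].apply(null, args);
    }
  };
}
var frob = makeDispatchObject({
  gnusto: function (n, s) { return n + " " + s; }
});
var dispid = frob.getIdOfName("gnusto");      // step one: name to dispid
print(frob.invoke(dispid, [123, "skidoo"]));  // step two: invoke by dispid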

This is how VBScript and JScript work. When you say

Set Frob = CreateObject("BitBucket.Froboznicator") 
Frob.Gnusto 123, "skidoo"

what VBScript actually does is pass the string "Gnusto" to GetIdsOfNames, and then passes the dispid and arguments to Invoke. The object then does the work of actually translating that into a real function call. This is how we get away with not having to write a down-to-the-machine-code compiler for VBScript and JScript.

One of the fundamental rules of OLE Automation is that for any object, any two calls to GetIdsOfNames with the same string must return the same number. This is so that callers can fetch the dispid once and reuse it, rather than having to fetch it anew every time they try to invoke the dispatch object.

VBScript and JScript do not cache the dispid. If you say

Set Frob = CreateObject("BitBucket.Froboznicator") 
Frob.Gnusto 123, "skidoo"
Frob.Gnusto 10, "lords a leaping"

then the script engine will call GetIdsOfNames twice, and get the same value both times.

Surely this is wasteful. Can’t we cache that thing? I mean, it is a small optimization; that call to Invoke is going to dwarf the expense of the GetIdsOfNames. It just seems like there ought to be something we could do here.

Appearances can be deceiving.

You might first think that every time we see Frob.Gnusto we can use the dispid we grabbed the first time and reuse it. For example:

Frob.Gnusto 123, "skidoo" 
' OK, Gnusto is dispid 0x1111.  Add to cache "Frob.Gnusto", 0x1111
Frob.Gnusto 10, "lords a leaping"  
' Look up "Frob.Gnusto" in cache, aha, it is 0x1111

This doesn’t work. Consider this scenario:

Set Frob = CreateObject("BitBucket.Froboznicator") 
Frob.Gnusto 123, "skidoo"
Set Frob = CreateObject("MushySoft.Gronker") 
Frob.Gnusto 10, "lords a leaping"  

Two objects sharing the same variable but with different types may have the same method name but different dispids. We would have to invalidate our cache every time the variable was changed, which is a lot of work to save very little time.

There are other reasons why this doesn’t work. Consider this JScript code:

var frob = new ActiveXObject("BitBucket.Froboznicator"); 
var frab = new ActiveXObject("BitBucket.Flouncer"); 
frob.gnusto(123, "skidoo"); // Cache "frob.gnusto", 0x1111 
with(frab) 
{ 
  frob.gnusto(10, "lords a leaping"); 
} 

Is the call to frab.frob.gnusto or frob.gnusto? The object in frab might have a method frob which has a method gnusto which has a different dispid.

Similarly, there are problems with multiple scopes where local variables may shadow globals. It’s a huge mess. The net result is that we can’t cache dispids against variable names. The variable names are just too ephemeral.
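Here is the shadowing case in miniature, for concreteness (same imaginary objects as above; whether the two dispids actually differ depends entirely on the objects):

var frob = new ActiveXObject("BitBucket.Froboznicator");
frob.gnusto(123, "skidoo");          // suppose we cached "frob.gnusto" here
function doIt() {
  var frob = new ActiveXObject("MushySoft.Gronker");   // shadows the global frob
  frob.gnusto(10, "lords a leaping");  // same source text, different object,
                                       // quite possibly a different dispid
}
doIt();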

What about the object pointers? Every object has a unique 32 bit pointer associated with it, right? Why not cache the dispids against the pointer-and-method-name pair?

Unfortunately, that doesn’t work either. Suppose object Frob is at address 0x12340000 in memory.

Frob.Gnusto 123, "skidoo" 
' Gnusto is dispid 0x1111. Add to cache "0x12340000.Gnusto", 0x1111
Frob.Gnusto 10, "lords a leaping"  
' Look up "0x12340000.Gnusto" in cache, aha, it is 0x1111

This might look fine, but again, it isn’t:

Set Frob = CreateObject("BitBucket.Froboznicator") 
Frob.Gnusto 123, "skidoo" 
Set Frob = Nothing
Set Frob = CreateObject("MushySoft.Gronker") 
Frob.Gnusto 10, "lords a leaping"  

What happened on that fourth line of code there? We previously threw away the only reference to the current value of Frob, freeing the memory. Then the operating system went and created a new object. The operating system could be very smart about memory re-use. It knows that it has a perfectly good free pointer at 0x12340000, so it might re-use it. Now our cache needs to be invalidated every time an object is freed! We must keep track of when every single object is freed — basically, we have to write a garbage collector for arbitrary COM objects!

To cache dispids you need to ensure that the lifetime of the object is greater than the lifetime of the cache. But object lifetimes are so unpredictable that it is very hard to know when the cache is invalid. We once considered adding dispid caching to those objects where we knew that the objects would live longer than the script (the window object in IE, or the Response object in IIS), but rejected the proposal as too much complication for very little performance gain.

What Are “Anonymous Functions” In JScript?

A member of our excellent customer support staff in the United Kingdom asked me this morning what “anonymous functions” are in JavaScript. It’s a little complicated. Two things come to mind: realio-trulio anonymous functions, and what we used to call “scriptlets”.

First up are actual anonymous functions — functions which do not have names are, unsurprisingly enough, called “anonymous functions”.

“What the heck do you mean, functions without names?” I hear you ask. “Surely all functions have names!” — well, no, actually, some don’t. This is perfectly legal in JavaScript:

print(function(x) { return x * 2; } (3) );

That prints out “6”. What’s the name of that function? It has no name. Of course, we could give it one if we chose. This is exactly the same as declaring a named function:

function double(x) { return x * 2; }
print(double(3));

But functions don’t need names any more than strings or numbers do. Functions are just functions whether they’re named or not.

Remember, JavaScript is a functional language: functions are objects and can be treated like any other object. You don’t have to give an object a name to use it, and you can give a single object multiple names. Functions are no different. For example, you can assign them to variables:

var myfunc = function(x) { return x * 2; }
var myfunc2 = myfunc;
print(myfunc(3));

Those of you who are familiar with more traditional functional languages, such as Lisp or Scheme, will recognize that functions in JavaScript are fundamentally the lambda calculus in fancy dress. (The august Waldemar Horwat — who was at one time the lead JavaScript developer at AOL-Time-Warner-Netscape — once told me that he considered JavaScript to be just another syntax for Common Lisp. I’m pretty sure he was being serious; Waldemar’s a hard core language guy and a heck of a square dancer to boot.) I’ll discuss other functional language properties of JavaScript in a future post.

One can also construct anonymous functions at runtime with the Function constructor, though why you’d want to is beyond me:

print(new Function("x", "return x * 2;")(3));

I recommend against constructing new functions at runtime based on strings, but that’s also a subject for a future post.

Interestingly enough, what this does internally is construct the following script text and compile it:

function anonymous(x) { return x * 2; }

So in fact, this actually does not compile up an anonymous function — it compiles up a non-anonymous function named “anonymous”! The mind fairly boggles.
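You can see this for yourself. Assuming the engine’s toString on a function gives back the compiled source (as JScript’s did), the constructed function announces its own name:

var f = new Function("x", "return x * 2;");
print(f(3));           // 6
print(f.toString());   // function anonymous(x) { return x * 2; }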

Second, there are the “anonymous functions” constructed when the browser builds an event handler.


Aside: In the source code for the script engine these things are called “scriptlets”, which is a terrible, undescriptive, confusing name. At one point there were three technologies all competing for the name “scriptlet”. The technology presently known as Windows Script Components was originally called “Scriptlets”, which explains the name of the Wrox book “Instant Scriptlets” — they published based on a beta released before the name was finalized. We considered calling Windows Script Components “Scriptoids” and “Script Thingies”, but fortunately cooler heads prevailed.


I digress. When you have a button in IE with an event handler, the script engine creates a separate compilation scope for the form and compiles up this string:

function anonymous() { window.alert('Oooh, you clicked me!'); }

It then passes the function object back as a dispatch pointer and the browser assigns it to the button’s onclick property. Again, this is a non-anonymous function named “anonymous”.


Some comments from readers:

You can also use anonymous functions to make closures:

function MakeAdder(a) 
{
  return function(b) { return a + b; }
}
var Add7 = MakeAdder(7);
var Add13 = MakeAdder(13);
WScript.Echo(Add13(Add7(22)));

Indeed, that is a subject for another posting. Though I note that closures are sometimes not your friend; there are memory leaks associated with closures in some scenarios.

I’ve always thought of functional languages as being like Haskell, or even XSLT, where the emphasis is on expression evaluation without side-effects.

By “functional” in this context I just mean “functions are first class objects”. Now, one can certainly argue that so-called “pure” functions are, well, pure. You know, the way, the truth, the light, all that good stuff. The notion that there is a right way and a wrong way to design functions is certainly endemic to users of more traditional functional languages like Scheme, ML, Haskell, etc.

Here’s another way to think about it — “functional programming” is writing programs in functional style: no side effects, etc, etc, etc. A “functional programming language” is a language in which one can do functional programming.

JavaScript provides you the tools necessary to do functional programming, so it is a functional language. Most people treat JavaScript as a procedural language or an object oriented language, but that doesn’t mean that it isn’t also a functional language!
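For example, a classic functional idiom is handing a function to another function that applies it over a collection. (JScript of this vintage has no built-in map, so this sketch writes one by hand.)

function map(f, items) {
  var result = [];
  for (var i = 0; i < items.length; i++)
    result[i] = f(items[i]);          // apply the passed-in function to each item
  return result;
}
print(map(function(x) { return x * 2; }, [1, 2, 3]));  // 2,4,6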

Here’s an example that shows the true utility of anonymous functions:

A.sort(function(a,b){return a.x<b.x?-1:1})

That’s a great point; the nice thing about anonymous functions is that you can express an intention close to where it is being used.

I’m curious about how anonymous functions work in C#. In the code below, isn’t the value type int stack-allocated by the runtime thereby allowing you to change the value of a non-existent stack variable by executing the code below?

class Program
{
  delegate int TestDelegate();
  private static TestDelegate d;
  static void Main(string[] args)
  {
    SetupDelegate();
    Console.WriteLine(d().ToString());
    Console.WriteLine(d().ToString());
    Console.WriteLine(d().ToString());
    Console.ReadKey();
  }
  static void SetupDelegate()
  {
    int x = 5;
    d = delegate() { ++x; return x; };
  }
}

First, the notion that integers are allocated “on the stack” is a common misperception. An array of integers is not allocated on the stack, and an integer field of a class is not allocated on the stack. The reason why locals are allocated on the stack is because usually their lifetimes are short; in this case, the lifetime of the local is not short, and so it is not allocated on the stack! The local is hoisted to be a field of a class, and is therefore allocated off the heap. That is, your code is equivalent to:

class Program 
{
  [...]
  private sealed class Locals 
  {
    public int x;
    public int M() { ++x ; return x; }
  }
  static void SetupDelegate() 
  {
    Locals locals = new Locals();
    locals.x = 5;
    d = locals.M;
  }
  [...]
}

Smart pointers are too smart

Joel’s law of leaky abstractions rears its ugly head once more. I try to never use smart pointers because… I’m not smart enough.

COM programmers are of course intimately familiar with AddRef and Release. The designers of COM decided to use reference counting as the mechanism for implementing storage management in COM, and decided to put the burden of implementing and calling these methods upon the users.

It is very natural when using a language like C++ to say that perhaps we can encapsulate these semantics into an object, and let the C++ compiler worry about calling the AddRefs and Releases in the appropriate constructors, copy constructors and destructors. It is very natural and very tempting, and I avoid template libraries that do so like the plague.

Everything usually works fine until there is some weird, unforeseen interaction between the various parts. Let me give you an example. Suppose you have a C++ template library map mapping IFoo pointers onto IBar pointers, where the IFoo and IBar pointers are held in smart pointers. You want the map to take ownership of the pointers. Does this code look correct?

map[srpFoo.Disown()] = srpBar.Disown();

It sure looks correct, doesn’t it?

Look again. Is there a memory leak there?

I found code like this in a library I was maintaining once, and since I had never used smart pointer templates before, I decided to look at what exactly this was doing at the assembly level. A glance at the generated assembly shows that the order of operations is:

1) call srpFoo.Disown() to get the IFoo*

2) call srpBar.Disown() to get the IBar*

3) call map’s operator[] passing in the IFoo*, returning an IBar**

4) do the assignment of the IBar* to the address returned in (3).

So where is the leak? This library had C++ exception handling for out-of-memory exceptions turned on. If the operator[] throws an out-of-memory exception then the not-smart IFoo* and IBar* presently on the argument stack are both going to leak.

The correct code is to copy the pointers before you disown them:

map[srpFoo] = srpBar;
srpFoo.Disown();
srpBar.Disown();

Before the day I found this, I had never been in a situation where I had to think about the C++ order of operations for assignment and subscripting in order to get the error handling right! The fact that you have to know these picky details about C++ operator semantics in order to get the error handling right is an indication that people are going to get it wrong.

Let me give you another example. One day I was adding a feature to this same library, and I noticed that I had caused a memory leak. Clearly I must have forgotten to call Release somewhere, or perhaps some smart pointer code was screwed up. I figured I’d just put a breakpoint on the AddRef and Release for the object, and figure out who was calling the extra AddRef.

Here — and I am not making this up! — is the call stack at the point of the first AddRef:

ATL::CComPolyObject<CLayMgr>::AddRef
ATL::CComObjectRootBase::OuterAddRef
ATL::CComContainedObject<CLayMgr>::AddRef
ATL::AtlInternalQueryInterface
ATL::CComObjectRootBase::InternalQueryInterface
CLayMgr::_InternalQueryInterface
ATL::CComPolyObject<CLayMgr>::QueryInterface
ATL::CComObjectRootBase::OuterQueryInterface
ATL::CComContainedObject<CLayMgr>::QueryInterface
CDDS::FinalConstruct
ATL::CComPolyObject<CDDS>::FinalConstruct
ATL::CComCreator<ATL::CComPolyObject<CDDS>>::CreateInstance
CTryAssertComCreator<ATL::CComPolyObject<CDDS>>::CreateInstance
ATL::CComClassFactory2<CDLic>::CreateInstance(ATL::CComClassFactory2
CTryAssertClassFactory2<CDLic>::CreateInstance(CTryAssertClassFactory2<CDLic>

Good heavens!

Maybe you ATL programmers out there are smarter than me. In fact, I am almost certain you are, because I have not the faintest idea what the differences between a CComContainedObject, a CComObjectRootBase and a CComPolyObject are! It took me hours to debug this memory leak. So much for smart pointers saving time!

I am too dumb to understand that stuff, so when I write COM code, my implementation of AddRef is one line long, not hundreds of lines of dense macro-ridden, templatized cruft winding its way through half a dozen wrapper classes.


Some reader responses:

Do you really think [the original example] is a normal way of using smart pointers? It is not.

I have no idea if it is “normal” or not, but it is how I found the code that I was reading, and there was plenty more code like it. Saying “you’re doing it wrong” doesn’t help. If the easy, obvious way to do a thing, and the correct, safe way to do a thing are different, then we have a problem.

You don’t have to use or understand things like CComObjectRootBase if all you want is a smart pointer.

Well, first off, I don’t want smart pointers. Second, again, this was not a hole of my own digging here. I was asked to help out with an existing codebase that used smart pointers, and debugging memory leaks in it really slowed me down.

The real problem is that exceptions were bolted on to C++ late in the process.

I agree that exceptions are a poor fit for non-GC’d languages, but again, that’s the world we live in, and we should adapt our tools to reality, rather than building tools that make real-world scenarios more difficult to analyze.

Using smart pointers is almost always a better idea than trying to manage the refcounts “by hand”.

I thoroughly disagree. I’ve debugged ref count bugs caused by me forgetting an AddRef, and I’ve debugged ref count bugs like the one above with the fifteen-deep call stack of meaningless (to me) gibberish, and the former are a whole lot easier to track down.

When I write COM code it tends to emphasize the error paths and cleanup code rather than the program logic, and that’s unfortunate. But given that the error paths and cleanup code are where most of the bugs are, I want them to be as explicit as possible. Those are the program logic.

I understand the urge to abstract them away into classes that take care of the details for you. But in my experience, the abstraction is both complex and leaky. The abstraction is easy to use for simple cases, which makes it all the easier to create much more subtle, hard-to-debug bugs.

My point, simply, is that AddRef, Release and QueryInterface are a pain in the rear to use, but they are easy to understand. Smart pointers are easy to use and hard to understand, and I prefer to understand the code I write.

So basically you are saying smart pointers suck, because you don’t know how to use them properly?

No, I’m saying they suck because they are easy to accidentally use improperly, and because they slow down my ability to understand and debug a buggy codebase.

An abstraction is supposed to relieve you of the burden of understanding the abstracted thing, but smart pointers do a poor job of that. In order to use them correctly, you have to understand exactly what they’re doing to ensure that you don’t use smart pointers to violate the very rules they are designed to abstract away.

That’s a lousy abstraction. It is easier for me to learn the rules of COM and apply them than to learn the rules of COM and learn the entire smart pointer framework.

The .NET framework does a much better job of abstracting away the details of how the underlying system works, because it was designed to do that from day one. Smart pointers are a kludge written post hoc.

You have bad smart pointers! Don’t blame a concept because of a lame implementation!

Again, I’m not the one choosing what implementation of smart pointers I get to use. Maybe there are smart pointer libraries that do not have these problems, but I’m betting they just have different problems.

This code has exactly the same problem, and no smart pointers: map[new Foo()] = new Bar();

Sure, and I can look at that code and immediately know that it is wrong without having to understand the semantics of “disown” operations. The only thing I need to know to deduce the wrongness of that code is the basic rules of C++.

Eric’s Complete Guide to VT_DATE

I find software horology fascinating.

The other day, Raymond said “The OLE automation date format is a floating point value, counting days since midnight 30 December 1899. Hours and minutes are represented as fractional days.”

That’s correct, but actually it is a little bit weirder than that. I suspect that I may be the world’s leading authority on bugs having to do with the OLEAUT date format: a dubious distinction at best. I call it the VT_DATE format because it is the data stored in an OLEAUT variant of type VT_DATE.

Here are some interesting (well, interesting to me) facts about OLEAUT dates.

First of all, let’s start with the obvious problem — Midnight 30 December 1899 in what time zone? We never say. OLEAUT dates are always “local” which makes it very difficult to write code that uses OLEAUT dates — any VB or VBScript program, for example — which must deal with two things happening at the same time in different time zones.

Next, what about daylight savings time? How does one represent those days which due to springing forward or falling back have 23 hours or 25 hours? Again, those who use the OLEAUT date format need to pretend that these days do not exist.

Now let’s get really weird. An OLEAUT date is, as Raymond noted, a double where the signed integer part is the number of days since 30 December 1899 and the fraction part is the amount of that day gone by. So what is 1.75? That’s 6:00 PM, 31 Dec 1899. What about -1.75? That’s 6:00 PM, 29 Dec 1899. Notice how the 0.75 part means 6:00 PM in both cases; three-quarters of the way through the day.

What about 0.75 and -0.75? Uh, those are zero and “minus zero” days from 30 December 1899, again at 6:00 PM. Those two numbers are the same time. This means that any program which must calculate the difference between two OLEAUT dates must say that (-0.75) - (0.75) = 0 difference in time.
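If you ever have to do arithmetic on this format, one plausible approach (a sketch only; I am not claiming this is what VBScript actually does internally) is to first normalize the value onto a genuinely linear scale, where the whole-day part keeps its sign but the time-of-day fraction always runs forward within the day:

// Convert a VT_DATE value to days on a linear scale relative to 30 Dec 1899.
// The signed integer part selects the day; the magnitude of the fraction is
// how far into that day we are, regardless of sign.
function vtDateToLinearDays(vt) {
  var day = vt < 0 ? Math.ceil(vt) : Math.floor(vt);  // signed whole days
  var timeOfDay = Math.abs(vt - day);                 // 0 <= timeOfDay < 1
  return day + timeOfDay;
}
print(vtDateToLinearDays(0.75) - vtDateToLinearDays(-0.75)); // 0: same instant
print(vtDateToLinearDays(1.75) - vtDateToLinearDays(-1.75)); // 2: two days apart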

The reason I know all this is because my first ever checkin as a full timer was a rewrite of the VBScript implementations of DateAdd and DateDiff, both of which used to be a mass of spaghetti code to handle all the special cases entailed by the discontinuities between -1.0 and 1.0. Incidentally, I now ask interview candidates to write me those algorithms during interviews! (Attention potential interview candidates reading this: writing a mass of spaghetti code on my whiteboard is a bad idea. Solving this problem by cases is a bad idea. There are better ways to solve this problem!)

Here’s another bogus one: how about -1.99999999 and -2.0? Those are 0.00000001 apart in numbers but almost 48 hours apart in time! But OLEAUT dates are rounded to the nearest half second by the operating system, and you guessed it: any dates less than a quarter second before midnight before 30 Dec 1899 are sometimes “rounded” two days wrong. I wrote the code that converts OLEAUT dates to JScript dates, and it at least does correctly handle this case, though I had to jump through some hoops to do it.

(UPDATE: I have no idea if in the 16 years since I first wrote this article that bug in Windows was ever fixed. Anyone want to try it out and see?)

Some more oddities: The range is enormous, and the precision varies greatly over the range! You can represent dates long before the creation of the Universe, though you lose precision as you go. However, the only valid dates for the date format are between 100 AD and 10000 AD, and since we have half-second granularity, we are wasting a whole lot of bits here.

And finally, why 30 December 1899? Why not, say, 31 December 1899, or 1 January 1900 as the zero day? Actually, it turns out that this is to work around a bug in Lotus 1-2-3! The details are lost in the mists of time; what I can tell you is that Lotus 1-2-3 used this date format, and wished to have “day one” be 1 January 1900. However, they forgot that 1900 was not a leap year, and therefore their “day count” was off by one for every day after 28 February 1900. Microsoft chose to use the Lotus date format for Excel, for compatibility, but “fixed” this bug by moving “day one” back one day. Therefore day zero is 30 December 1899.

Getting date code right is hard; this is one of those areas where messy human requirements are hard to translate into crisp machine logic. There are huge localization problems: for instance, in Japan it is legal to specify years as the fifth year of the reign of Emperor Hirohito. Thailand’s calendar doesn’t count from 1 AD. In Israel, the start and end days for daylight savings time are not standardized but rather are declared anew by the government after fierce debate every year.

These situations are fraught with peril for the unwary developer. I once drove our Microsoft support team in Israel to distraction by accidentally changing both Arabic and Hebrew locales to display dates right-to-left. Apparently in Arabic dates are customarily written right-to-left, but Israelis use left-to-right dates even when they are embedded in right-to-left Hebrew text.

You wouldn’t think that something as simple as asking “what day is it?” could lead to so many problems, but the world is seldom as cut-and-dried as we software developers would like.

For more history on this bizarre date format, see this article by Joel Spolsky.

Bad Hungarian

One more word about Hungarian Notation and then I’ll let it drop, honest. (Maybe.)

If you’re going to uglify your code with Hungarian Notation, please, at least do it right. The whole point is to make the code easier to read and reason about, and that means ensuring that the invariants expressed by the Hungarian prefixes and suffixes are actually invariant.

Here’s some code I found once in the “diagram save” code of a Visio-like tool:

long lCountOfConnectors = srpConnectors->Count();
while( --lCountOfConnectors)
{
  // [Omitted: get the next connector]
  // [Omitted: save the connector to a stream]
}

OK, first of all, that should be cConnectors, c for “count of”. But that’s just a trivial question of what lexical convention we use. There’s a far more serious problem here. The number of connectors is not decreasing as we iterate the loop, so cConnectors should not be decreasing.

Hungarian makes it easier to reason about code, but only when you make sure that the algorithm semantics and the Hungarian semantics match. Seriously, when I first read this code I naturally assumed that it was removing connectors from the collection for some reason, and therefore decreasing the count so that the count variable would continue to match reality. But in fact it was just using the count as an index, which is semantically wrong. The code should read something like:

long cConnectors = srpConnectors->Count();
for(long iConnector = 0 ; iConnector < cConnectors ; ++iConnector)
{
  // [...]
}

In other words, the name of a variable should reflect its meaning throughout its lifetime, not merely its initialization.

What are the VBScript semantics for object members?

In the previous episode we discussed how VBScript supports two kinds of reference semantics — reference types, and pass-by-reference. Clearly in order for VBScript to support pass-by-reference on variables, there has to be a variable to reference.

Consider our earlier example:

Sub Change(ByRef XYZ)
   XYZ = 5
End Sub

Dim ABC
ABC = 123
Change ABC

If that had been Change (ABC) then based on what you know from two posts ago, you’d know that it passes ABC by value, not by reference. So the assignment to XYZ would not change ABC in that scenario.

The rule is pretty simple: if you want to pass a variable by reference, you’ve got to pass the variable, period.

This series of posts was inspired by an intrepid scripter who was trying to combine our previous two examples. They had a program that looked something like this:

Class Foo
   Public Bar
End Class
Sub Change(ByRef XYZ)
   XYZ = 5
End Sub

Dim Blah
Set Blah = New Foo
Blah.Bar = 123
Change Blah.Bar

This in fact does not change the value. This passes the value of Blah.Bar, not a reference to Blah.Bar.

The scripter asked me “why does this not work the way I expect?” Here’s my Socratic dialog reply:

Q: Why does this not work the way I expect?

A: Because your expectations are inconsistent with the real universe. Adjust your expectations and they’ll start being met!

Q: That is remarkably unhelpful. Let me rephrase: What underlying design principle did the VBScript designers use to justify this decision to pass by value, not reference?

A: The fundamental principle that governs this case was “do not be unnecessarily different from VB6.” VB6 does the same thing. Try it if you don’t believe me!

Q: You are begging the question. Why does VB6 do that?

A: Probably for backwards compatibility with VB5, 4, 3, 2 and 1, which incidentally was called “Object Basic”. Ah, the halcyon days of my youth.

Q: More question begging! What was the initial justification on the day that by-reference calling was added to VB?

A: That is lost in the mists of time. That was like ten years ago! There are not very many of the original design team left. I was an intern at the time and they weren’t exactly consulting me on these sorts of decisions on a regular basis. It wasn’t so much “Eric, what do you think about these by reference semantics?” as “Eric, the OLE Automation build machine needs more memory, here’s a screwdriver.”

However, you’re in luck. I seem to recall back in the dim mists of time someone telling me something about wanting to avoid copy-in-copy-out semantics on COM objects. Suppose for example you said:

Set Frob = CreateObject("BitBucket.Frobnicator")
SetToFive Frob.Rezrov

Now what happens? This isn’t a VB class, this is some third party COM object. COM objects do not have property slots, they have getter/setter accessor functions. There is no way to pass the value of Frob.Rezrov by reference because VB does not have psychic powers which tell it where in memory the implementers of BitBucket.Frobnicator happened to store the value of the Rezrov property.

Given that, how could you implement byref semantics? You could implement copy-in-copy-out semantics! VB would have to create a memory location, fill it with the value returned by get_Rezrov, pass the address of that location to SetToFive, and then upon SetToFive‘s return, it would have to call Frob::set_Rezrov with the new value put into the buffer.
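Here is a sketch of that mechanism. VBScript itself never does this, so the sketch is in JScript, and the names (copyInCopyOut, getRezrov, setRezrov) are invented purely to show the shape of the idea:

// Copy-in-copy-out: read the property into a temporary "box", let the callee
// work on the box as if it were a reference, then write the box back.
function copyInCopyOut(obj, getter, setter, byRefSub) {
  var box = { value: getter(obj) };   // copy in: fill a temporary location
  byRefSub(box);                      // pass the "address" of the temporary
  setter(obj, box.value);             // copy out: write the result back
}
var frob = {
  rezrov: 0,
  getRezrov: function ()  { return this.rezrov; },
  setRezrov: function (v) { this.rezrov = v; }
};
function SetToFive(box) { box.value = 5; }
copyInCopyOut(frob,
  function (o) { return o.getRezrov(); },
  function (o, v) { o.setRezrov(v); },
  SetToFive);
print(frob.getRezrov());   // 5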

Easy, right? Well, it gets weird once you start thinking about non-trivial functions. Consider the case where SetToFive does not change the value of the by-ref variable. That call to set_Rezrov may have side effects, so do we really want to call it if nothing changed? It seems like that could potentially cause badness, and certainly cause poor performance. In a “realio-trulio byref” system we’d expect zero calls to the setter if there was no change, but with copy-in-copy-out we end up with one call to the setter regardless. How could we avoid that unwanted call?

Well, we could create yet another temporary storage to keep the original value around and do a comparison when SetToFive returns. (Note that I’ve just waved my hands there; I’m assuming that the two values can sensibly be compared. Comparing two things for equality is non-trivial, but that’s another posting.)

Anyway, what if the temporary storage variable changed during the execution of SetToFive and then changed back? In that case we’d expect two calls to the setter, but actually end up with no calls!

Naïve copy-in-copy-out doesn’t provide particularly good fidelity with true byref addressing. The original designers of VB decided that it was simply not worth the trouble to do it at all. It is much easier to simply say that members of COM objects do not get copy-in-copy-out semantics, and therefore they cannot be passed by reference. If you’re going to make that restriction for some COM objects, it seems perverse to say “we’ll do this for third party COM objects but not for VB class objects.” Thus, VBScript does not support passing object properties by reference.