The JScript Type System, Part Two: Prototypes and constructors

A number of readers made some good comments on my article on JScript typing that deserve to be called out in more detail.

First, I was being a little sloppy in my terminology — I casually conflated static typing with strong typing, and dynamic typing with weak typing. Thanks for calling me on that. Under the definitions proposed by the reader, JScript would be a dynamically typed language (because every variable can take a value of any type) and a strongly typed language (because every object knows what type it is.) By contrast, C++ is a statically typed language (because every variable must have a type, which the compiler enforces) but also a weakly typed language (because the reinterpret cast allows one to turn pointers into integers, and so on.)

Second, a reader notes that one of the shortcomings of JScript is that, though it is a strongly typed language (in our new sense), it is a royal pain to actually determine the runtime type of an object. The typeof operator has a number of problems:

  • null is listed as being of the object type, though technically it is a member of the Null type.
  • primitives (strings, numbers, Booleans) wrapped in objects are listed as being of the object type rather than their underlying type.
  • JScript, unlike VBScript, does not interrogate COM objects to determine the class name.
  • If JScript is passed a variant from the outside world that it cannot make sense of then typeof returns “unknown”.
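
The first two items are easy to reproduce. Here is a minimal sketch, assuming any ECMAScript-compatible engine; the variable names are purely illustrative:

```javascript
// typeof reports "object" for null, even though null belongs to the Null type.
var typeOfNull = typeof null;

// A primitive reports its own type...
var typeOfPrimitive = typeof "hello";

// ...but the same value wrapped in an object reports "object".
var typeOfWrappedString = typeof new String("hello");
var typeOfWrappedNumber = typeof new Number(3);
```

The last two items depend on COM interop and cannot be shown without a Windows script host, so they are left out of the sketch.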

Perhaps there is some other way. Prototype inheritance affords a kind of type checking, for example.

Prototype inheritance works like this: every JScript object has an object (or possibly null) called its prototype object.  Suppose an object foo has prototype object bar, and bar has prototype object baz, and baz has prototype object null. If you call a method on foo then JScript will search foo, bar and baz for that method, and call the first one it finds.

The idea is that one object is a prototypical object, and then other objects specialize it. This allows for code re-use without losing the ability to dynamically customize behaviour of individual objects.

Prototypes are usually done something like this:

var Animal = new Object();
// omitted: set up Animal object
function Giraffe()
{
  // omitted: initialize giraffe object.
}
Giraffe.prototype = Animal;
var Jerry = new Giraffe();

Now Jerry has all the properties and methods of an individual Giraffe object AND all the properties and methods of Animal.  You can use isPrototypeOf to see if a given object has Animal on its prototype chain. Since prototype chains are immutable once created, this gives you a pretty reliable sort of type checking.

Note that Giraffe is not a prototype of Jerry. Note also that Animal is not the prototype of Giraffe! The object which is assigned to the prototype property of the constructor is the prototype of the instance.
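
These relationships can be checked directly with isPrototypeOf. The following is a small sketch that reuses the Animal and Giraffe names from above; the result variable names are made up for illustration:

```javascript
var Animal = new Object();
function Giraffe() { }
Giraffe.prototype = Animal;
var jerry = new Giraffe();

// Animal is on the prototype chain of the instance.
var animalOnJerrysChain = Animal.isPrototypeOf(jerry);               // true

// The constructor function itself is not a prototype of the instance.
var giraffeOnJerrysChain = Giraffe.isPrototypeOf(jerry);             // false

// Animal is not the prototype of the Giraffe function object either;
// like every function, Giraffe has Function.prototype on its chain.
var animalOnGiraffesChain = Animal.isPrototypeOf(Giraffe);           // false
var functionProtoOnGiraffesChain =
  Function.prototype.isPrototypeOf(Giraffe);                         // true
```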

Now, you all are not the first people to point out to me that determining types is tricky. A few years ago someone asked me what the differences are amongst

if (func.prototype.isPrototypeOf(instance))

and

if (instance.constructor == func)

and

if (instance instanceof func)

The obvious difference is that the first one looks at the whole prototype chain, whereas the second two look at the constructor, right? Or is that true? Is there a semantic difference between the last two?

There is. Let’s look at some examples, starting with one that seems to show that there is no difference:

function Car() { }
var honda = new Car();
print(honda instanceof Car); // true
print(honda.constructor == Car);  // true

It appears that instance instanceof func and instance.constructor == func have the same semantics. They do not. Here’s a more complicated example that demonstrates the difference:

var Animal = new Object();
function Reptile() { }
Reptile.prototype = Animal;
var lizard = new Reptile();
print(lizard instanceof Reptile); // true
print(lizard.constructor == Reptile); // false

In fact lizard.constructor is equal to Object, not Reptile.

Let me repeat what I said above, because no one understands this the first time; I didn’t, and I’ve found plenty of JavaScript books that get it wrong. When we say

Reptile.prototype = Animal;

this does not mean “the prototype of Reptile is Animal”.  It cannot mean that because (obviously!) the prototype of Reptile, a function object, is Function.prototype.  No, this means “the prototype of any instance of Reptile is Animal”.  There is no way to directly manipulate or read the prototype chain of an existing object in JScript.

Now that we’ve got that out of the way, the simple one first:

instance instanceof func means “is the prototype property of func equal to any object on instance’s prototype chain?”  So in our second example, the prototype property of Reptile is Animal and Animal is on lizard’s prototype chain.

But what about our first example where there was no explicit assignment to the Car prototype?

The compiler creates a function object called Car.  It also creates a default prototype object and assigns it to Car.prototype.  So again, when we say

print(honda instanceof Car);

the instanceof operator gets Car.prototype and compares it to the prototype chain of honda. Since honda was constructed by Car it gets Car.prototype on its prototype chain.

To sum up the story so far, instance instanceof func is actually a syntactic sugar for func.prototype.isPrototypeOf(instance). This explains why lizard instanceof Reptile returns true: Reptile.prototype is a prototype of lizard.
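
To make the equivalence concrete, here is a sketch that runs both forms side by side on the Car and Reptile examples. It assumes ordinary user-defined function objects, and the result variable names are only for illustration:

```javascript
function Car() { }
var honda = new Car();

var Animal = new Object();
function Reptile() { }
Reptile.prototype = Animal;
var lizard = new Reptile();

// Both forms agree when the instance was built by the constructor...
var hondaViaInstanceof = honda instanceof Car;
var hondaViaIsPrototypeOf = Car.prototype.isPrototypeOf(honda);
var lizardViaInstanceof = lizard instanceof Reptile;
var lizardViaIsPrototypeOf = Reptile.prototype.isPrototypeOf(lizard);

// ...and they also agree when it was not.
var hondaIsReptileViaInstanceof = honda instanceof Reptile;
var hondaIsReptileViaIsPrototypeOf = Reptile.prototype.isPrototypeOf(honda);
```

The first four results are true and the last two are false; in every pair the two forms give the same answer.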

So what the heck is going on with the constructor property then?  How is it possible that we can say lizard = new Reptile(); and at the same time lizard.constructor == Reptile is false???

Let’s go back to our simple first example.  I said above that since Car has no prototype assigned to it, we create a default prototype.  During the creation of the default prototype, the interpreter assigns Car to Car.prototype.constructor.  That might be a little confusing, so let’s look at some pseudocode.  This:

function Car(){}

logically does the same thing as

var Car = new Function();
Car.prototype = new Object();
Car.prototype.constructor = Car;

Now we say

var honda = new Car();
print(honda.constructor == Car );

and what happens? honda has no constructor property, so it looks on the prototype chain for any object with a constructor property. In this case Car.prototype is on the prototype chain and it has a constructor property equal to Car, so the comparison is true.  Remember, any property of an object’s prototype object is treated as a property of the object itself – that’s what “prototype” means.

Now let’s look at our second example:

var Animal = new Object();
function Reptile(){ }
Reptile.prototype = Animal;

Logically this does the same thing as

var Animal = new Object();
var Reptile = new Function();
Reptile.prototype = new Object();
Reptile.prototype.constructor = Reptile;
Reptile.prototype = Animal;

Whoops. The default prototype has been thrown away.  Now when we say

print(lizard.constructor == Reptile );

what happens? lizard does not have a constructor property, so we look at the prototype chain and find Animal.  But Animal does not have a constructor property either! So we look on Animal’s prototype chain. Animal was constructed via new Object() so therefore it has Object.prototype on its prototype chain, and Object.prototype has a constructor property.  As you might expect from our previous discussion of how the constructor property is initialized, Object.prototype.constructor is set to Object.

Therefore lizard.constructor is equal to Object, not Reptile, even though lizard is an instance of Reptile and was constructed by the Reptile function object!
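
The lookup can be traced by asking each object on the chain whether it owns a constructor property. This is a sketch using the standard hasOwnProperty method; the result variable names are hypothetical:

```javascript
var Animal = new Object();
function Reptile() { }
Reptile.prototype = Animal;
var lizard = new Reptile();

// Neither lizard nor Animal owns a constructor property...
var ownedByLizard = lizard.hasOwnProperty("constructor");
var ownedByAnimal = Animal.hasOwnProperty("constructor");

// ...so the search ends at Object.prototype, whose constructor is Object.
var ownedByObjectPrototype = Object.prototype.hasOwnProperty("constructor");
var lizardConstructorIsObject = lizard.constructor === Object;

// For comparison, the default prototype of Car does own one.
function Car() { }
var honda = new Car();
var ownedByHonda = honda.hasOwnProperty("constructor");
var ownedByCarPrototype = Car.prototype.hasOwnProperty("constructor");
var hondaConstructorIsCar = honda.constructor === Car;
```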

You would think that the script engine would automatically assign the constructor property to the object when it was constructed, but it does not. It assigns the property to the prototype and relies on prototype inheritance. I was not a member of the ECMAScript committee when this decision was made, so I don’t know why we standardized this rather bizarre behaviour, but we’re stuck with it now!


Notes from 2020:
This article is one I referred back to so many times over the years I worked on JScript; the prototype mechanism can be very confusing.

There were a few interesting comments when I first wrote this. A couple highlights:

  • In some implementations of ECMAScript you can manipulate the proto chain directly with the __proto__ property. Make sure you understand what you’re doing before you change it!
  • Using “new String” to make a string can lead to an object that has different properties than just making the “primitive” string. This is by design — bad design, but design nevertheless. This was the subject of part three of this series.

What is the Matrix?

I’m going to make a rare departure from technical stuff for a moment.  I’ve seen Matrix Revolutions twice in the last twelve hours, once in IMAX.

A number of people have asked me for an opinion.  To sum up in a spoiler-free manner:

First off, IMAX was better, though not enormously so.  (Oddly enough, Reloaded was enormously better in IMAX.)  As an example of state-of-the-art action movie making, it kicked ass.  But what about the underlying theme?  Let’s summarize:

The Matrix was a movie about kung fu vs. evil robots that asked and answered the philosophical question “what is the nature of reality?”

The Matrix Reloaded was a movie about kung fu vs. evil robots that asked and answered the philosophical question “what is the nature of free will?”

Matrix Revolutions was a movie about kung fu vs. evil robots that asked and answered the philosophical question “if Neo got into a fight with Agent Smith, who would win?”

We return you now to your regularly scheduled programming.


Commentary from 2019:

As I’ve noted before, the VSTO team code names and whatnot were all taken from The Matrix, and it was a fun theme.

I am not “spoiler averse” but I saw the first movie in the series knowing absolutely nothing about it, and was thrilled to discover that it had kung fu, evil robots, and literal brain-in-a-vat philosophy undergrad thought experiments. Sure, lots of it was hokey and unbelievable, but it had such energy, and was genuinely surprising.

I should not have been surprised by a reversion to the mean in the sequels; the original was hard to top.

 

The JScript Type System, part one

I thought I might spend a few days talking about the JScript and JScript .NET type systems, starting with some introductory material.

Consider a JScript variable:

var myVar;

Now think about the possible values you could store in the variable. A variable may contain any number, any string or any object. It can also be true or false or null or even undefined. This is a rather large set of possible values. In fact, the set of all legal values is infinite. Countably infinite, and in practice limited by available memory, but in theory there is no upper limit.

A type is characterized by two things, a set and a rule. First, a type consists of a subset (possibly infinitely large) of the set of all possible values. Second, a type defines a rule for transforming values outside the set into values in the set. (This rule may specify that certain values are not convertible and hence produce “type mismatch” errors.)

For example, String is a type. The set of all possible strings is an (infinite) subset of the set of all possible values, and there are rules for determining how all non-string values are converted into strings.
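
As a rough illustration of the “rule” half of that definition, here is how a few non-string values are converted into strings. The variable names are made up for this sketch:

```javascript
// Explicit conversion of values from other types into the String type.
var fromNumber = String(123);
var fromBoolean = String(true);
var fromNull = String(null);
var fromUndefined = String(undefined);

// Concatenation with a string applies the same conversion implicitly.
var fromConcatenation = 42 + "";
```

The results are "123", "true", "null", "undefined" and "42" respectively.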

JScript Classic is a dynamically typed language. This means that any value of any type may be assigned to any variable without restriction. It is often said — inaccurately — that “JScript has only one type”. This is true only in the sense that JScript has no restrictions on what data may be assigned to any variable, and in that sense every variable is “the same type” – namely, the “any possible value” type. However, the statement is misleading because it implies that JScript supports no types at all, when in fact it supports six built-in types.

JScript .NET, by contrast, is an optionally statically-typed language. A JScript .NET variable may be given a type annotation which restricts the values which may be stored in the variable. This annotation is optional; an unannotated variable acts like a JScript variable and may be assigned any value.

JScript has the property that a value can always describe its own type at runtime. This is not true in, say, C, where you can have a void* and no way of asking it “are you pointing to an integer or a string variable?” In JScript, you can always ask a value what its type is and it will tell you.

The concept of subtyping is not particularly important in JScript Classic though it will become quite useful when we discuss JScript .NET classes later. Essentially a type T1 is a subtype of another type T2 if T1’s set of values is a subset of T2’s set of values. A type consisting of the set of all integers might be a subtype of a type consisting of all the numbers, for instance. (This is not how the integers are traditionally construed; the C type system makes integers and floats disjoint types, where the integer 1 and the float 1.0 are different values that happen to compare as equal — but comparisons across types is a subject for a later blog entry.)

Anyway, JScript Classic has six built-in types, all of which are disjoint. They are as follows:

  • The Number type contains all floating-point numbers as well as positive Infinity, negative Infinity and a special Not-a-Number (“NaN”) value. It may seem odd that “Not-a-Number” is a Number but this does in fact make sense. NaN is the value returned when an operation logically must return a number but no actual number makes sense. For example, when trying to convert the string "banana" to a Number, NaN is the result. Because numbers in JScript are actually represented by a 64 bit floating point number there are a finite number of possible Number values. The number of numbers is very large (in fact there are 18437736874454810627 possible numbers, which is just shy of 2^64.) Numbers have approximately fifteen decimal digits of precision and can range from as tiny as 2.2 x 10^-308 to as large as 1.7 x 10^308.
  • The String type contains all Unicode strings of any length (including zero-length empty strings.) The string type is for all practical purposes infinite, as the length of a string is limited only by the ability of the operating system to allocate enough memory to hold it.
  • The Boolean type has two values: true and false
  • The Null type has one value: null
  • The Undefined type has one value: undefined. All uninitialized JScript variables are automatically set to undefined
  • The Object type has an infinite number of values. An object is essentially a collection of named properties where each property can be a value of any type. In JScript many things are objects: functions, dates, arrays and regular expressions are all objects.
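
A few of the claims in this list can be checked directly. This is a small sketch; the variable names are only for illustration:

```javascript
// Converting a non-numeric string yields NaN, which is still a Number.
var notANumber = Number("banana");
var nanIsDetected = isNaN(notANumber);
var nanTypeName = typeof notANumber;

// The two infinities are Number values as well.
var positiveInfinity = 1 / 0;
var negativeInfinity = -1 / 0;

// An uninitialized variable holds the single value of the Undefined type.
var uninitialized;
var isUndefined = uninitialized === undefined;

// Functions and arrays are objects.
var functionsAreObjects = (function () { }) instanceof Object;
var arraysAreObjects = new Array(1, 2, 3) instanceof Object;
```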

Types are not themselves “first class” objects in JScript, though they are in JScript .NET. I’ll discuss that, along with the differences between prototype and class inheritance, in later entries.


Notes from 2020

It would be interesting to revisit this series in the context of the TypeScript type system, which is a great example of how rich and powerful a gradually typed language can be. The JScript.NET gradual type system was very simple compared to the TS system!

This series got a lot of good comments from readers. A few highlights from this first episode:

  • the typeof operator in JScript bizarrely does not follow the rules of the type system that I just laid out. It identifies null as an object, but does not identify functions as objects, for example. External functions, like DOM functions, may not be identified as functions. It can also return “unknown” in some rare cases.
  • Also confusing: typeof(3) is number, but typeof(new Number(3)) is object.
  • And also confusing, typeof(this) when called from the body of a user-defined function on the prototype of Number is object.
  • A number of inconsistencies and confusions regarding prototypes were also raised; I discuss these in a later episode.
  • In the earliest published version of this article I used the terms “strong” and “weak” with respect to the type system, and readers rightly took me to task for that. This was the beginning of a realization that I have since strongly expressed: “strong” and “weak” are so vague that they are meaningless. I haven’t checked lately, but one time when I read the Wikipedia article on strong typing, it listed eleven different contradictory meanings. I’ve since expressed the (strong!) opinion that “strongly typed” simply means “a type system that I admire”. Instead of characterizing type systems as “strong” or “weak”, instead say what properties they really have: what restrictions do they impose, when are they imposed, and in what ways can those restrictions be violated, and what are the consequences?
  • Readers similarly noted that “untyped” is vague; it is often used to mean “no restriction is placed on the possible values of variables” (as is the case in classic JavaScript) but is also used to mean “no type system is imposed at all on any values” (as is the case in, say, untyped lambda calculus, where all values are functions from functions to functions; there are no integers or strings at all.)

 

Eval is evil, part two

As I promised, more information on why eval is evil. (We once considered having T-shirts printed up that said “Eval is evil!” on one side and “Script happens!” on the other, but the PMs never managed to tear themselves away from their web browsing long enough to order them.)

Incidentally, a buddy of mine who is one of those senior web developer kinda guys back in Waterloo sent me an email yesterday saying “Hello, my name is Robert and I am an evalaholic”. People, it wasn’t my intention to start a twelve step program, but hey, whatever works!

As I discussed the other day, eval on the client is evil because it leads to sloppy, hard-to-debug-and-maintain programs that consume huge amounts of memory and run unnecessarily slowly even when performing simple tasks. But like I said in my performance rant, if it’s good enough, then hey, it’s good enough.  Maybe you don’t need to write maintainable, efficient code. Seriously! Script is often used to write programs that are used a couple of times and then thrown away, so who cares if they’re slow and inelegant?

But eval on the server is an entirely different beast. First off, server scenarios are generally a lot more performance sensitive than client scenarios.  On a client, once your code runs faster than a human being can notice the lag, there’s usually not much point in making it faster.  But as I mentioned earlier, ASP goes to a lot of work to ensure that for a given page, the compiler only runs once. An eval defeats this optimization by making the compiler run every time the page runs! On a server, going from 25 ms to 40 ms to serve a page means going from 40 pages a second to 25 pages a second, and that can be expensive in real dollar terms.

But that’s not the most important reason to eschew eval on the server.  Any use of eval (or its VBScript cousins Eval, Execute and ExecuteGlobal) is a potentially enormous security hole:

<%
  var Processor_ProductList;
  var Software_ProductList;
  var HardDisk_ProductList;
  // ...
  CategoryName = Request.QueryString("category");
  ProductList = eval(CategoryName + "_ProductList");
  // ...

What’s wrong with this picture?  The server assumes that the client is not hostile.  Is that a warranted assumption?  Probably not!  You know nothing about the client that sent the request.  Maybe your client page only sends strings like “Processor” and “HardDisk” to the server, but anyone can write their own web page that sends

((new ActiveXObject('Scripting.FileSystemObject')).
DeleteFile('C:*.*',true)); 
Processor

which will cause eval to evaluate

((new ActiveXObject('Scripting.FileSystemObject')).
DeleteFile('C:*.*',true)); 
Processor_ProductList

Obviously that’s a pretty unsophisticated attack.  The attacker can put any code in there that they want, and it will run in the context of the server process.  Hopefully the server process is not a highly privileged one, but still, there’s vast potential for massive harm here just by screwing up the logic on your server.

Never trust the input to a server, and try to never use eval on a server.  Eval injection makes SQL injection look tame!
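
One way to avoid eval in a scenario like this is to look the category up in a table, so that the request string is only ever used as a key and never compiled. The following is a sketch under stated assumptions: the productLists table, its contents and the lookupProductList helper are all hypothetical, and a plain string stands in for the value read from the query string.

```javascript
// Hypothetical product lists, keyed by the category names the page expects.
var productLists = {
  Processor: ["processor-a", "processor-b"],
  Software: ["software-a"],
  HardDisk: ["disk-a", "disk-b"]
};

// Returns the list for a known category, or null for anything else.
// The input is treated purely as data; nothing is compiled or executed.
function lookupProductList(categoryName) {
  if (Object.prototype.hasOwnProperty.call(productLists, categoryName))
    return productLists[categoryName];
  return null;
}

var friendly = lookupProductList("Processor");
var hostile = lookupProductList(
  "((new ActiveXObject('Scripting.FileSystemObject'))." +
  "DeleteFile('C:*.*',true)); Processor");
```

With this shape, the hostile string from the example above is simply an unknown key, so the lookup returns null instead of running the attacker’s code.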

To try and mitigate these sorts of problems, JScript .NET has some restrictions on its implementation of eval, but that’s a topic for another entry.


Notes from 2020

The attack I’m briefly describing here is of course only one of a great many “hostile client gets bad string onto the server” attack patterns. During my time at Coverity I got to take a deep look at the tools which attempt to detect code paths where a string goes from an untrusted source to a dangerous sink. In terms of the complexity of the analyzed control flows, these were probably the most sophisticated checkers we had, and among the most prone to false positives.

Defeating injection attacks is a hard problem, and I wish we at Microsoft had solved it in the type system, rather than creating a market for expensive third-party solutions that use symbolic execution to effectively do the work of imposing a type system post hoc on an existing program.

A parable

Once upon a time I was in high school. Ah, the halcyon days of my youth. One day I was sitting in class, minding my own business when the teacher said: “Does anyone have a thin metal ruler?”

No answer. Apparently no one had a thin metal ruler.

“No? How about a nail file?”

No answer. Now, I cannot imagine that of all the girls in the class, not one of them had a nail file. But I can well imagine that none of them wanted to share it with a teacher.

“No? Hmm.”

So I piped up: “What do you need a nail file for?”

“I have this big staple in this document that I need to remove.”

Upon which point one of my classmates mentioned that he had a staple remover. Problem solved.

Over and over again I find that script customers (both internal consumers here at Microsoft and third-party developers) frequently ask questions like my teacher. That is, they have a preconceived notion of how the problem is going to be solved, and then ask the necessary questions to implement their preconceived solution. And in many cases this is a pretty good technique! Had someone actually brought a thin metal ruler to class, the problem would have been solved. But by choosing a question that emphasizes the solution over the problem, the questioner loses all ability to leverage the full knowledge of the questionees.

When someone asks me a question about the script technologies I almost always turn right around and ask them why they want to know. I might be able to point them at some tool that better solves their problem. And I might also learn something about what problems people are trying to solve with my tools.

Joel Spolsky once said that people don’t want drills, they want holes. As a drill provider, I’m fascinated to learn what kinds of holes people want to put in what kinds of materials, so to speak. Sometimes people think they want a drill when in fact they want a rotary cutter.


Commentary from 2019:

First off, I misattributed that quotation. “People don’t want to buy a quarter-inch drill, they want a quarter-inch hole.” is a quote from the economist Theodore Levitt. At the time I wrote this, I was sure that I had read about this idea in a Joel On Software article, but if I did, I cannot find it now. Apologies for the error.

Second, I did not know at the time that we have a name for this pattern of “have a problem, get a crazy idea about a solution, ask baffling questions about the crazy idea, rather than stating the problem directly” that we see so often on StackOverflow. It is an “XY problem”, which strikes me as a terrible name.

Third, I am reminded of a story about the time I was helping Morton Twillingate put a roof on his shed. “Hand me the screw driver there b’y,” he said so I handed him a Phillips head screwdriver. “Sweet t’underin’ Jaysus b’y, give me the screw driver!” he said, pointing at the hammer in my other hand, “If I’d wanted the screw remover I’d have said so!”

 

Eval is evil, part one

The eval method — which takes a string containing JScript code, compiles it and runs it — is probably the most powerful and most misused method in JScript. There are a few scenarios in which eval is invaluable. For example, when you are building up complex mathematical expressions based on user input, or when you are serializing object state to a string so that it can be stored or transmitted, and reconstituted later.

However, these worthy scenarios make up a tiny percentage of the actual usage of eval. In the majority of cases, eval is used like a sledgehammer swatting a fly — it gets the job done, but with too much power. It’s slow, it’s unwieldy, and tends to magnify the damage when you make a mistake. Please spread the word far and wide: if you are considering using eval then there is probably a better way. Think hard before you use eval.

Let me give you an example of a typical usage.

<span id="myspan1"></span>
<span id="myspan2"></span>
<span id="myspan3"></span>
<script>
function setspan(num, text)
{
  eval("myspan" + num + ".innerText = '" + text + "'");
}
</script>

Somehow the program is getting its hands on a number, and it wants to map that to a particular span. What’s wrong with this picture?

Well, pretty much everything. This is a horrid way to implement these simple semantics. First off, what if the text contains an apostrophe? Then we’ll generate

myspan1.innerText = 'it ain't what you do, it's the way thacha do it';

Which isn’t legal JScript. Similarly, what if it contains stuff interpretable as escape sequences? OK, let’s fix that up.

eval("myspan" + num).innerText = text;

If you have to use eval, eval as little of the expression as possible, and only do it once. I’ve seen code like this in real live web sites:

if (eval(foo) != null && eval(foo).blah == 123)
  eval(foo).baz = "hello";


Yikes! That calls the compiler three times to compile up the same code! People, eval starts a compiler. Before you use it, ask yourself whether there is a better way to solve this problem than starting up a compiler!
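
If the eval really cannot be avoided, the three compilations collapse into one by keeping the result in a variable. This is a sketch only: the settings object and the contents of foo are hypothetical stand-ins for whatever the real page evaluates.

```javascript
// Hypothetical object and expression string for illustration.
var settings = { blah: 123, baz: "" };
var foo = "settings";

// Compile and run the expression once, then reuse the resulting reference.
var target = eval(foo);
if (target != null && target.blah == 123)
  target.baz = "hello";
```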

Anyway, our modified solution is much better but still awful. What if num is out of range? What if it isn’t even a number? We could put in checks, but why bother? We need to take a step back here and ask what problem we are trying to solve.

We have a number. We would like to map that number onto an object. How would you solve this problem if you didn’t have eval? This is not a difficult programming problem! Obviously an array is a far better solution:

var spans = new Array(null, myspan1, myspan2, myspan3);
function setspan(num, text)
{
  if (spans[num] != null)
    spans[num].innerText = text;
}

Since JScript has string-indexed associative arrays, this generalizes to far more than just numeric scenarios. Build any map you want. JScript even provides a convenient syntax for maps!

var spans = { 1 : myspan1, 2 : myspan2, 12 : myspan12 };
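
To illustrate the string-keyed case, here is a sketch of the same idea with names instead of numbers. So that it can run outside a browser, plain objects stand in for the span elements; the key names and the setspanByName helper are hypothetical.

```javascript
// Stand-ins for the span elements, so the sketch runs without a browser.
var myspan1 = { innerText: "" };
var myspan2 = { innerText: "" };
var myspan12 = { innerText: "" };

// A map from arbitrary string keys to the objects we want to update.
var spansByName = { first: myspan1, second: myspan2, twelfth: myspan12 };

// Sets the text of the named span; returns false if the name is unknown.
function setspanByName(name, text)
{
  if (!spansByName.hasOwnProperty(name))
    return false;
  spansByName[name].innerText = text;
  return true;
}

var updated = setspanByName("second", "it's the way thacha do it");
var missing = setspanByName("thirteenth", "ignored");
```

Note that the apostrophe in the text needs no special handling here, because the text is never spliced into source code.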

Let’s compare these two solutions on a number of axes:

Debuggability: what is easier to debug, a program that dynamically generates new code at runtime, or a program with a static body of code? What is easier to debug, a program that uses arrays as arrays, or a program that every time it needs to map a number to an object it compiles up a small new program?

Maintainability: What’s easier to maintain, a table or a program that dynamically spits new code?

Speed: which do you think is faster, a program that dereferences an array, or a program that starts a compiler?

Memory: which uses more memory, a program that dereferences an array, or a program that starts a compiler and compiles a new chunk of code every time you need to access an array?

There is absolutely no reason to use eval to solve problems like mapping strings or numbers onto objects. Doing so dramatically lowers the quality of the code on pretty much every imaginable axis.

It gets even worse when you use eval on the server, but that’s another post.


Notes from 2020

This was my first deliberately-multi-episode topic.

There were many great comments on this article on the original blog site; to summarize a few of them:

  • There are a number of scenarios where you want to dynamically create a new function, but “new Function” is the appropriate choice rather than “eval” most of the time.
  • However, the scoping rules for “new Function” and “eval” are different — thanks, JavaScript — and so sometimes there are scenarios where you are forced to eval a new function.
  • I was not then and am not now an expert on the browser’s object model. I have many times noted the irony that as a developer of the JS compiler, I was an expert on the inner workings of the JS compiler, and not on how it was used in practice. A reader pointed out that none of my solutions were good practice compared with the expediency of:
var span = document.all("myspan" + num);
if (span != null) span.innerText = text;

or, equivalently, getElementById, on browsers which supported it at the time.

Functions are not frames

I just realized that on my list of features missing from JScript.NET “fast mode” I forgot about the caller property of functions. In compatibility mode you can say

function foo(){ bar(); }
function bar(){ print(bar.caller); }
foo();

In fast mode this prints null, in compatibility mode it prints function foo(){bar();}.

Eliminating this feature does make it possible to generate faster code — keeping track of the caller of every function at all times adds a fair amount of complexity to the code generation. But just as importantly, this feature is simply incredibly broken by its very design. The problem is that the function object is completely the wrong object to put the caller property upon in the first place. For example:

function foo(x){ bar(x-1); }
function bar(x)
{
  if (x > 0)
    foo(x-1);
  else
  {
    print(bar.caller.toString().substring(9,12));
    print(bar.caller.caller.toString().substring(9,12));
    print(bar.caller.caller.caller.toString().substring(9,12));
    print(bar.caller.caller.caller.caller.toString().substring(9,12));
  }
}

function blah(){ foo(3); }

blah();

 

This silly example is pretty straightforward — the global scope calls blah. blah calls foo(3), which calls bar(2), which calls foo(1), which calls bar(0), which prints out the call stack.

So the call stack at this point should be foo, bar, foo, blah, right? So why does this print out foo, bar, foo, bar?

Because the caller property is a property of the function object and it returns a function object. bar.caller and bar.caller.caller.caller are the same object, so of course they have the same caller property!

Clearly this is completely broken for recursive functions. What about multi-threaded programs, where there may be multiple callers on multiple threads? Do you make the caller property different on different threads?

These problems apply to the arguments property as well. Essentially the problem is that the notion we want to manipulate is activation frame, not function object, but function object is what we’ve got. To implement this feature properly you need to access the stack of activation frames, where an activation frame consists of a function object, an array of arguments, and a caller, where the caller is another activation frame. Now the problem goes away — each activation frame in a recursive, multi-threaded program is unique. To gain access to the frame we’d need to add something like the this keyword — perhaps a frame keyword that would give you the activation frame at the top of the stack.
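
The idea can be simulated in ordinary script by passing an explicit frame object along with each call. This is only a sketch of the concept, not a language feature: makeFrame and the extra callerFrame parameter are hypothetical, and the frames are built by hand.

```javascript
// A frame records the running function's name, its arguments,
// and the frame that called it.
function makeFrame(name, args, caller)
{
  return { name: name, args: args, caller: caller };
}

function foo(x, callerFrame)
{
  var frame = makeFrame("foo", [x], callerFrame);
  return bar(x - 1, frame);
}

function bar(x, callerFrame)
{
  var frame = makeFrame("bar", [x], callerFrame);
  if (x > 0)
    return foo(x - 1, frame);
  // Walk the chain of caller frames and collect the function names.
  var names = [];
  for (var f = frame.caller; f != null; f = f.caller)
    names.push(f.name);
  return names;
}

function blah()
{
  var frame = makeFrame("blah", [], null);
  return foo(3, frame);
}

var callers = blah();
```

Because every activation gets its own frame, the two calls to foo are distinct objects with distinct callers, and walking the chain yields foo, bar, foo, blah as expected.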

That’s how I would have designed this feature, but in the real world we’re stuck with being backwards compatible with the original Netscape design. Fortunately, the .NET reflection code lets you walk stack frames yourself if you need to. Though it doesn’t integrate perfectly smoothly with the JScript .NET notion of functions as objects, at least it manipulates frames reasonably well.


Notes from 2020

My then-colleague and partner in mayhem Peter Torr pointed out to my embarrassment that I had completely forgotten that though, yes, the caller property on a function object is completely broken and useless, the caller property on an arguments object is what we want: per frame. He also pointed out that in some versions of JS, the arguments object is writable and actually gives access to the real frame, not a copy of its values! That is, if we have something like

function f(x)
{
  print(x);
  print(arguments[0]);
  danger();
  print(x);
  print(arguments[0]);
}
function danger()
{
  arguments.caller[0] = "goodbye";
} 
f("hello");

then whether or not the value of x is observed to change depends on what version of JavaScript you are using. Rather terrifying.
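For comparison, here is a small sketch of how this aliasing looks within a single function in present-day ECMAScript engines. That is an assumption about the reader's environment, not a statement about any particular JScript version. The functions are built with the Function constructor only so that each body's strictness is explicit regardless of the surrounding code. arguments.caller itself is not available in modern engines and is not used here.

```javascript
// Non-strict body: arguments[0] and x are aliases of one another.
var sloppy = new Function("x",
  "arguments[0] = 'goodbye'; return x;");

// Strict body: arguments is a detached copy, so x is unaffected.
var strict = new Function("x",
  "'use strict'; arguments[0] = 'goodbye'; return x;");

var sloppyResult = sloppy("hello"); // "goodbye"
var strictResult = strict("hello"); // "hello"
console.log(sloppyResult, strictResult);
```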

Global State On Servers Considered Harmful

The other day I noted that extending the built-in objects in JScript .NET is no longer legal in “fast mode”. Of course, this is still legal in “compatibility mode” if you need it, but why did we take it out of fast mode?

As several readers have pointed out, this is actually a kind of compelling feature. It’s nice to be able to add new methods to prototypes:

String.prototype.frobnicate = function(){/* whatever */}
var s1 = "hello";
var s2 = s1.frobnicate();

It would be nice to extend the Math object, or change the implementation of toLocaleString on Date objects, or whatever.

Unfortunately, it also breaks ASP.NET, which is the prime reason we developed fast mode in the first place. Ironically, it is not the additional compiler optimizations that a static object model enables which motivated this change! Rather, it is the compilation model of ASP.NET.

I discussed earlier how ASP uses the script engines: ASP translates the marked-up page into a script, which it compiles once and runs every time the page is served up. ASP.NET’s compilation model is similar but differs in one important respect. ASP.NET takes the marked-up page and translates it into a class that extends a standard page class. It compiles the derived class once, and then every time the page is served up it creates a new instance of the class and calls the Render method on that instance.

So what’s the difference? The difference is that multiple instances of multiple page classes may be running in the same application domain. In the ASP Classic model, each script engine is an entirely independent entity. In the ASP.NET model, page classes in the same application may run in the same domain, and hence can affect each other. We don’t want them to affect each other though — the information served up by one page should not depend on stuff being served up at the same time by other pages.

I’m sure you see where this is going. Those built-in objects are shared by all instances of all JScript objects in the same application domain. Imagine the chaos if you had a page that said:

String.prototype.username = FetchUserName();
String.prototype.appendUserName = 
  function() { return this + this.username; };
var greeting = "hello";
Response.Write(greeting.appendUserName());

We’ve created a race condition. Multiple instances of the page class running on multiple threads in the same appdomain might all try to change the prototype object at the same time, and the last one is going to win. Suddenly you’ve got pages that serve up the wrong data! That data might be highly sensitive, or the race condition may introduce logical errors in the script processing — errors which will be nigh-impossible to reproduce and debug.

A global writable object model in a multi-threaded appdomain where class instances should not interact is a recipe for disaster, so we made the global object model read-only in this scenario. If you need the convenience of a writable object model, there is always compatibility mode.
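The interleaving can be sketched deterministically. JavaScript here is single-threaded, so the example below simply runs the two halves of each "page" in an unlucky order. The makePage factory and the user names are hypothetical, standing in for the page class and FetchUserName.

```javascript
// Hypothetical page factory; the user name is passed in where the
// page above would call FetchUserName().
function makePage(userName) {
  return {
    // First half of the page: stash per-request data on the shared prototype.
    setup: function () {
      String.prototype.username = userName;
      String.prototype.appendUserName = function () {
        return this + this.username;
      };
    },
    // Second half of the page: read it back while rendering.
    render: function () {
      return "hello ".appendUserName();
    }
  };
}

var alicePage = makePage("alice");
var bobPage = makePage("bob");

alicePage.setup();
bobPage.setup();                  // a second request sneaks in here
var served = alicePage.render();  // Alice is greeted as "hello bob"
console.log(served);

// Clean up so the sketch does not leave the built-in prototype modified.
delete String.prototype.username;
delete String.prototype.appendUserName;
```

Keeping the user name in a local variable or on the page instance, rather than on the shared prototype, avoids the problem.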


Notes from 2020

There were some good questions posted as comments on the original instance of this article, which I will briefly summarize here.

  • Why does fast mode also require use of var for declarations? Is the reasoning the same as for disallowing global modifications?

Yes — enforcing var improves clarity, improves optimizations and prevents accidental fouling of the global namespace.

  • Is JScript .NET being adopted?

At the time, I had no idea. Since the project was cancelled shortly after this blog was written, apparently not. It was very frustrating.

  • How does JScript .NET perform on the server compared to C# and VB.NET?

At the time, in typical realistic line-of-business benchmarks, VB.NET and JScript .NET delivered about 5% lower throughput than C#, and that gap was closing. I have no idea what the figures are like now.

  • Should “fast mode” really be called “ASP.NET mode”?

I take the point, but in general it is a good idea to describe a feature by its characteristics, and not name it after the constituency whose scenarios motivated the feature.

I have many times since made the joke that it would have been just as accurate to name “fast mode” and “compatibility mode” “broken mode” and “slow mode” instead. I think we can be forgiven some editorializing in the choice of names.

How many Microsoft employees does it take to change a lightbulb?

UPDATE: This article was featured in The Best Software Writing I. Thanks Joel!


Joe Bork has written a great article explaining some of the decisions that go into whether a bug is fixed or not. This means that I can cross that one off my list of potential future entries. Thanks Joe!

But while I’m at it, I’d like to expand a little on what Joe said. His comments generalize to more than just bug fixes. A bug fix is one kind of change to the behaviour of the product, and all changes have similar costs and go through a similar process.

Back when I was actually adding features to the script engines on a regular basis, people would send me mail asking me to implement some new feature. Usually the feature was a “one-off”: a feature that solved their particular problem. Like, “I need to call ChangeLightBulbWindowHandleEx, but there is no ActiveX control that does so and you can’t call Win32 APIs directly from script. Can you add a ChangeLightBulbWindowHandleEx method to the VBScript built-in functions? It would only be like five lines of code!”

I’d always tell these people the same thing — if it is only five lines of code then go write your own ActiveX object! Because yes, you are absolutely right — it would take me approximately five minutes to add that feature to the VBScript runtime library. But how many Microsoft employees does it actually take to change a lightbulb?

  • One dev to spend five minutes implementing ChangeLightBulbWindowHandleEx.
  • One program manager to write the specification.
  • One localization expert to review the specification for localizability issues.
  • One usability expert to review the specification for accessibility and usability issues.
  • At least one dev, tester and PM to brainstorm security vulnerabilities.
  • One PM to add the security model to the specification.
  • One tester to write the test plan.
  • One test lead to update the test schedule.
  • One tester to write the test cases and add them to the nightly automation.
  • Three or four testers to participate in an ad hoc bug bash.
  • One technical writer to write the documentation.
  • One technical reviewer to proofread the documentation.
  • One copy editor to proofread the documentation.
  • One documentation manager to integrate the new documentation into the existing body of text, update tables of contents, indexes, etc.
  • Twenty-five translators to translate the documentation and error messages into all the languages supported by Windows. The managers for the translators live in Ireland (European languages) and Japan (Asian languages), which are both severely time-shifted from Redmond, so dealing with them can be a fairly complex logistical problem.
  • A team of senior managers to coordinate all these people, write the cheques, and justify the costs to their Vice President.

None of these take very long individually, but they add up, and this is for a simple feature. You’ll note that I haven’t added all the things that Joe talks about, like what if there is a bug in those five lines of code? That initial five minutes of dev time translates into many person-weeks of work and enormous costs, all to save one person a few minutes of whipping up a one-off VB6 control that does what they want.

Sorry, but that makes no business sense whatsoever. At Microsoft we try very, very hard to not release half-baked software. Getting software right (by, among other things, ensuring that a legally blind Catalan-speaking Spaniard can easily use the feature without worrying about introducing a new security vulnerability) is rather expensive! But we have to get it right because when we ship a new version of the script engines, hundreds of millions of people will exercise that code, and tens of millions will program against it.

Any new feature which does not serve a large percentage of those users is essentially stealing valuable resources that could be spent implementing features, fixing bugs or looking for security vulnerabilities that DO impact the lives of millions of people.


Notes from 2020

This article generated a lot of interest and feedback; I was very pleased to be included in The Best Software Writing I, and sad that there was never a part two.

Most of the feedback that I got could be summed up as: “You are making an argument for open source.”

Absolutely I was not, and I was mystified then, and continue to be mystified now at this comment. If I were making any comment on open source here — which was emphatically not my intention — it would be that the problems of releasing half-baked software are exacerbated by a “drive by contribution” model of open source.

Regardless of whether source code is available or hidden, and regardless of whether a project accepts or rejects contributions from community members, there are design, specification, implementation, testing, documentation and education costs to all code changes, and when we ignore those costs, we can easily make software that is brittle, disorganized, unsupported, unscalable, incompatible, and non-compliant with important real-world considerations such as privacy regulations, accessibility, internationalization, and so on.

JScript Goes All To Pieces

My entry the other day about fast mode in JScript .NET sparked a number of questions which deserve fuller explanations.  I’ll try to get to them in my next couple of blog entries.

For example, when I said that it was no longer legal to redefine a function, I wasn’t really clear on what I meant. JScript .NET still has closures, anonymous functions, and prototype inheritance.  We didn’t remove any of those.  Furthermore, it is very important to emphasize that we implemented compatibility mode so that anyone who does need these features in JScript .NET can still get them – they will pay a performance penalty, but that’s their choice to make.

What I meant was simply that this is now illegal:

function foo() { return 1; }
function foo() { return 2; }

whereas that is perfectly legal in JScript Classic. In JScript Classic this means “discard the first definition”.

Pop quiz: what output does this produce?

function foo(){ alert(1); }
foo();
function foo(){ alert(2); }
foo();

Of course that produces “2” twice, because in JScript Classic, function and variable declarations are always treated as though they came at the top of the block of code, no matter where they are found lexically in the block.
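One way to picture this is to write out the quiz the way the engine effectively treats it. The sketch below wraps both versions in functions (asWritten and asHoisted are names invented for this illustration) and returns values instead of calling alert:

```javascript
// The quiz as written, wrapped in a function and returning values
// instead of calling alert.
function asWritten() {
  var results = [];
  function foo() { return 1; }
  results.push(foo());
  function foo() { return 2; }
  results.push(foo());
  return results;
}

// Roughly how the engine treats it: both declarations are processed
// before any statement runs, and the later one wins.
function asHoisted() {
  var foo;
  foo = function () { return 1; };
  foo = function () { return 2; };
  var results = [];
  results.push(foo());
  results.push(foo());
  return results;
}

console.log(asWritten().join(","), asHoisted().join(",")); // 2,2 2,2
```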

Obviously this is bizarre, makes debugging tricky, and is totally bug-prone.  The earlier definition is completely ignored, and yet it sits there in the source code, confusing maintenance programmers who do not see the redefinition, which might be a thousand lines later.  Thus, it is illegal in JScript .NET.

But we only made this kind of redefinition illegal.  Other kinds of redefinition, like

var foo = function() { return 1; }
print(foo());
foo = function() { return 2; }
print(foo());

continue to work as you’d expect.

So why was this ever legal?  Do language designers get some kind of perverse kick out of larding languages with “gotcha” idioms?  No, actually there was a pretty good reason for these semantics.  Two reasons actually.  The first is our old friend “muddle on through when you get an error”.  However, since this error can be caught at compilation time, this is not a very convincing point.  The more important point is this one:

<script language="JScript">
function foo(){ alert(1); }
foo();
</script>
<script language="JScript">
function foo(){ alert(2); }
foo();
</script>

Aha!  Now we see what’s going on here.  I said “function and variable declarations are always treated as though they came at the top of the block of code”, and here we have two blocks.  The browser will compile and run the first block, and then compile and run the second block, so this really will display “1” and then “2”.  The browser compilation model allows for piecewise execution of scripts.  This scenario requires the ability to redefine methods on the fly, so there you go.
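The difference between one block and two can be simulated outside a browser. The sketch below assumes a modern JavaScript engine and uses indirect eval as a stand-in for compiling and running one script block in the global scope. The runBlock helper is hypothetical, and the functions return values instead of calling alert.

```javascript
// Indirect eval compiles and runs its argument as a separate global
// script, which stands in for one <script> block.
var runBlock = function (source) { return (0, eval)(source); };

// Two blocks, compiled and run one after the other.
var first = runBlock("function foo(){ return 1; } foo();");
var second = runBlock("function foo(){ return 2; } foo();");

// The same code handed over as a single block.
var together = runBlock(
  "function foo(){ return 1; } var a = foo();" +
  "function foo(){ return 2; } var b = foo();" +
  "[a, b].join(',');");

console.log(first, second, together); // 1 2 2,2

// Remove the globals the simulated blocks created.
delete globalThis.foo;
delete globalThis.a;
delete globalThis.b;
```

Run piecewise, the first definition is live when the first call happens. Compiled as one block, the second definition has already replaced it.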

However, ASP does not have a piecewise compilation model, and neither does ASP.NET.  When we designed JScript .NET we removed this feature from fast mode because we knew that most “normal” hosts have all the source code at once and do not ever need to dynamically pull down new chunks from the internet after old chunks have already run.  By disallowing piecewise execution, we can do a lot more optimizations because we know that once you have a function, you’ve got it and no one is going to redefine it later.