The JScript Type System, Part Three: If It Walks Like A Duck…

A commenter on part two asked “can you explain the logic that a string is not always a String but a regexp is always a RegExp? What is the recommended way of determining if a value is a string?”

Indeed, the commenter is correct:

print(/foo/ instanceof RegExp);             // true
print(new RegExp("foo") instanceof RegExp); // true
print("bar" instanceof String);             // false
print(new String("bar") instanceof String); // true
print(typeof("bar"));                       // string
print(typeof(new String("bar")));           // object

Why’s that? First off, the question about strings.

In JScript there is this bizarre feature where primitive values (Booleans, strings, numbers) can be “wrapped up” into objects.  Doing so leads to some odd situations. The type of a wrapped primitive is always an object type, not a primitive type.  Also, comparing two wrapped primitives uses object equality, not value equality:

print(new String("bar") == new String("bar")); // false
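To make the object-versus-value distinction concrete, here is a minimal sketch. It assumes a modern JavaScript host, so console.log stands in for print, and the variable names are made up for illustration. The wrapped Boolean at the end is one more oddity of the same kind: the wrapper is an object, and every object is truthy.

```javascript
// console.log stands in for the article's print (assumes a modern host).
var a = new String("bar");
var b = new String("bar");

// Two distinct wrapper objects: == compares object identity here.
var sameObject = (a == b);

// Unwrapping first compares the underlying primitive values.
var sameValue = (a.valueOf() == b.valueOf());

// Object against primitive: the wrapper is converted to its primitive first.
var mixed = (a == "bar");

// A wrapped false is still an object, and objects are truthy.
var wrappedFalseIsTruthy = new Boolean(false) ? true : false;

console.log(sameObject, sameValue, mixed, wrappedFalseIsTruthy);
// false true true true
```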

I strongly recommend against using wrapped primitives.  Why do they exist?  The reasoning has kind of been lost in the mists of time, but one good reason is to make the prototype inheritance system consistent.  If bar is not an object then how is it possible to say

print("bar".toUpperCase());

? From the point of view of the specification, this is just syntactic sugar for

print((new String("bar")).toUpperCase());

Of course as an implementation detail we do not actually cons up a new object every time you call a method on a primitive value! That would be a performance nightmare.  The runtime engine is smart enough to realize that it has a value and that it ought to pass it as the this object to the appropriate method on String.prototype and everything just kind of works out.
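The lookup described above can be observed from script without creating a wrapper yourself. The following is only a sketch, assuming a modern host (console.log in place of print) and using illustrative variable names: the method reached through the primitive is the very function stored on String.prototype, and calling that function with the primitive as its this value gives the same answer.

```javascript
// console.log stands in for the article's print (assumes a modern host).

// Ordinary call through the primitive.
var viaProperty = "bar".toUpperCase();

// The same method, fetched from String.prototype and handed the primitive as `this`.
var viaPrototype = String.prototype.toUpperCase.call("bar");

// Property lookup on the primitive resolves to the prototype's function.
var sameFunction = ("bar".toUpperCase === String.prototype.toUpperCase);

console.log(viaProperty, viaPrototype, sameFunction);
// BAR BAR true
```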

This also explains why it is possible to stick properties onto value types that magically disappear.  When you say

var bar = "bar";
bar.hello = "hello";
print(bar.hello); // nada!

what is happening is logically equivalent to:

var bar = "bar";
(new String(bar)).hello = "hello";
print((new String(bar)).hello); // nada!

The magical temporary object is just that — magical and temporary.  Once you’ve used it, poof, it disappears.

But this magical temporary object does not appear when the typeof or instanceof operators are involved.  The instanceof operator says “hey, this thing isn’t even an object, so it can’t possibly be an instance of anything”.  For both consistency and usability, it would have been nice if "bar" instanceof String logically created a temporary object and hence said yes, it is an instance of String. But for whatever reason, that’s not the specification that the committee came up with.

The question about regular expressions is easily answered now that we know what is going on with strings. The difference between regular expressions and strings is that regular expressions are not primitives. Just because you have the ability to express a regular expression as a literal does not mean that it is a primitive! That thing is always an object, so there is no behaviour difference between the compile-time-literal syntax and the runtime syntax.
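A short sketch of the contrast, assuming a modern host (console.log in place of print; some older engines reported a different typeof for regular expressions). Because a regular expression literal is a real object, a property stuck onto it stays put, unlike the vanishing property on the string earlier. The names re and built are illustrative.

```javascript
// console.log stands in for the article's print (assumes a modern host).
var re = /foo/;                 // literal syntax
var built = new RegExp("foo");  // runtime syntax

// Both forms produce objects; there is no primitive regular expression.
var typeOfLiteral = typeof re;
var typeOfBuilt = typeof built;

// No magical temporary object is involved, so the property persists.
re.hello = "hello";

console.log(typeOfLiteral, typeOfBuilt, re.hello);
// object object hello
```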

The question about how to determine whether something is a string is surprisingly tricky. If typeof returns "string" then obviously it is a string, end of story.  But what if typeof returns "object" — how can you tell if that thing is a wrapped string?

It’s not easy. instanceof String doesn’t tell you whether that thing is a string, it tells you whether String.prototype is on the prototype chain.  There’s nothing stopping you from saying

function MyString() {}
MyString.prototype = String.prototype;
var s = new MyString();
// See part two for why this happens:
print(s.constructor == String);            // true
print(s instanceof String);                // true
print(String.prototype.isPrototypeOf(s));  // true

So now what are you going to do?  JScript is excessively dynamic! You can’t rely on any object being what it says it is. JScript forces people to be operationalists. (Operationalism is the philosophical belief that if it walks like a duck and quacks like a duck, it is a duck.) In the face of the kind of weirdness described above, all you can do is try to use the thing like a string, and if it acts like a string, it’s a string.
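One way to “try to use the thing like a string” is sketched below. It is a heuristic under stated assumptions, not a recommendation from the article: the helper name isStringLike is hypothetical, and the check leans on the fact that String.prototype.valueOf is specified to throw a TypeError when its this value is neither a string primitive nor a genuine String object. The sketch assumes a host with try/catch.

```javascript
// Hypothetical helper: asks the value to behave like a string and sees whether it can.
function isStringLike(value) {
  if (typeof value === "string") {
    return true; // a primitive string, end of story
  }
  if (value === null || typeof value !== "object") {
    return false; // numbers, Booleans, undefined, functions, null
  }
  try {
    // Not generic: throws TypeError unless `value` really wraps a string.
    String.prototype.valueOf.call(value);
    return true;
  } catch (e) {
    return false;
  }
}

// The impostor from above: String.prototype is on its chain, but it wraps nothing.
function MyString() {}
MyString.prototype = String.prototype;

console.log(isStringLike("bar"));             // true
console.log(isStringLike(new String("bar"))); // true
console.log(isStringLike(new MyString()));    // false
```

Note that this answers a narrower question than instanceof does: it reports whether the object carries real string data, not what its prototype chain claims.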


Commentary from 2020

  • A commenter pointed out that I must be a “Lisp geek” because I used “cons” to mean “allocate”. I am not much of a Lisp programmer but I’m willing to use Lisp jargon to get street cred from genuine Lisp geeks. 🙂 (If you are a casual user of Lisp you might think of cons as the function which pushes an item onto the head of a list, but a better way to think of it is that it is an allocator for a head, tail pair. Such a pair is called a “cons cell” for historical reasons.)
  • The title obviously refers to “duck typing” which is usually thought of as the characteristic of a type system where what we care about is the existence of the right “shape”; we don’t care if it is a duck, we care if it is a thing that can quack. What I wanted to illustrate here is that JavaScript carries that concept even farther than you might think. It’s not just “does this thing have the properties of a duck?” It’s that in some situations, there is no by-design way to even get a reliable answer to the question “is this a duck or not?” The JavaScript type system is weird and I hope that anyone building a new type system these days has the good sense to not create a situation where you cannot even reliably tell if a string is a string.