The JScript Type System Part Eight: The Last Blog Entry About Arrays, I Promise

Recall that I defined a type as consisting of two things: a set of values, and a rule for associating values outside of that set with values inside the set.  In JScript .NET, assigning a value outside of a type to a variable annotated with that type restriction performs that coercion if possible:

var s : String = 123; // Converts 123 to a String

Similarly, I already discussed what happens when you assign a JScript array to a hard-typed CLR array variable:

var sysarr : int[] = [10, 20, 30]; // Create new int[3] and copy

and what happens when you assign a one-dimensional CLR array to a JScript array variable:

var jsarr : Array = sysarr; // Wrap sysarr

But what happens when you assign a hard-typed CLR array to a variable annotated with a different CLR array type?

var intarr : int[] = [10, 20, 30];
var strarr : String[] = intarr;

You might think that this does the string coercion on every element, but in fact this is simply not legal. Rather than creating a copy with every element coerced to the proper type, the compiler simply gives up and says that the types are not compatible. If you find yourself in this situation, then you will simply have to write the code to do the copy yourself.  Something like this would work:

function copyarr(source : System.Array) : String[]
{
  var dest : String[] = new String[source.Length];
  for(var index : int in source)
    dest[index] = source.GetValue(index);
  return dest;
}

There are a few notable things about this example. First, notice that this copies a rank-one array of any element type to an array of strings. This is one of the times when it comes in handy to have the System.Array “any hard-typed array” type!

Second, notice that you can use the for-in loop with hard-typed CLR arrays. The for-in loop enumerates all the indices of an array rather than the contents of the array. Since CLR arrays are always indexed by integers the index can be annotated as an int. The loop above is effectively the same as

for (var index : int = 0 ; index < source.Length ; ++index)

but the for-in syntax is less verbose and possibly more clear.
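
For comparison, here is a rough sketch of the same copy in plain modern JavaScript rather than JScript .NET. The name copyToStrings is hypothetical, and Int32Array stands in for a hard-typed array. One difference to be aware of: in plain JavaScript the for-in loop hands you the indices as strings ("0", "1", "2"), where JScript .NET lets you annotate the index as an int.

```javascript
// Sketch only: copyToStrings is a hypothetical helper, not part of JScript .NET.
// It copies any indexable source into a new array of strings.
function copyToStrings(source) {
  const dest = new Array(source.length);
  // for-in enumerates the indices of the source, not its elements.
  for (const index in source) {
    dest[index] = String(source[index]);
  }
  return dest;
}

const ints = new Int32Array([10, 20, 30]); // stand-in for a hard-typed int array
const strs = copyToStrings(ints);
console.log(strs.join(","));               // 10,20,30
console.log(typeof strs[0]);               // string
```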

Third, you might recall that GetValue (and SetValue) take an array of indices because the array might be multidimensional. But we’re not passing in an array here.  Fortunately, you can also pass only the index if it is a single-dimensional array.

Generally speaking, hard-typed array types are incompatible with each other. There is an exception to this rule, which I’ll discuss later when I talk about what exactly “subclassing” means in JScript .NET.

A grammatical aside

I just wrote in a comment to my previous entry, “The ability to rate one’s knowledge of a subject accurately is strongly correlated with one’s knowledge.”

Wait a minute.  “One’s”???  Word’s grammar checker didn’t blink at that.  But nor does it blink at “ones”.  According to the OED, “one’s” is the genitive declension of “one”.  Let’s sum up:

Pronoun   Genitive
-----------------
Me        My
You       Your
Us        Our
Him       His
Her       Her
Them      Their
Thou      Thine
It        Its
One       One's

I always thought that the reason that “its” doesn’t take an apostrophe-s was because the rule “add an apostrophe-s to form a possessive” applied only to noun phrases, not to pronouns.  (And of course, we all know that apostrophe-s does not itself form a genitive noun; otherwise, in the sentence “The First Lady is the President of America’s wife,” Laura Bush would be associated with America, not President Bush.)

What the heck is going on here?  Surely there is some grammar pedant out there who can justify this.  My faith in English grammar has been sorely tried.


Update from 2023: My erstwhile colleague Mike Pope, who still writes an entertaining blog about English usage, gave some fascinating historical context.


Mike in 2003:

Well, let’s work backward.

In the phrase “The First Lady is the President of America’s wife”, the possessive is applied to the entire phrase: “(the President of America)’s wife.” This is common; here’s a nice example: “The woman I went to school with’s daughter”.

(http://www.chessworks.com/ling/papers/myths/mb003.htm)

FWIW, the ability to add a possessive to a noun phrase and not just to a noun is a comparatively recent development in English: “Until well into Middle English times what Jespersen calls the ‘group genitive’, i.e. ‘[the king of England]’s’ nose did not exist, but the usual type was ‘[the king]’s nose of England’. In Old English the usual structure, before the use of the of-possessive would have been ‘the king’s nose England’s’.”

http://www.linguistlist.org/issues/5/5-524.html

What’s actually interesting to contemplate is why the hell we have an apostrophe for the possessive at all. Possessive is just the genitive case; as such, it’s a normal noun declension, and has no more need for an apostrophe than the plural does. Nothing is elided with the possessive/genitive. And as noted, pronouns manage without it. German likewise has an -s for the genitive and manages without a possessive marvelously well. So whence the flingin-flangin possessive apostrophe, which does little more these days than confuse and annoy people?


Eric in 2023: I and other commenters pointed out that historically, possessives were formed by adding “es”, and the apostrophe indicates that the “e” has been removed, the same way an apostrophe indicates removal of letters in other contractions. “Its” was originally “ites”.


Mike again:

You’re on the right track with the “e” being elided with an apostrophe – that is indeed the origin of the use of an apostrophe as the indication of the genitive. What seems to have happened is that “ites” got elided, as frequently-used words tend to, but this happened much earlier in the history of English than the elision that happened to all other genitive forms. (Presumably because it was a widely-used word – the workhorses of a language are the ones that tend to get streamlined first, which is why the verb ‘to be’ is highly irregular in most languages.) So the progression looks like this: originally the word was “ites”, then it became “its” at a time when apostrophes were apparently not required on such elisions, and then quite a lot later, we started to elide *all* the genitive forms, but by then it was considered correct to indicate such elisions with an apostrophe.

[…]

The issue of elision of the vowel in genitive -es only partly explains the possessive apostrophe; the -as ending was also used for plural of masculine strong nouns in OE

(nice declension and conjugation chart here: http://www.engl.virginia.edu/OE/courses/handouts/magic.pdf),

which suggests that many noun plurals once had, as they did genitive singular, an unstressed vowel to go with their -s. Granted, it has less to do with how things really were than how they were perceived to be when our not-quite-rational system of orthography was being codified. As I sort of opined earlier, IMO the apostrophe is more trouble than it’s worth for possessives; even educated people are confused about its use, if my email Inbox is any evidence. In historical linguistics, mass confusion about forms is often a prelude to an evolutionary change. 🙂

Six out of ten ain’t bad

Occasionally I interview C++ developers. I’m always interested in how people rate themselves, so I’ll occasionally ask a candidate, “On a scale from one to ten, how do you rate your C++ skills?”

The point of the question is actually not so much to see how good a programmer the candidate is — I’m going to ask a bunch of coding questions to determine that. Rather, it’s sort of a trick question. What I’m actually looking for — what I’m looking for in almost every question I ask — is “how does the candidate handle a situation where there is insufficient information available to successfully solve a problem?” Because lemme tell ya, that’s what every single day is like here on the Visual Studio team: hard technical problems, insufficient data, deal with it!

The question has insufficient data to answer it because we have not established what “ten” is and what “one” is, or for that matter, whether the scale is linear or logarithmic. Does “ten” mean “in the 90th percentile” or “five standard deviations from the mean” or what? Is a “one” someone who knows nothing about C++? Who’s a ten?

Good candidates will clarify the question before they attempt to answer it. Bad candidates will say “oh, I’m a nine, for sure!” without saying whether they are comparing themselves against their “CS360: Algorithmic Design” classmates or Stanley Lippman.

I mention this for two reasons — first of all, my favourite question to ask the “I’m a nine out of ten” people actually came up in a real-life conversation today: OK, smartypants: what happens when a virtual base class destructor calls a virtual method overridden in the derived class? And how would you implement those semantics if you were designing the compiler? (Funny how that almost never comes up in conversation, and yet, as today proved, it actually is useful knowledge in real-world situations.)

The second reason is that ten-out-of-ten C++ guru Stanley Lippman has started blogging. Getting C++ to work in the CLR environment was a major piece of design work, of a difficulty that makes porting JScript to JScript .NET look like a walk in the park on a summer day.

Compared to Stanley Lippman, I give myself a six.


Update from 2023:

Two things:

First, a commenter on the original post mentioned an interviewing technique which I immediately adopted. When the candidate says “oh I’m an 8” or whatever, without calibrating the scale, the right follow-up question is: what is something that you found difficult when you were a 6 or 7? Make the candidate calibrate their own scale, and that’s then signal on how they should be able to handle the coding problems which follow.

Second, I was being somewhat tongue-in-cheek when I said that I’d follow up with a trivia question about the specification. Trivia questions are not great interview questions; as I noted in a comment to the original post, what I’m really looking for is not whether the candidate can regurgitate the specification on command, but rather whether they know that compilers are not magical; compilers need to generate code which implements the specification, and there are common techniques for doing so; do you know what they are? It’s all about gaining signal on how productive the candidate could be when solving problems we will actually face on the job.

Speeding can slow you down

I’ve been meaning to talk a bit about some of the performance issues you run into when tuning massively multi-threaded applications, like the ASP engine.  I’d like to start off by saying that I am by no means an expert in this field, but I have picked up a thing or two from the real experts on the ASP perf team over the years.

One of the most intriguing things about tuning multi-threaded applications is that making the code faster can slow it down. How is that possible? It certainly seems counter-intuitive, doesn’t it?

Let me give you an analogy.

Suppose you have an unusual road system in your town.  You have a square grid of roads with stoplights at the intersections. But unlike the real world, these are perfect traffic lights — they are only red if there actually is another car in the intersection. Unlike a normal road, each road goes only one way, and has at most one car on it at a time.  Once a car reaches the end of the road, it disappears and a new car may appear at the start. Furthermore, there is a small number of drivers — typically one or two, but maybe eight or sixteen, but probably not one for every car. The drivers drive a car for a while, then stop it and run to the next car!  The drivers are pretty smart — if their car is stopped at a red stoplight then they’ll run to a stopped car that is not at a red stoplight (if one exists) and drive it for a while.

In our analogy each road represents a thread and each stoplight represents a mutex.  A mutex (short for “mutual exclusion”) is a lock that guards a section of code known as a “critical section”.  Only one thread can be executing in that code at one time. The car represents the position of the instruction counter in this thread.  When the car reaches the end of the road, the task is finished: the page is served. The drivers represent the processors, which give attention to one thread at a time and then context switch to another thread.  The time spent running from car to car is the time required to perform a thread context switch.

Now imagine that you want to maximize the throughput of this system — the total number of cars that reach the end of their road per hour.  How do you do it?  There are some “obvious” ways to do so:

  • hire more drivers (use more processors)
  • eliminate some stoplights by building overpasses at intersections (eliminate critical sections)
  • buy faster cars (use faster processors)
  • make the roads shorter (create pages that require less code)

You’ll note that each of these is either expensive or difficult.  Perf isn’t easy!

Now, I said that these are “obvious” ways to solve this problem, and those scare quotes were intentional.  Imagine a complex grid of roads with lots of stoplights, a moderate number of cars on the road, and two drivers.  It is quite unlikely that cars will spend a lot of time stopped at stoplights — mostly they’ll just breeze right on through.  But what happens when you throw another six drivers into the mix? Sure, more cars are being driven at any one time, but that means that the likelihood of congestion at stoplights just went up. Even though there are four times as many drivers, the additional time spent at stoplights means that the perf improvement is less than a factor of four. We say that such systems are not scalable.
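
To put a rough number on “less than a factor of four”, here is a back-of-the-envelope sketch using Amdahl's law. This is a simplified model that I am swapping in for illustration, not a measurement of any real server, and the 10% figure for time spent in contended critical sections is purely an assumption.

```javascript
// Rough model (Amdahl's law), not a measurement. serialFraction is the assumed
// share of the work that can only be done by one processor at a time.
function speedup(processors, serialFraction) {
  return 1 / (serialFraction + (1 - serialFraction) / processors);
}

const serialFraction = 0.1; // assumed value, for illustration only

// Going from two drivers to eight is four times the drivers...
const gain = speedup(8, serialFraction) / speedup(2, serialFraction);

// ...but the modelled throughput improves by much less than four times.
console.log(gain.toFixed(2)); // 2.59
```

Under this model the gain only approaches four as the serialized fraction approaches zero, which is the “build overpasses” option from the list above.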

Or consider a moderately congested freeway system with a whole lot of cars, drivers and intersections.  Now suppose that you keep the cars, drivers and intersections the same, but you shrink the whole system down to half its previous size. You make all the roads shorter, so that instead of having eight stoplights to the mile, now you’ve got sixteen. Does total throughput get better or worse?  In a real traffic system, that would probably make things worse, and it can in web servers as well.  The cars spend all their time at intersections waiting for a driver to run over to them. Making code faster sometimes makes performance worse because thread congestion and context switches could be the problem, not total path length.

Similarly, making the cars faster often doesn’t help. In the real world of Seattle traffic, upgrading from my 140-or-so horsepower Miata to a 300 HP BMW isn’t going to get me home any faster.  Getting a faster processor and shorter program only helps if the “freeway” is clear of traffic. Otherwise, you sit there in your souped-up ultimate driving machine going zero at stoplights like everyone else.  Raw power does not scale in a world with lots of critical sections.

When perf tuning servers, use the performance monitor to keep a careful eye on not just pages served per second, but on thread switches per second, processor utilization and so on.  If you cannot saturate the processor and the number of thread switches is extremely high, then what is likely happening is that the drivers are spending way too much time running from car to car and not enough time actually driving. Clearly that’s not good for perf.  Tracking down and eliminating critical sections is often the key to improving perf in these scenarios.

The JScript Type System Part Seven: Yeah, you’ve probably guessed that I wrote the array stuff

A reader asked me to clarify a point made in an earlier entry:

Note that JScript .NET arrays do not have any of the methods or properties of a CLR array. (Strings, by contrast, can be used implicitly as either JScript .NET strings or as System.String values, which I’ll talk more about later.) But JScript .NET arrays do not have CLR array fields like Rank, SetValue, and so on.

When you have a string in a JScript .NET program, we allow you to treat it as both a System.String and as a JScript object with the String prototype.  For example:

var s = "   hello   ";
print(s.toUpperCase());
// calls JScript string prototype's toUpperCase
print(s.Trim());
// calls System.String.Trim

Which is it really?  From a theoretical standpoint, it doesn’t really matter — you can use it as either.  From an implementation standpoint, of course we use System.String internally and magic up the prototype instance when we need one — just as in JScript classic all strings are VT_BSTR variants internally and we magic up a wrapper when we need one.  JScript .NET strings and CLR strings really are totally interoperable.

Arrays aren’t quite so seamless.  

When you try to use a JScript .NET array when a CLR array is expected, we create a copy.  But when you go the other way, things are a little different. Rather than producing a copy, using a CLR array as a JScript .NET array “wraps it up”. No copy is made. The operation is therefore efficient and preserves identity. Changes made to a wrapped array are preserved:

function ChangeArray(arr : Array) : void {
  print(arr[0]); // 10
  arr[0] += 100;
  // JScript .NET methods work just fine
  print(arr.join(":")); // 110:20:30
}

var arr : int[] = [10, 20, 30];
ChangeArray(arr);
print(arr[0]); // 110

The principal rule for treating a CLR array as a JScript .NET array is that it must be single-dimensional. Since all JScript .NET arrays are single-dimensional it makes no sense to wrap up a high-rank CLR array.

Once the array is wrapped up it still has all the restrictions that a normal hard-typed array has. It may not change size, for instance. This means that an attempt to call certain members of the JScript .NET Array prototype on a wrapped array will fail. All calls to push, pop, shift, unshift and concat as well as some calls to splice will change the length of the array and are therefore illegal on wrapped CLR arrays.

Note that you may use the other JScript .NET array prototype methods on any hard-typed array (but not vice versa). You can think of this as implicitly creating a wrapper around the CLR array, much as a wrapper is implicitly created when calling methods on numbers, Booleans or strings:

var arr : int[] = [10, 20, 30];
arr.reverse();    
// You may call JScript .NET methods on hard-typed arrays
print(arr.join(":"));   // 30:20:10

There might be a situation where you do want to make a copy of a CLR array rather than wrapping it. JScript .NET has syntax for this, namely:

var sysarr: int[] = [10, 20, 30];
var jsarr1 : Array = sysarr; 
// create wrapper without copy
var jsarr2 : Array = Array(sysarr); 
// create wrapper without copy
var jsarr3 : Array = new Array(sysarr); 
// not a wrapper; copies contents

In the last case jsarr3 is not a wrapper. It is a real JScript .NET array and may be resized.
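
If you know modern JavaScript better than JScript .NET, typed arrays give a loose analogy for the wrap-versus-copy distinction. This sketch is plain JavaScript, not JScript .NET, and the analogy is an assumption made for illustration: an Int32Array is fixed-length like a CLR array, a second view over the same buffer behaves like a wrapper, and Array.from produces a real, resizable copy.

```javascript
// Analogy only: Int32Array plays the part of a fixed-size hard-typed array.
const buffer = new ArrayBuffer(12);     // room for three 32-bit integers
const fixed = new Int32Array(buffer);   // fixed-length array over the buffer
fixed.set([10, 20, 30]);

const view = new Int32Array(buffer);    // second view: no copy, same storage
view[0] += 100;
console.log(fixed[0]);                  // 110, the change is visible through both

fixed.reverse();                        // in-place methods that keep the length work
console.log(fixed.join(":"));           // 30:20:110

console.log(typeof fixed.push);         // undefined, no length-changing methods exist

const copy = Array.from(fixed);         // not a view; copies the contents
copy.push(40);                          // the copy may be resized
console.log(copy.length, fixed.length); // 4 3
```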

Thin to my chagrin

I’m going to take a quick intermission from talking about the type system, but we’ll pick it up again soon.  I’ve been thinking a lot lately about philosophical and practical issues of thin client vs. rich client development.  Thus, I ought to first define what I mean by “thin client” and “rich client”. 

Theory

We used to think of Windows application software as being shrink-wrapped boxes containing monolithic applications which maybe worked together, or maybe were “standalone”, but once you bought the software, it was static — it didn’t run around the internet grabbing more code.  If the application required buff processing power and lots of support libraries in the operating system, well, you needed to have that stuff on the client.  The Windows application model required that the client have a rich feature set.

This is in marked contrast to the traditional Unix model of application software.  In this model the software lives on the server and a bunch of “thin” clients use the server.  The clients may have only a minimal level of memory and processing power — perhaps only the ability to display text on a screen!

The ongoing explosion of massive interconnection via the Internet starting in the 1990s naturally led software developers to rethink the monolithic application model.  Multi-tiered development was the result: the front end that the user sees is just a thin veneer written in HTML, while the underlying business logic and data storage happens on servers behind the scenes.  The client has to be quite a bit fatter than a Unix “dumb terminal”, but if it can run a web browser, it’s fat enough. 

This was a great idea in many ways.  Multi-tiered development encourages encapsulation and data locality.  Encapsulating the back end means that you can write multiple front-end clients, and shipping the clients around as HTML means that you can automatically update every client to the latest version of the front end.  My job from 1996 to 2001 was to work on the implementation of what became the primary languages used on the front-end tier (JScript) and the web server tier (VBScript).  It was exciting work.

Right now, we’re looking to the future.  We’ve made a good start at letting people develop thin-client multi-tiered applications in Windows, but there is a lot more we can do.  To do so, we need to understand what exactly is goodness.  So let me declare right now Eric Lippert’s Rich Client Manifesto:

The thin-client multi-tiered approach to software development squanders the richness available on the vast majority of client platforms that I’m interested in.  We must implement tools that allow rich client application developers to attain the benefits of the thin-client multi-tiered model.

That’s the whole point of the .NET runtime and the coming Longhorn API.  The thin client model lets you easily update the client and keeps the business logic on the back tier?  Great — let’s do the same thing in the rich client world, so that developers who want to develop front ends that are more than thin HTML-and-script shells can do so without losing the advantages that HTML-and-script afford. 

Practice

I’ve been thinking about this highfalutin theoretical stuff recently because of some eminently practical concerns.  Many times over the years I’ve had to help out third party developers who have gotten themselves into the worst of both worlds.  A surprisingly large number of people look at the benefits of the thin client model — easy updates (the web), a declarative UI language (HTML), an easy-to-learn and powerful language (JScript) — and decide that this is the ideal environment to develop a rich client application.

That’s a bad idea on so many levels.  Remember, it is called the thin client model for a reason.  I’ve seen people who tried to develop database management systems in JScript and HTML!  That’s a thorough abuse of the thin client model; in the thin client model, the database logic is done on the backend by a dedicated server that can handle it, written by database professionals in hand-tuned C.  JScript was designed for simple scripts on simple web pages, not large-scale software.

Suppose you were going to design a language for thin client development and a language for rich client development.  What kinds of features would you want to have in each?

For the thin client, you’d want a language that had a very simple, straightforward, learn-as-you-go syntax.  The concept count, the number of concepts you need to understand before you start programming, should be low.  The “hello world” program should be something like

print "hello, world!"

and not

import library System.Output;
public startup class MainClass
{
  public static startup function Main () : void
  {
     System.Output("hello, world!");
  }
};

It should allow novice developers to easily use concepts like variables and functions and loops.  It should have a loose type system that coerces variables to the right types as necessary.  It should be garbage collected.  There must not be a separate compile-and-link step.  The language should support late binding.  The language will typically be used for user interface programming, so it should support event driven programming.  High performance is unimportant: as long as the page doesn’t appear to hang, it’s fast enough.  It should be very easy to put stuff in global state and access it from all over the program; since the program will likely be small, the lack of locality is relatively unimportant. 

In short, the language should enable rapid development of simple software by relatively unsophisticated programmers through a flexible and dynamic programming model. 

OK, what about the rich-client language?  The language requirements of large-scale software are completely different.  The language must have a rigid type system that catches as many problems as possible before the code is checked in.  There must be a compilation step, so that there is some stage at which you can check for warnings.  It must support modularization, encapsulation, information hiding, abstraction and re-use, so that large teams can work on various interacting components without partying on each other’s implementation details.  The state of the program may involve manipulating scarce and expensive resources — large amounts of memory, kernel objects such as file handles, etc.  Thus the language should allow for fine-grained control over the lifetime of every byte.

Object Oriented Programming in C++ is one language and style that fits this bill, but the concept count of C++ OOP is enormous — pure, virtual, abstract, instance, static, base, pointers, references…  That means that you need sophisticated, highly educated developers.  The processing tasks may be considerable, which means that performance becomes a factor.  Having a complex “hello world” is irrelevant, because no one uses languages like this to write simple programs.

In short, a rich-client language should support large-scale development of complex software by large teams of sophisticated professional programmers through a rigid and statically analyzable programming model.

Complete opposites!  Now, what happens when you try to write a rich client style application using the thin client model? 

Apparent progress will be extremely rapid, because we designed JScript for rapid development.  Unfortunately, this rapid development masks serious problems festering beneath the surface of apparently working code, problems which will not become apparent until the code is an unperformant mass of bugs. 

Rich client languages like C# force you into a discipline — the standard way to develop in C# is to declare a bunch of classes, decide what their public interfaces shall be, describe their interactions, and implement the private, encapsulated, abstracted implementation details.  That discipline is required if you want your large-scale software to not devolve into an undebuggable mess of global state.  If you can modularize a program, you can design, implement and test it in independent parts.

It is possible to do that in JScript, but the language does not by its very nature lead you to do so.  Rather, it leads you to favour expedient solutions (call eval!) over well-engineered solutions (use an optimized lookup table).  Everything about JScript was designed to be as dynamic as possible.

Performance is particularly thorny.  Traditional rich-client languages are designed for speed and rigidity.  JScript was designed for comfort and flexibility.  JScript is not fast, and it uses a lot of memory.  Its garbage collector is optimized for hundreds, maybe thousands of outstanding items, not hundreds of thousands or millions.

So what do you do if you’re in the unfortunate position of having a rich client application written in a thin-client language, and you’re running into these issues?

It’s not a good position to be in.

Fixing performance problems after the fact is extremely difficult.  The way to write performant software is to first decide what your performance goals are, and then to MEASURE, MEASURE, MEASURE all the time.  Performance problems on bloated thin clients are usually a result of what I call “frog boiling”.  You throw a frog into a pot of boiling water, it jumps out.  You throw a frog into a pot of cold water and heat it up slowly, and you get frog soup.  That’s what happens: it starts off fine when it is a fifty line prototype, and every day it gets slower and slower and slower… if you don’t measure it every day, you don’t know how bad it’s getting until it is too late.  The best way to fix performance problems is to never have them in the first place.

Assuming that you’re stuck with it now and you want to make it more usable, what can you do?

  • Data is bad. Manipulate as little data as possible.  That’s what the data tier is for.  If you must manipulate data, keep it simple — use the most basic data structures you can come up with that do the job.
  • Code is worse.  Every time you call eval, performance sucks a little bit more.  Use lookup tables instead of calling eval.  Move code onto the server tier. 
  • Avoid closures.  Don’t nest your functions unless you really understand closure semantics and need them.
  • Do not rely on “tips and tricks” for performance.  People will tell you “declared variables are faster than undeclared variables” and “modulus is slower than bit shift” and all kinds of nonsense.  Ignore them.  That’s like mowing your lawn by going after random blades of grass with nail scissors.  You need to find the WORST thing, and fix it first.  That means measuring.  Get some tools — Visual Studio Analyzer can do some limited script profiling, as can the Numega script profiler, but even just putting some logging into the code that dumps out millisecond timings is a good way to start.  Once you know what the slowest thing is, you can concentrate on modularizing and fixing it.
  • Modularize.  Refactor the code into clean modules with a well-defined interface contract, and test modules independently. 
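
Two of the suggestions above, “use lookup tables instead of calling eval” and “put some logging into the code that dumps out millisecond timings”, can be sketched in a few lines. This is plain JavaScript, and the names handlers, dispatch and timed are hypothetical, made up for this illustration; treat it as one possible shape rather than the recommended implementation.

```javascript
// Lookup table: map operation names to functions instead of building a string
// of code and handing it to eval. The operations here are invented examples.
const handlers = {
  add: (a, b) => a + b,
  multiply: (a, b) => a * b,
};

function dispatch(name, a, b) {
  // Only names that are really in the table are accepted.
  if (!Object.prototype.hasOwnProperty.call(handlers, name)) {
    throw new Error("Unknown operation: " + name);
  }
  return handlers[name](a, b);
}

// Crude timing log: run some work and report the elapsed milliseconds.
function timed(label, work) {
  const start = Date.now();
  const result = work();
  console.log(label + ": " + (Date.now() - start) + " ms");
  return result;
}

const total = timed("dispatch loop", () => {
  let sum = 0;
  for (let i = 0; i < 100000; i++) {
    sum += dispatch("add", i, 1);
  }
  return sum;
});
console.log(total);
```

Wrapping suspect regions in something like timed and comparing the numbers is the cheap version of a profiler: it tells you which thing is slowest before you start changing code.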

But the best advice I can give you is simply use the right tool for the right job.  The script languages are fabulous tools for their intended purpose.  So are C# and C++.  But they really are quite awful at doing each other’s jobs!


The JScript Type System, Part Six: Even more on arrays in JScript .NET

You might have noticed something odd about that last example using SetValue. The CLR documentation notes that the function signature is:

public function SetValue(value : Object, indices : int[]) : void

The indices parameter is typed as taking a .NET array of integers but in the example in my last entry we give it a literal JScript array, not a CLR array.

JScript .NET arrays and CLR arrays work together, but because these two kinds of arrays are so different they do not work together perfectly. The problem is essentially that JScript .NET arrays are much more dynamic than CLR arrays. JScript .NET arrays can change size, can have elements of any type, and so on.

The rules for when JScript .NET arrays and CLR arrays may be used in place of each other are not particularly complicated but still you should exercise caution when doing so. In particular, when you use a JScript .NET array in a context where a CLR array is expected you can get unexpected results. Consider this example:

function ChangeArray(arr : int[]) : void
{
  print(arr[0]); // 10
  arr[0] += 100;
}
var jsarr : Array = new Array(10, 20, 30);
ChangeArray(jsarr);
print(jsarr[0]); // 10 or 110?

This might look like it prints out 10 then 110, but in fact it prints out 10 twice. The compiler is unable to turn the dynamic JScript array into a reference to a CLR array of integers so it does the next best thing. It makes a copy of the JScript array and passes the copy to the function. If the function reads the array then it gets the correct values. If it writes it, then only the copy is updated, not the original.

To warn you about this possibly unintentional consequence of mixing array flavours, the compiler issues the following warning if you do that:

warning JS1215: Converting a JScript Array to a System.Array 
results in a memory allocation and an array copy

You may now be wondering then why the call to SetValue which had the literal JScript .NET array did not prompt this warning. The warning is suppressed for literal arrays. In the case of literal arrays the compiler can determine that a literal array is being assigned to a variable of CLR array type. The compiler then optimizes away the creation of the JScript .NET array and generates code to create and initialize the CLR array directly. Since there is then no performance impact or unexpected copy, there is no need for a warning.

Note that if any element of the source JScript .NET array cannot be converted to the element type of the CLR array then a type mismatch error will result. For instance, this would fail:

var arr1 : int[] = new Array(10, "hello", 20); // Type mismatch error at runtime
var arr2 : int[] = [10, "hello", 20];          // Type mismatch error at compile time
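The all-or-nothing nature of that conversion can be sketched in plain script. Here toIntArray is a hypothetical helper, and its notion of "convertible" (a number with an integral value) is a deliberate simplification assumed for illustration; the real coercion rules are richer than this.

```javascript
// Hypothetical helper: convert every element or fail as a whole.
// "Convertible" is simplified here to "a number with an integral value".
function toIntArray(source) {
  var result = [];
  for (var i = 0; i < source.length; i++) {
    var item = source[i];
    if (typeof item !== "number" || item % 1 !== 0) {
      throw new TypeError("Type mismatch at index " + i);
    }
    result[i] = item;
  }
  return result;
}

var ok = toIntArray([10, 20, 30]);
var failed = false;
try {
  toIntArray([10, "hello", 20]);
} catch (e) {
  failed = true; // one bad element rejects the whole array
}
```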

Note also that this applies to multidimensional arrays. There is no syntax for initializing a multidimensional array in JScript .NET:

var mdarr : int[,] = [ [ 1, 2 ], [3, 4] ]; // Nice try, but illegal

A rectangular multidimensional array is not “an array of arrays”. In this case you are assigning a one-dimensional array which happens to contain arrays to a two-dimensional array of integers; that is not a legal assignment. If, however, you want a ragged array it is perfectly legal to do this:

var ragged : int[][] = [ [ 1, 2 ], [3, 4] ];
print(ragged[1][1]); // 4

Rectangular multidimensional arrays are indexed with a comma-separated list inside one set of square brackets. If you use ragged arrays to simulate true multidimensional arrays then the indices each get their own set of brackets.
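The difference shows up even in un-annotated script, where nested literals are the only option. A sketch, with rowLengths as a hypothetical helper: the outer array has rank one and its elements happen to be arrays, so nothing forces the rows to be the same length.

```javascript
// The outer array is one-dimensional; each element is itself an array.
var ragged = [[1, 2], [3, 4, 5]];

// Hypothetical helper: report the length of each row.
function rowLengths(rows) {
  var lengths = [];
  for (var i = 0; i < rows.length; i++) {
    lengths[i] = rows[i].length;
  }
  return lengths;
}

var outerLength = ragged.length;  // 2
var lengths = rowLengths(ragged); // rows of length 2 and 3
var item = ragged[1][2];          // 5: one set of brackets per level
```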

Note that JScript .NET arrays do not have any of the methods or properties of a CLR array. (Strings, by contrast, can be used implicitly as either JScript .NET strings or as System.String values, which I’ll talk more about later.)  But JScript .NET arrays do not have CLR array members like Rank, SetValue, and so on.

Next time on FAIC: I’ll talk a bit about going the other way — using a CLR array where a JScript array is expected.


Commentary from 2021:

Unlike the previous episode, this episode solicited some good comments on how BeyondJS / BeyondRhino deal with the very similar problem of making JS arrays interoperate with Java arrays, so that JS scripts could be used to more easily write unit tests for JVM programs. There was a lot of this sort of parallel work going on at the time.

The JScript Type System, Part Five: More On Arrays In JScript .NET

As I was saying the other day, CLR arrays and JScript arrays are totally different beasts. It is hard to imagine two things being so different and yet both called the same thing. Why did the CLR designers and the JScript designers start with the same desire — create an array system — and come up with completely different implementations?

Well, the CLR implementers knew that dense, nonassociative, typed arrays are easy to make fast and efficient. Furthermore, such arrays encourage the programmer to keep homogeneous data in strictly bounded tables. That makes large programs that do lots of data manipulation easier to understand. Thus, languages such as C++, C# and Visual Basic have arrays like this, and thus they are the basic built-in array type in the CLR.

Sparse, associative, untyped arrays are not particularly fast but they are far more dynamic and flexible than Visual Basic-style arrays. They make it easy to store heterogeneous data in any table without worrying about picky details such as exactly how big that table is. In other words, they are “scripty”. Languages such as JScript and Perl have arrays like this.

JScript .NET has both very dynamic, scripty arrays and more strict CLR arrays, making it suitable for both rapid development of scripts and programming in the large. But like I said, making these two very different kinds of arrays work well together is not trivial.

JScript .NET supports the creation of multidimensional typed arrays. As with single-dimensional arrays, the array size is not part of the type. To annotate a variable as containing a typed multidimensional array the syntax is to follow the type with brackets containing commas. For example, to annotate a variable as containing a two dimensional array of Strings you would say:

var multiarr : String[,];

The number of commas between the brackets plus one is equal to the rank of the array. (By this definition if there are no commas between the brackets then it is a rank-one array, as we have already seen.)

A multidimensional array is allocated with the new keyword as you might expect:

multiarr = new String[4,5];
multiarr[0,0] = "hello";

Notice that elements of typed arrays are always accessed with a comma-separated list of integer indices. There must always be exactly one index for each dimension in the array. You can’t use the ragged array syntax [0][0].

There are certain situations in which you know that a variable or function argument will refer to a CLR array but you do not actually know the element type or the rank, just that it is an array. Should you find yourself in one of these (rather rare) situations there is a special annotation for a CLR array of unknown type and rank:

var sysarr : System.Array;
sysarr = new String[4,5];
sysarr = new double[10];

As you can see, a variable of type System.Array may hold any CLR array of any type and rank. However, there is a drawback. Variables of type System.Array may not be indexed directly because the rank is not known. This is illegal:

var sysarr : System.Array;
sysarr = new String[4,5];
sysarr[1,2] = "hello";  // ILLEGAL, System.Arrays are not indexable

Rather, to index a System.Array you must call the GetValue and SetValue methods with an array of indices:

var sysarr : System.Array;
sysarr = new String[4,5];
sysarr.SetValue("hello", [1,2]);

The rank and size of a System.Array can be determined with the Rank, GetLowerBound and GetUpperBound members.
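To see why an array of indices is a natural fit when the rank is not known in advance, here is a sketch in plain script of a rectangular table that is addressed the same way. The names makeRect, offsetOf, getValue and setValue are hypothetical, the row-major layout is an assumption made for the illustration, and none of this claims to be how the CLR stores its arrays.

```javascript
// Hypothetical rectangular table of any rank, stored in one flat array.
function makeRect(lengths) {
  var size = 1;
  for (var d = 0; d < lengths.length; d++) {
    size *= lengths[d];
  }
  return { lengths: lengths, rank: lengths.length, items: new Array(size) };
}

// Turn one index per dimension into a flat offset (row-major order).
function offsetOf(rect, indices) {
  if (indices.length !== rect.rank) {
    throw new Error("Expected exactly " + rect.rank + " indices");
  }
  var offset = 0;
  for (var d = 0; d < rect.rank; d++) {
    if (indices[d] < 0 || indices[d] >= rect.lengths[d]) {
      throw new RangeError("Index out of bounds in dimension " + d);
    }
    offset = offset * rect.lengths[d] + indices[d];
  }
  return offset;
}

function setValue(rect, value, indices) {
  rect.items[offsetOf(rect, indices)] = value;
}

function getValue(rect, indices) {
  return rect.items[offsetOf(rect, indices)];
}

var table = makeRect([4, 5]);          // plays the role of new String[4,5]
setValue(table, "hello", [1, 2]);
var fetched = getValue(table, [1, 2]); // "hello"
```

Because the indices arrive as a single array, the same two functions serve a table of any rank, and a call with the wrong number of indices can be rejected at run time.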

Thinking about this a bit now, I suppose that we could have detected at compile time that a System.Array was being indexed, and constructed the call to the getter/setter appropriately for you, behind the scenes.  But apparently we didn’t.  Oh well.

Next time on FAIC: mixing and matching JScript and CLR arrays.


Commentary from 2021:

As I say rather a lot, design is the art of coming up with compromises, and of course a good compromise leaves everybody mad. I expected more pushback on these decisions; trying to make it all work in a manner that felt both true to JScript and made efficient use of CLR types designed for a statically-typed world led to a lot of compromises to be mad about. But it went over pretty well, insofar as JScript .NET went over anywhere at all.

There was some good discussion in the comments on the original post about how you might write a fold so that it worked on JScript arrays, CLR arrays, multidimensional arrays, and so on, and did so efficiently.

The JScript Type System, Part Four: JScript .NET Arrays

One of the major differences between JScript .NET and JScript Classic is that JScript .NET now supports optional type annotations on variables.  The number of built-in primitive types has also increased dramatically.  JScript .NET adds value types boolean, byte, char, decimal, double, float, int, long, sbyte, short, uint, ulong and ushort.  In addition, JScript .NET integrates its type system with the CLR type system — a string in JScript has all the properties and methods of the string prototype and all the properties and methods of a System.String.  Backwards compatibility and interoperability with the CLR were two very important design criteria.

The primitive types are pretty straightforward though.  Some more interesting stuff happens when we think about how complex types like arrays interoperate between JScript .NET and the CLR. Let’s quickly review the terminology:

A sparse array may have “holes” in the valid indices. A sparse array with three elements might have elements 0, 10 and 1000 defined but nothing in between. The opposite of a sparse array is a dense array. In a dense array all the indices between the lowest and highest indices are valid indices. A dense array with elements 0 and 1000 has 1001 elements.

A fixed-size array has a particular valid range of indices. Typically the size is fixed when the array is created and it may not be changed. A variable-sized array does not have any maximum size. Elements may be added or removed at any time.

A single-dimensional array maps a single index onto a value. A multi-dimensional array may require any number of indices to fetch a value.

The number of dimensions an array has is called its rank. (Other terms such as dimensionality or arity are occasionally used but we will stick to rank.)

A uniformly-typed array has every element of the same type. An any-typed array may have elements of any type.

An associative array is an array where the indices are strings. A nonassociative array has integer indices.

A literal array is a JScript .NET array defined in the actual source code, much as “abcde” is a literal string or 123.4 is a literal number. In JScript .NET a literal array is a comma-separated list of items inside square brackets:

var arr = [10, 20, "hello"];
var item = arr[1]; // 20

JScript arrays are sparse, variable-sized, single-dimensional, any-typed, associative arrays. CLR arrays are the opposite in every way! They are dense, fixed-size, multi-dimensional, uniformly-typed, nonassociative arrays. It is hard to imagine two more different data structures with the same name.  Making them interoperate at all was a pain in the rear, believe me.
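Most of the JScript half of that list can be seen in a few lines of un-annotated script. This is a sketch written as plain ECMAScript so that it can be run anywhere; definedIndices is a hypothetical helper.

```javascript
var arr = [];
arr[0] = "hello";       // any-typed: a string...
arr[10] = 123.456;      // ...a number...
arr[1000] = new Date(); // ...and an object, all in the same array

// Hypothetical helper: collect the indices that actually exist.
function definedIndices(a) {
  var found = [];
  for (var index in a) {
    found.push(index);
  }
  return found;
}

var lengthBefore = arr.length;     // 1001, though only three elements exist
var indices = definedIndices(arr); // "0", "10", "1000": the indices are strings
arr["colour"] = "red";             // associative: a non-numeric index works too
var lengthAfter = arr.length;      // still 1001
arr[2000] = true;                  // variable-sized: the array grows on demand
var lengthGrown = arr.length;      // 2001
```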

This is a pretty big topic, so I think I’ll split it up over a few entries.  Let me talk a bit about annotation and typing, and we’ll pick up where we left off tomorrow.

Traditional JScript arrays are soft-typed; they can store heterogeneous data:

var arr = new Array();
arr[0] = "hello";
arr[1] = 123.456;
arr[2] = new Date();

CLR arrays, on the other hand, are uniformly typed. Every element of a CLR array is the same type as every other element. This difference motivates the type annotation syntaxes for each. In JScript .NET the traditional arrays are annotated with the Array type and CLR arrays are annotated with the type of the element followed by []:

var jsarr : Array = new Array();
jsarr[0] = "hello";
var sysarr : double[] = new double[10];
sysarr[0] = 123.4;

Note that CLR arrays are fixed-size, but the size is not part of the type annotation; sysarr can be a one-dimensional array of double of any size. This is perfectly legal, for example:

var sysarr : double[] = new double[10];
sysarr[0] = 123.4;
sysarr = new double[5];

This throws away the old array and replaces it with a new, smaller array. But once a typed array is created it may not be resized.

Tomorrow: true multidimensional arrays in JScript .NET.


Commentary from 2021:

This post was one of the very few over the lifetime of my blog that got no comments. As an introductory post to a rather arcane subject, I suppose that is not particularly surprising. I put a lot of work into arrays in JS.NET and it was not very well documented, so I figured that getting it out in my blog would be generally helpful.