Cannot use parentheses when calling a Sub

Every now and then someone will ask me what the VBScript error message “Cannot use parentheses when calling a Sub” means. I always smile when I hear that question. I tell people that the error means that you cannot use parentheses when calling a Sub — which word didn’t you understand?

Of course, there is a reason why people ask, even though the error message is perfectly straightforward. Usually what happens is someone writes code like this:

Result = MyFunc(MyArg)
MySub(MyArg)

and it works just fine, so they then write

MyOtherSub(MyArg1, MyArg2)

only to get the above error.

Here’s the deal: parentheses mean several different things in VB and hence in VBScript. They mean:

1) Evaluate a subexpression before the rest of the expression: Average = (First + Last) / 2
2) Dereference the index of an array: Item = MyArray(Index)
3) Call a function or subroutine: Limit = UBound(MyArray)
4) Pass an argument which would normally be byref as byval: in Result = MyFunction(Arg1, (Arg2)) , Arg1 is passed by reference, Arg2 is passed by value.

That’s confusing enough already. Unfortunately, VB and hence VBScript have some weird rules about when #3 applies. The rules are:

3.1) An argument list for a function call with an assignment to the returned value must be surrounded by parens: Result = MyFunc(MyArg)
3.2) An argument list for a subroutine call (or a function call with no assignment) that uses the Call keyword must be surrounded by parens: Call MySub(MyArg)
3.3) If 3.1 and 3.2 do not apply then the list must not be surrounded by parens.

And finally there is the byref rule: arguments are passed by reference when possible but if there are “extra” parens around a variable then the variable is passed by value, not by reference.

Now it should be clear why the statement

MySub(MyArg)

is legal but

MyOtherSub(MyArg1, MyArg2)

is not. The first case appears to be a subroutine call with parens around the argument list, but that would violate rule 3.3. Then why is it legal? In fact it is a subroutine call with no parens around the arg list, but has parens around the first argument! This passes the argument by value. The second case is a clear violation of rule 3.3, and there is no way to make it legal, so we give an error.

These rules are confusing and silly, as the designers of Visual Basic .NET realized. VB.NET does away with these rules and insists that all function and subroutine argument lists be surrounded by parens. This means that in VB.NET, the statement MySub(MyArg) has different semantics than it does in VBScript and VB6: it passes MyArg by reference in VB.NET but by value in VBScript/VB6. This was one of those cases where strict backwards compatibility and usability were in conflict, and usability won.

Here’s a handy reference guide to what’s legal and what isn’t in VBScript: Suppose x and y are variables, f is a one-argument procedure and g is a two-argument procedure.

' to pass x byref, y byref:
f x
Call f(x)
z = f(x)
g x, y
Call g(x, y)
z = g(x, y)

' to pass x byval, y byref:
f(x)
Call f((x))
z = f((x))
g (x), y
g ((x)), y
Call g((x), y)
z = g((x), y)

The following are syntax errors:

Call f x
z = f x
g(x, y)
Call g x, y
z = g x, y

Ah, VBScript. It just wouldn’t be the same without these quirky gotchas.


Why does JavaScript have rounding errors?

Try this in JScript:

window.alert(9.2 * 100.0);

You might expect to get 920, but in fact you get 919.9999999999999. What the heck is going on here?

Boy, have I ever heard this question a lot.

Well, let me answer that question with another question. Suppose you did a simple division, say

window.alert(1.0 / 3.0);

Would you expect an infinitely large window that said “0.33333333333…” with an infinite number of threes, or would you expect ten or so digits?

Clearly you’d expect it to not fill your computer’s entire memory with threes. But that means that I must ask why are you willing to accept an error of 0.00000000000333333… in the case of dividing one by three but not willing to accept a smaller error of 0.000000000001 in the case of multiplying 9.2 by 100?

The simple fact is that all floating-point arithmetic accumulates tiny errors. In JavaScript and many other languages, arithmetic involving numbers that cannot be represented exactly as a finite sum of powers of two will accrue these tiny errors.

Let’s look at this case a little closer, where we’re trying to multiply 9.2 by 100.0.

100.0 can be exactly represented as a floating point number because it’s a small integer. But 9.2 can’t be — you can’t represent 46/5 exactly in base two any more than you can represent 1/3 exactly in base ten. So when converting from the source code text “9.2” to the internal binary representation, a tiny error is accrued.

However, it’s not all bad; the 64 bit binary number which represents 9.2 internally is (a) the 64 bit float closest to 9.2, and (b) the algorithm which converts back and forth between strings and binary representation will round-trip — that binary representation will be converted back to “9.2” if you try to convert it to a string.

Now we go and throw a wrench in the works by multiplying by one hundred.

That’s going to lose the last few bits of precision because we just multiplied the accrued error by a factor of one hundred. The accrued error is now large enough that the string which most exactly represents the computed value is NOT “920” but rather “919.9999999999999”.

The ECMAScript specification mandates that when JavaScript converts a floating point number to a string, it produce the shortest decimal string that converts back to exactly the same binary value, so no precision is ever thrown away in the conversion.

VBScript does not do this. VBScript has heuristics which look for this situation and deliberately break the round-trip property in order to make the output look better. In JavaScript you are guaranteed that when you convert back and forth between string and binary representations you lose no data. In VBScript you sometimes lose data; there are some legal floating point values in VBScript which are impossible to represent accurately as strings. In VBScript it is impossible to display values like 919.9999999999999 precisely because they are automatically rounded!

Ultimately, the reason that this is an issue is because we as human beings see numbers like 9.2, 100.0 and 920 as “special”. If you multiply 3.63874692874 by 4.2984769284 and get a result which is one-billionth wrong no one cares, but when you multiply 9.2 by 100.0 and get a result which is one-billionth wrong everyone yells at me! The computer doesn’t know that 9.2 is more special than 3.63874692874 — it uses the same lossy algorithms for both.

All languages which use double-precision floating point arithmetic have this behavior: C++, VBScript, JavaScript, whatever. If you don’t like it then either get in the habit of calling rounding functions, or avoid floating point arithmetic entirely and use only integer arithmetic.

What’s up with Hungarian Notation?

I mentioned Hungarian Notation in my last post — a topic of ongoing religious controversy among COM developers. Some people swear by it, some people swear about it.

The anti-Hungarian argument usually goes something like this:

What is the point of having these ugly, hard-to-read prefixes in my code which tell me the type? I already know the type because of the declaration! If I need to change the type from, say, unsigned to signed integer, I need to go and change every place I use the variable in my code. The benefit of being able to glance at the name and know the declaring type is not worth the maintenance headache.

For a long time I was mystified by this argument, because that’s not how I use Hungarian at all. Eventually I discovered that there are two completely contradictory philosophical approaches to Hungarian Notation. Unfortunately, each can be considered “definitive”, and the “bad” one is in widespread use.

The one I’ll call “the sensible philosophy” is the one actually espoused by Charles Simonyi in his original article. Here’s a quote from Simonyi’s paper:

The basic idea is to name all quantities by their types. […] the concept of “type” in this context is determined by the set of operations that can be applied to a quantity. The test for type equivalence is simple: could the same set of operations be meaningfully applied to the quantities in question? If so, the types are thought to be the same. If there are operations that apply to a quantity in exclusion of others, the type of the quantity is different. […] Note that the above definition of type […] is a superset of the more common definition, which takes only the quantity’s representation into account. Naturally, if the representations of x and y are different, there will exist some operations that could be applied to x but not y, or the reverse.

(Emphasis added.)

What Simonyi is saying here is that the point of Hungarian Notation is to extend the concept of “type” to encompass semantic information in addition to storage representation information.

There is another philosophy which I call “the pointless philosophy”. That’s the one espoused by Charles Petzold in “Programming Windows”. On page 51 of the fifth edition he says

Very simply, the variable name begins with a lowercase letter or letters that denote the data type of the variable.  For example […] the i prefix in iCmdShow stands for “integer”.

And that’s all! According to Petzold, Hungarian is for connoting the storage type of the variable.

All of the arguments raised by the anti-Hungarians (with the exception of “it’s ugly”) are arguments against the pointless philosophy! And I agree with them: that is in fact a pointless interpretation of Hungarian notation which is more trouble than it is worth.

But Simonyi’s original insight is extremely powerful! When I see a piece of code that says

iFoo = iBar + iBlah;

I know that there are a bunch of integers involved, but I don’t know the semantics of any of these. But if I see

cbFoo = cchBar + cbBlah;

then I know that there is a serious bug here! Someone is adding a count of bytes to a count of characters, which will break on any Unicode or DBCS platform. Hungarian is a concise notation for semantics like “count”, “index”, “upper bound”, and other common programming concepts.

In fact, back in 1996 I changed every variable name in the VBScript string library to have its proper Hungarian prefix. I found a considerable number of DBCS and Unicode bugs just by doing that, bugs which would have taken our testers weeks to find by trial and error.

By using the semantic approach rather than the storage approach we eliminate the typical anti-Hungarian arguments, which are actually attacks on the storage approach:

I already know the type because of the declaration!

No, the Hungarian prefix tells you the semantic usage, not the storage type. A cBar is a count of Bars whether the storage is a ushort or a long.

If I need to change the type from, say, unsigned to signed integer, I need to go and change every place I use the variable in my code.

Annotate the semantics, not the storage. If you change the semantics of a variable then you need to also change every place it is used!

The benefit of being able to glance at the name and know the declaring type is not worth the maintenance headache.

But the benefit of knowing that you will never accidentally assign indexes to counts, or add apples to oranges, is worth it in many situations.

There are occasionally arguments made for the storage approach:

I like to know at a glance what the type of a thing is.

I do too, but (1) modern editors will tell you when you mouse over, (2) the semantic information is usually more important to understand than the storage information, and (3) I try to write code where the declaration and the usage of local variables is on the same screen because the function is short.

Finally:

Isn’t this something the compiler should be enforcing for you?

Well, yeah. The whole point of semantic Hungarian is that there is a deficiency in the type system enforced by the compiler that we are making up for in the naming system. If the compiler could enforce that a count of bytes cannot be added to an index of ints, we’d be all set.

There are languages that make it easy to subtype an integer, or to represent “units of measure” naturally, but even in languages where this is possible it’s not common. Programmers have an over-reliance in general on integer and string types. It would be better to have a “phone number” class that contains a single string than to just pass phone numbers around as strings, but hardly anyone does.


Further reading: Joel Spolsky has written a similar article: Making Wrong Code Look Wrong. Check it out!

Eric’s complete guide to BSTR semantics

If you’ve ever done any C++ or C programming that used COM objects, you’ll certainly have seen code like this:

STDMETHODIMP CFoo::Bar(BSTR bstrABC) 
{ ... }

What is this BSTR thing, and how does it differ from WCHAR* ?

Low-level languages like C or C++ allow you great freedom in deciding which patterns of bits are used to represent certain concepts. Unicode strings are an excellent example. The standard way to represent an n-character Unicode string in C++ is as a pointer to a 2 x (n + 1) byte buffer where the first 2 x n bytes are unsigned short integers representing the characters in UTF-16 encoding, and the final two bytes in the buffer are zeros, terminating the string.

For notational convenience we shall take a page from Hungarian notation and call such a beast a PWSZ, short for “Pointer to Wide-character String, Zero-terminated”. As far as the C++ type system is concerned, a PWSZ is an unsigned short *.

COM uses a somewhat different approach to storing string data, an approach which is sufficiently similar to allow good interoperability between code expecting PWSZs and code providing COM strings. Unfortunately they are sufficiently different that the subtle differences can cause nasty bugs if you are not careful and cognizant of those differences.

COM code uses the BSTR to store a Unicode string; it is short for “Basic String”, and so called because this method of storing strings was developed for OLE Automation, which was at the time motivated by the development of the Visual Basic language engine.

From the compiler’s point of view a BSTR is also an unsigned short *. The compiler will not care if you use BSTRs where PWSZs are expected and vice-versa. But that does not mean that you can do so with impunity! There would not be two names for the same thing if they were not in some way different; these two things are different in a number of ways.

In most cases a BSTR may be treated as a PWSZ. In almost no cases may a PWSZ be treated as a BSTR.

Let me list the differences first and then discuss each point in excruciating detail.

  1. A BSTR must have identical semantics for NULL and for "". A PWSZ frequently has different semantics for those.
  2. A BSTR must be allocated and freed with the SysAlloc family of functions. A PWSZ can be an automatic-storage buffer from the stack or allocated with malloc, new, LocalAlloc or any other memory allocator.
  3. A BSTR is of fixed length. A PWSZ may be of any length, limited only by the amount of valid memory in its buffer.
  4. A BSTR always points to the first valid character in the buffer. A PWSZ may be a pointer to the middle or end of a string buffer.
  5. When allocating an n-byte BSTR you have room for n/2 wide characters. When you allocate n bytes for a PWSZ you can store only n/2 - 1 characters because you have to leave room for the zero word at the end.
  6. A BSTR may contain any Unicode data including the zero character. A PWSZ never contains the zero character except as an end-of-string marker. Both a BSTR and a PWSZ always have a zero character after their last valid character, but in a BSTR a valid character may be a zero character.
  7. A BSTR may actually contain an odd number of bytes — it may be used for moving binary data around, though I do not recommend doing so. A PWSZ is almost always an even number of bytes and used only for storing Unicode strings.

Over the years I’ve found and fixed many bugs where the author assumed that a PWSZ could be used as a BSTR or vice-versa and thereby violated one of these differences. Let’s dig in to those differences:

  1. If you write a function which takes an argument of type BSTR then you are required to accept NULL as a valid BSTR and treat it the same as a pointer to a zero-length BSTR. COM uses this convention, as do both Visual Basic and VBScript, so if you want to play well with others you have to obey this convention. If a string variable in VB happens to be an empty string then VB might pass it as NULL or as a zero-length buffer — it is entirely dependent on the internal workings of the VB program.

    That’s not usually the case with PWSZ-based code. Usually NULL is intended to mean “this string value is missing”, not as a synonym for an empty string.

    In COM if you have some datum which could be valid or could be missing then you should store it in a VARIANT and represent the missing value with VT_NULL rather than interpreting a NULL string as different from an empty string.

  2. BSTRs are always allocated and freed with SysAllocString, SysAllocStringLen, SysFreeString and so on. The underlying memory is cached by the operating system and it is a serious, heap-corrupting error to call free or delete on a BSTR. Similarly it is also an error to allocate a buffer with malloc or new and cast it to a BSTR. Internal operating system code makes assumptions about the layout in memory of a BSTR which you should not attempt to simulate.

    PWSZs on the other hand can be allocated with any allocator or allocated off the stack.

  3. The number of characters in a BSTR is fixed. A ten-byte BSTR contains five UTF-16 words, end of story. Even if those characters are all zeros, it still contains five zero characters. A PWSZ on the other hand can contain fewer characters than its buffer allows:
    WCHAR pwszBuf[101]; 
    pwszBuf[0] = L'X'; 
    pwszBuf[1] = L'\0'; 
    

    pwszBuf is a one-character string which may be lengthened to as many as 100 characters or shrunk to a zero-character string.

  4. A BSTR always points to the first valid character in the buffer. This is not legal:
    BSTR bstrName = SysAllocString(L"John Doe"); 
    BSTR bstrLast = &bstrName[5]; // ERROR 
    

    bstrLast is not a legal BSTR. That is perfectly legal with PWSZs though:

    WCHAR * pwszName = L"John Doe"; 
    WCHAR * pwszLast = &pwszName[5]; 
    
  5. See (6).
  6. The reasons for the above restrictions make more sense when you understand how exactly a BSTR is really laid out in memory, and this also explains why allocating an n-character BSTR gives you room for n characters, not n-1 like a PWSZ allocator.

    When you call SysAllocString(L"ABCDE") the operating system actually allocates sixteen bytes. The first four bytes are a 32 bit integer representing the number of valid bytes in the string — initialized to ten in this case. The next ten bytes belong to the caller and are filled in with the data passed in to the allocator. The final two bytes are filled in with zeros. You are then given a pointer to the data, not to the header.

    This immediately explains a few things about BSTRs:

    • The length can be determined immediately. SysStringLen does not have to scan the buffer looking for a zero character the way wcslen does. It just looks at the integer preceding the pointer and gives you that value back.
    • That’s why it is illegal to have a BSTR which points to the middle of another BSTR. The length field would not be before the pointer.
    • A BSTR can be treated as a PWSZ because there is always a trailing zero put there by the allocator. You, the caller, do not have to worry about allocating enough space for the trailing zero. If you need a five-character string, ask for five characters.
    • That’s why a BSTR must be allocated and freed by the SysAlloc functions. Those functions understand all the conventions used behind-the-scenes.
  7. Because a BSTR is of a known number of bytes there is no need for the convention that a zero terminates a string. Therefore zero is a legal value inside a BSTR. This means that BSTRs can contain arbitrary data, including binary images. For this reason BSTRs are often used as a convenient way to marshal binary data around in addition to strings. This means that BSTRs may be, in some odd situations, an odd number of bytes. It is rare, but you should be aware of the possibility. I recommend against this practice, but it still happens.

Whew! To sum up, that should explain why a BSTR may usually be treated as a PWSZ but a PWSZ may not be treated as a BSTR unless it really is one. The only situations in which a BSTR may not be used as a PWSZ are

  • when the BSTR is NULL
  • when the BSTR contains embedded zero characters, because the PWSZ code will think the string is shorter than it really is
  • when the BSTR does not in fact contain a string but rather arbitrary binary data

The only situation in which a PWSZ may be treated as a BSTR is when the PWSZ actually is a BSTR, allocated with the right allocator.

In my own C++ code I avoid misunderstandings by making extremely careful use of Hungarian Notation to keep track of what is pointing to what. Hungarian Notation works best when it captures semantic information about the variables which is obscured by the type signature. I use the following conventions:

  • bstr: a real BSTR
  • pwsz: a pointer to a zero-terminated wide character string buffer
  • psz: a pointer to a zero-terminated narrow character string buffer
  • ch: a character
  • pch: a pointer to a wide character
  • cch: a count of characters
  • b: a byte
  • pb: a pointer to a byte
  • cb: a count of bytes

Fabulous Adventures!

Welcome! I’m Eric Lippert; I design and implement programming language tools. This is my blog where I talk about programming and programming language design, with occasional diversions about my other various interests.

I wrote this blog from 2003 to 2012 on MSDN and I am gradually migrating the content to this site.

The early part of this blog is mostly about JavaScript and VBScript; the latter part is mostly about C#. If that sounds like fun to you, let’s go have some fabulous adventures in coding!