I’m not stringing you along, honest

JScript
and VBScript are often used to build large strings full of formatted text, particularly
in ASP. Unfortunately, naïve string concatenations are a major source of performance
problems.

 

Before
I go on, I want to note that it may seem like I am contradicting my earlier post,
by advocating some “tips and tricks” to make string concatenation faster.  Do
not blindly apply these techniques to your programs in the belief that they will magically
make your programs faster!  You always
need to first determine what is fast enough, next determine what is not fast enough,
and THEN try to fix it!

 

JScript
and VBScript use the aptly-named naïve concatenation
algorithm
 when building strings. Consider this silly JScript example:

 

var str
= “”;

for (var
count = 0 ; count < 100 ; ++count)

str
= “1234567890” + str;

 

The
result string has one thousand characters so you would expect that this would copy
one thousand characters into str.
Unfortunately, JScript has no way of knowing ahead of time how big that string is
going to get and naïvely assumes that every concatenation is the last one.

 

On
the first loop str is
zero characters long. The concatenation produces a new ten-character string and copies
the ten-character string into it. So far ten characters
have been copied. On the second time through
the loop we produce a new twenty-character
string and copy two ten-character strings into it. So far 10 + 20 = 30 characters
have been copied.

 

You
see where this is going. On the third time thirty more
characters are copied for a total of sixty,
on the fourth forty more
for a total of one hundred. Already the
string is only forty characters long and we have copied more than twice that many
characters. By the time we get up to the hundredth iteration over
fifty thousand characters have been copied to make that one thousand character string
.
Also there have been ninety-nine temporary strings allocated and immediately thrown
away.

 

Moving
strings around in memory is actually pretty darn fast, though 50000 characters is
rather a lot.  Worse, allocating and releasing
memory is not cheap.  OLE Automation has
a string caching mechanism and the NT heap is pretty good about these sorts of allocations,
but still, large numbers of allocations and frees are going to tax the performance
of the memory manager.

 

If
you’re clever, there are a few ways to make it better.  (However,
like I said before, always make sure you’re spending time applying cleverness in the
right place.)

 

One
technique is to ensure that you are always concatenating small strings to small strings,
large strings to large strings.  Pop quiz:
what’s the difference between these two programs?

 

for (var
count = 0 ; count < 10000 ; ++count)

str
+= “1234567890” + “hello”;

 

and

 

For count
= 1 To 10000

str
= str & “1234567890” & “hello”

Next

 

?  I
once had to debunk a widely distributed web article which claimed that VBScript was
orders of magnitude slower than JScript because the comparison that the author used
was to compare the above two programs.  Though
they produce the same output, the JScript program is MUCH faster.  Why’s
that?  Because this is not an apples-to-apples
comparison.  The VBScript program is equivalent
to this JScript program:

 

for (var
count = 0 ; count < 10000 ; ++count)

str
= (str + “1234567890”) + “hello”;

 

whereas
the JScript program is equivalent to this JScript program

 

for (var
count = 0 ; count < 10000 ; ++count)

str
= str + (“1234567890” + “hello”);

 

See,
the first program does two concatenations of a small string to a large string in one
line, so the entire text of the large string gets moved twice every time through the
loop.  The second program concatenates
two small strings together first, so the small strings move twice but the large string
only moves once per loop.  Hence, the
first program runs about twice as slow.  The
number of allocations remains unchanged, but the number of bytes copied is much lower
in the second.

 

In
hindsight, it might have been smart to add a multi-argument string concatenation opcode
to our internal script interpreter, but the logic actually gets rather complicated
both at parse time and run time.  I still
wonder occasionally how much of a perf improvement we could have teased out by adding
one.  Fortunately, as you’ll see below,
we came up with something better for the ASP case.

 

The
other way to make this faster is to make the number of allocations smaller, which
also has the side effect of not moving the bytes around so much.

 

var str
= “1234567890”; 10

str =
str + str; 20

var str4
= str + str;  40

str =
str4 + str4;  80

str =
str + str;  160

var str32
= str + str;  320

str =
str32 + str32; 640

str =
str + str32 960

str =
str + str4; 1000

 

This
program produces the same result, but with 8 allocations instead of 100, and only
moves 3230 characters instead of 50000+.   However,
this is a rather contrived example — in the real world strings are not usually composed
like this!

 

Now,
those of you who have written programs in languages like C where strings are not first-class
objects know how to solve this problem efficiently.  You
build a buffer that is bigger than the string you want to put in, and fill it up.  That
way the buffer is only allocated once and the only copies are the copies into the
buffer.  If you don’t know ahead of time
how big the buffer is, then a double-when-full strategy is quite optimal — pour stuff
into the buffer until it’s full, and when it fills up, create a new buffer twice as
big.  Copy the old buffer into the new
buffer and continue.  (Incidentally, this
is another example of one of the “no worse than 200% of optimal” strategies that I
was discussing earlier — the amount of used memory is never more than twice the size
of the memory needed, and the number of unnecessarily copied bytes is never more than
twice the size of the final buffer.)

 

Another
strategy that you C programmers probably have used for concatenating many small strings
is to allocate each string a little big, and use the extra space to stash a pointer
to the next string.  That way concatenating
two strings together is as simple as sticking a pointer in a buffer.  When
you’re done all the concatenations, you can figure out the size of the big buffer
you need, and do all the allocations and copies at once.  This
is very efficient, wasting very little space (for the pointers) in common scenarios.

 

Can
you do these sorts of things in script?  Actually,
yes.  Since JScript has automatically
expanding arrays you can implement a quick and dirty string builder by pushing
strings onto an array, and when you’re done, joining
the array into one big string.  In VBScript
it’s not so easy because arrays are fixed-size, but you can still be pretty clever
with fixed size arrays that are redimensioned with a “double when full” strategy.  But
surely there is a better way than these cheap tricks.

 

Well,
in ASP there is.  You know, I used to
see code like this all the time:

 

str =
“<blah>”

str =
str + blah

str =
str + blahblah

str =
str + whatever

 

‘ etc,
etc, etc — the string gets longer and longer, we have some loops, etc.

 

str =
str + “</blah>”

Response.Write str

 

Oh,
the pain.  The Response object
is an efficient string buffer written in C++
.  Don’t
build up big strings, just dump ’em out into the HTML stream directly.  Let
the ASP implementers worry about making it efficient.

 

Hold
on just a minute. Mister Smartypants Lippert
,” I hear you say, “Didn’t
you just tell us last week that eliminating calls to COM objects is usually a better
win than micro-optimizing small stuff like string allocations?

 

Yes,
I did.  But in this case, that advice
doesn’t apply because I know something you don’t know.

 

We
realized that all the work that the ASP implementers did to ensure that the string
buffer was efficient was being overshadowed by the inefficient late-bound call to Response.Write.  So
we special-cased VBScript so that it detects when it is compiling code that contains
a call to Response.Write and
there is a named item in the global namespace called Response that
implements IResponse::Write

Tags JScript Performance Scripting VBScript

Comments (20)

You must be logged in to post a comment.

  1. MatthewRaymond,Mid$(strVariable, 4, 1) = “a”Seeya
  2. Log in to Reply
  3. is not supported in VBScript? It is often used in VB 6 to speed up string concatenations (its not the fastest option available (direct memory copies are faster, for instance), but it is pretty good).
  4. Is there a reason why the Mid statement:
  5. October 20, 2003 at 7:48 pm
  6. Soeren SandmannIs is actually true that the double-when-full strategy uses twice the memory needed? The extra reserved space is never written into, so the operating system never has to touch the pages, hence never have to reserve any physical memory for it, right?
  7. Log in to Reply
  8. October 20, 2003 at 8:48 pm
  9. Eric LippertMatthew:The bizzare lvalue syntax of the mid function was never added to VBScript because we never got around to it, and no one really banged down my door asking for it. I think you’re the first person to mention it to me in about five years. It would have been a fair amount of work to get it working, as the syntax and semantics are so weird, and we never thought it was worth it.Log in to Reply
  10. Soeren: The OS might never have to commit the pages to physical memory, but it certainly has to reserve them in the process heap! So what if it never commits some to physical memory? The heap is still filled up, and possibly the extra unused reserved space will cause another allocation to be fragmented badly or fall across more pages than it otherwise might.
  11. First of all, you’re reading Eric Lippert’s blog, not Raymond Chen’s.
  12. October 20, 2003 at 9:28 pm
  13. Eric LippertI just checked — malloc calls HeapAlloc, which does commit physical memory.
  14. Log in to Reply
  15. October 20, 2003 at 9:31 pm
  16. MatthewEric,Looking at:str = str & “1234567890” & “hello”is there any reason why it doesn’t change it to:(since they are both constants)? I assume it is not worth the effort in most cases? I am trying to think of a scenario in which doing that wouldn’t work (maybe if str was Null – no, that would still work), but I can’t think of any…Log in to Reply
  17. Seeya
  18. str = str & “1234567890hello”
  19. Next
    </code>
  20. <code>
    For count = 1 To 10000
  21. Whoops – sorry for calling you Raymond. 😉
  22. October 20, 2003 at 11:46 pm
  23. Dan ShappirAn anecdote: in a previous lifetime I was involved in the development of a web-based application that implemented its UI in DHTML and JavaScript (the server backend was JSP BTW). One of the team members wrote the code for an HTML-based tree control. This control was generated on the client-side using data downloaded from the server in the form of strings.I suggested one of the following solutions:
    1. Put all the nodes in an array and then use join.
    2. Add nodes to the HTML DOM independently using insertAdjacentHTMLI don’t remember which option he chose, but the result was a huge performance improvement (something like two orders of magnitude).
  24. Log in to Reply
  25. The second method has the advantage that the tree size (as text) can exceed 64K. But it also translates to lots of DOM/COM calls.
  26. The problem was, this tree could contain hundreds of nodes and it was taking extremely long to build. Looking at the code I discovered he was constructing a huge string representing the entire tree and then using innerHTML to render it. The tree nodes were added to this tree using string concatenation.
  27. October 21, 2003 at 8:46 am
  28. Eric LippertMatthew: Constant folding at compile time is possible — JScript .NET does so, but JScript and VBScript do not. I’ll write a blog entry about the details.Log in to Reply
  29. Dan: I was with you right up to the 64K part — is there some 64K limit somewhere that I don’t know about?
  30. October 21, 2003 at 11:15 am
  31. Dan ShappirAs I recall VBScript and JScript strings are implemented using COM BSTRs. Or more precisely, VBScript strings are BSTRs and JScript strings are objects convertible to BSTRs. In any event, when passed as arguments to COM methods (such as the browser DOM methods) they are passed in as BSTRs.Log in to Reply
  32. A BSTR is a data structure that contains both the content of the string, as UNICODE chars, and the length of the string, as a word. Since the length is stored in a word, it is limited to 65535 bytes, 64K. As a result, if the entire HTML block is contained in a single string, this is a limitation on its size.
  33. October 21, 2003 at 3:48 pm
  34. Peter TorrActually it’s a DWORD — up to 2 billion chars (4 billion bytes / 2 bytes per char) of UNICODE text in a BSTRLog in to Reply
  35. http://blogs.gotdotnet.com/ericli/commentview.aspx/853ae05f-7610-4531-ab1b-070695e61168
  36. October 21, 2003 at 5:49 pm
  37. Eric Lippert> Or more precisely, VBScript strings are BSTRs and JScript strings are objects convertible to BSTRs.> Since the length is stored in a word, it is limited to 65535 bytes, 64K. As a result, if the entire HTML block is contained in a single string, this is a limitation on its size.Boy, do I feel like an old-timer now.
  38. Log in to Reply
  39. Good heavens no! That limitation hasn’t been in place since the last build of the Windows 3.1 version of VBScript. No 32-bit build of scripting ever had a limitation on the number of characters in a string. The win3.1 limitation was due to the 16 bit pointer size, not due to the BSTR length field — I’ve just checked the sources, and in fact the 16 bit build of OLE Automation used a DWORD for the length.
  40. No, other way around. JScript strings are implemented as BSTRs which may optionally be wrapped in an object.
  41. October 21, 2003 at 7:30 pm
  42. Dan ShappirLive and learn.
    Must have looked at an old doc and the detail stuck.Log in to Reply
  43. Oh, and since I’ve been programming proffesionally since 89′, I bet I’m older than you.
  44. October 22, 2003 at 4:40 am
  45. Eric LippertWell, I was getting paid to program in 1989, so I guess that makes me a professional — but then again, I was also in eleventh grade… 🙂
  46. Log in to Reply
  47. October 22, 2003 at 12:43 pm
  48. Jay HugardJust to attest to the speedups possible in legacy JScript, I wrote a hex dumper in jscript using string concatination – the dump was built line-by-line and lines were collected into a single string. Modifying the code to use an array, then Join(“n”)’ing it at the end provided an about 75x performance improvement for 64k dumps. Larger dumps took much longer and show a much greater improvement.Log in to Reply
  49. Also interesting: while evaluating string performance, I noticed that “s += a + b + c” is slightly faster than “s.concat(a,b,c)”.
  50. October 24, 2003 at 7:30 pm
  51. Eric Lippert> Larger dumps took much longer and show a much greater improvement.Log in to Reply
  52. The naive algorithm gets four times slower every time the number of concats doubles, but the join algorithm gets only two times slower when you double the number of entries. Therefore, yes, you’re absolutely right — the larger the problem, the better the improvement.
  53. October 25, 2003 at 12:13 pm
  54. Anonymous”Well, I was getting paid to program in 1989, so I guess that makes me a professional — but then again, I was also in eleventh grade… :)”Log in to Reply
  55. Make a blogentry about how you started at MS and so on!
  56. October 25, 2003 at 3:39 pm
  57. GeeMein the example;why don’t you count the building of string (“1234567890” + str) as a copy operation? in each step, str content is concatenated to the  “1234567890” which we have an extra copy of str. so i think that another fifty thousand (99*50*10) character copies performed added to that fifty.
  58. Log in to Reply
  59. var str = “”;

    for (var count = 0 ; count < 100 ; ++count)

    str = “1234567890” + str;

  60. July 15, 2006 at 12:23 pm
  61. JScript BlogHello Friends, I am Jaiprakash and work as a Developer in the Jscript team. There is not much to tell
  62. Log in to Reply
  63. October 17, 2007 at 8:39 am
  64. MSDN Blog Postings » Performance issues with “String Concatenation” in JScript.PingBack from http://msdnrss.thecoderblogs.com/2007/10/17/performance-issues-with-string-concatenation-in-jscript/
  65. Log in to Reply
  66. October 17, 2007 at 9:33 am
  67. Performance issues with “String Concatenation” in JScript. « outaTiMEPingBack from http://outatime.wordpress.com/2007/10/18/performance-issues-with-string-concatenation-in-jscript/
  68. Log in to Reply
  69. October 18, 2007 at 12:19 am
  70. JScript BlogHello Friends, Have you read my post on the String Concatenation issue? If yes, then I can sense your
  71. Log in to Reply
  72. March 19, 2008 at 6:54 am
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s