What Everyone Should Know About Character Encoding

Thank goodness Joel wrote this article; that means that I can cross it off of my list
of potential future blog entries! Thanks, Joel!




The script engines are entirely Unicode inside. Making
sure that the script source code passed in to the engine is valid UTF-16 is the responsibility
of the host, and as Joel mentions, IE certainly jumps through some hoops to try to
deduce the encoding. WSH also has heuristics
which try to determine whether the file is UTF-8 or UTF-16, but nothing nearly so
complex as IE's.
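
To make the host's job concrete, here is a minimal sketch of one simple way to guess between UTF-8 and UTF-16: looking for a byte order mark at the start of the file. This is not the heuristic that WSH or IE actually uses, which the post does not describe; `detectEncoding` is a hypothetical helper name, and the input is assumed to be an array of byte values.

```javascript
// Illustrative only: guess a script file's encoding from its byte order mark (BOM).
// detectEncoding is a hypothetical helper, not the actual WSH or IE heuristic.
function detectEncoding(bytes) {
  // UTF-8 BOM: EF BB BF
  if (bytes.length >= 3 && bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) {
    return "UTF-8";
  }
  // UTF-16 little-endian BOM: FF FE
  if (bytes.length >= 2 && bytes[0] === 0xFF && bytes[1] === 0xFE) {
    return "UTF-16LE";
  }
  // UTF-16 big-endian BOM: FE FF
  if (bytes.length >= 2 && bytes[0] === 0xFE && bytes[1] === 0xFF) {
    return "UTF-16BE";
  }
  // No BOM: a real host would have to fall back on other guesses.
  return "unknown";
}
```

A file with no byte order mark comes back as "unknown" here, which is exactly the case where a host needs heuristics of the kind mentioned above.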


I should mention that in JScript you can use the \u0000 syntax to put Unicode code points
into literal strings. In VBScript it is a little trickier; you need to use
the ChrW function.
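
As a small illustration, the snippet below writes the same character two ways: as a `\u` escape inside a literal, and by building it from its numeric value with `String.fromCharCode`, which plays roughly the role that `ChrW(&H00E9)` would play in VBScript. The variable names and the sample character are arbitrary choices for the example.

```javascript
// U+00E9 (e with acute accent) written as a \u escape inside a string literal.
var escaped = "caf\u00E9";

// The same character built from its numeric code unit at run time.
var built = "caf" + String.fromCharCode(0x00E9);

// Both strings hold the same four UTF-16 code units.
var same = (escaped === built);
```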



Comments (4)


Olav Junker Kjær (October 11, 2003 at 11:54 am):
What about characters with code points above 0xFFFF?

Eric Lippert (October 12, 2003 at 4:01 pm):
Since JScript is UTF-16 internally, you can use the surrogate pair code units to represent a code point above U+FFFF.

bryan (October 13, 2003 at 4:40 am):
For info's sake: does this apply to JScript .NET as well?

Eric Lippert (October 13, 2003 at 2:19 pm):
Yes, JScript .NET is also entirely UTF-16 internally, and is fully backwards compatible with JScript Classic (modulo a few edge cases).
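
The surrogate pair approach mentioned in the comments can be sketched as follows. The arithmetic is the standard UTF-16 encoding of a code point above U+FFFF; `toSurrogatePair` is a hypothetical helper name, and it assumes its argument lies between 0x10000 and 0x10FFFF.

```javascript
// Split a code point above U+FFFF into its two UTF-16 surrogate code units.
// toSurrogatePair is a hypothetical helper; input is assumed to be in 0x10000..0x10FFFF.
function toSurrogatePair(codePoint) {
  var offset = codePoint - 0x10000;      // 20-bit value to split in half
  var high = 0xD800 + (offset >> 10);    // top 10 bits go in the high surrogate
  var low = 0xDC00 + (offset & 0x3FF);   // bottom 10 bits go in the low surrogate
  return String.fromCharCode(high, low);
}

// U+1D11E (musical symbol G clef) is the same string as the literal "\uD834\uDD1E".
var clef = toSurrogatePair(0x1D11E);
```

Note that the resulting string has a length of 2, since the length counts UTF-16 code units and not code points.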
