ATBG: Why do enumerators avoid a bounds check?

I am back from a week in which I visited England, Holland, Belgium, France and Hungary; that's by far the most countries I've visited in such a short time. It was great to meet so many customers; more on bicycling around Budapest in a future episode. For today though, I've posted a follow-up to the previous episode on the Coverity Development Testing Blog's continuing series Ask The Bug Guys. I'll discuss why the struct-based list enumerator does not do bounds checking in its implementation of the Current property. Check it out! Thanks to reader Xi Chen for the question. Continue reading

ATBG: Why does my code not crash?

For a change of pace, today on the Coverity Development Testing Blog’s continuing series Ask The Bug Guys I’ll talk about mostly C and C++, with a little Java and C# thrown in at the end. I’ll discuss a very common question I see on StackOverflow in the “C” and “C++” tags: “here’s a clearly buggy program that I wrote; why does it not AV / segfault / crash when I run it?” Check it out! Continue reading

ATBG: Why UTF-16?

NOTE: This post was originally a link to my post on the Coverity blog, which has been taken down. An archive of the original article is here.


Today on Ask The Bug Guys we have a language design question from reader Filipe, who asks:

Why does C# use UTF-16 as the default encoding for strings instead of the more compact UTF-8 or the fixed-width UTF-32?

Good question. First off I need to make sure that all readers understand what these different string formats are. Start by reading Joel’s article about character sets if you’re not clear on why there are different string encodings in the first place. I’ll wait.

.
.
.
.

Welcome back.

Now you have some context to understand Filipe’s question. Some Unicode formats are very compact: UTF-8 has one byte per character for the sorts of strings you run into in American programs, and most strings are pretty short even if they contain characters more commonly seen in European or Asian locales. However, the downside is that it is difficult to index into a string to find an individual character because the character width is not a fixed number of bytes. Some formats waste a lot of space: UTF-32 uses four bytes per character regardless; a UTF-32 string can be four times larger than the equivalent UTF-8 string, but the character width is constant.

UTF-16, which is the string format that C# uses, appears to be the worst of both worlds. It is not fixed-width: the “surrogate pair” characters require two 16 bit words for one character, while most characters require a single 16 bit word. But neither is it compact; a typical UTF-16 string is twice the size of a typical UTF-8 string. Why does C# use this format?
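To make the size tradeoff concrete, here is a small sketch using the System.Text.Encoding classes to count the bytes each encoding needs; the sample strings are mine, not from Filipe's question:

```csharp
using System;
using System.Text;

class EncodingSizes
{
    static void Main()
    {
        // An ASCII-only string: one byte per character in UTF-8.
        string ascii = "Hello, world";
        // A string with a few characters outside the ASCII range.
        string accented = "Crème brûlée";

        // UTF-8: variable width, one to four bytes per code point.
        Console.WriteLine(Encoding.UTF8.GetByteCount(ascii));     // 12
        // UTF-16 (called "Unicode" in .NET): two bytes per char,
        // four for surrogate pairs.
        Console.WriteLine(Encoding.Unicode.GetByteCount(ascii));  // 24
        // UTF-32: always four bytes per code point.
        Console.WriteLine(Encoding.UTF32.GetByteCount(ascii));    // 48

        // The accented characters cost two bytes each in UTF-8,
        // so the UTF-8/UTF-16 gap narrows but does not close.
        Console.WriteLine(Encoding.UTF8.GetByteCount(accented));    // 15
        Console.WriteLine(Encoding.Unicode.GetByteCount(accented)); // 24
    }
}
```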

Let’s go back to 1993, when I started at Microsoft as an intern on the Visual Basic team. Windows 95 was still called Chicago. This was well before the Windows operating system had a lot of Unicode support built in, and there were still different versions of Windows for every locale. My job, amongst other things, was to keep the Korean and Japanese Windows machines in the build lab running so that we could test Visual Basic on them.

Speaking of which: the first product at Microsoft that was fully Unicode internally, so that the same code could run on any localized operating system, was Visual Basic; this effort was well underway when I arrived. The program manager for this effort had a sign on his door that said ENGLISH IS JUST ANOTHER LANGUAGE. That is of course a commonplace attitude now but for Microsoft in the early 1990s this was cutting edge. No one at Microsoft had ever attempted to write a single massive executable that worked everywhere in the world. (UPDATE: Long time Microsoftie Larry Osterman has pointed out to me that NT supported UCS-2 in 1991, so I might be misremembering whether or not VB was the first Microsoft product to ship the same executable worldwide. It was certainly among the first.)

The Visual Basic team created a string format called BSTR, for “Basic String”. A BSTR is a length-prefixed UCS-2 string allocated by the BSTR allocator. The decision was that it was better to waste the space and have the fixed width than to use UTF-8, which is more compact but is hard to index into. Compatibility with the aforementioned version of NT was likely also a factor. As the intern who, among other things, was given the vexing task of fixing the bugs in the Windows 3.1 non-Unicode-based DBCS far east string libraries used by Visual Basic, I heartily approved of this choice.

Wait a minute, what on earth is UCS-2? It is a Unicode string consisting of 16 bit words, but without surrogate pairs. UCS-2 is fixed width; there are no characters that consist of two 16 bit words, as there are in UTF-16.

But… how on earth did that work? There are more than two to the sixteen Unicode characters! Well, it was 1993! UTF-16 was not invented until 1996.

So Visual Basic used UCS-2. OLE Automation, the COM technology that lets VB talk to components, of course also used the BSTR format.

Then UTF-16 was invented and is compatible with UCS-2, so “for free” VB and OLE Automation got upgraded to UTF-16 a few years later.

When the .NET runtime was invented a few years after that of course it used length-prefixed UTF-16 strings to be compatible with all the existing COM / Automation / VB code out there.

C# is of course compatible with the .NET runtime.

So there you go: C# uses length-prefixed UTF-16 strings in 2014 because Visual Basic used length-prefixed UCS-2 BSTRs in 1993. Obviously!

So how then does C# deal with the fact that there are strings where some characters take a single 16 bit word and some take two?

It doesn’t. It ignores the problem. Just as it also ignores the problem that it is legal in UTF-16 to have a character and its combining accent marks in two adjacent 16 bit words. And in fact, that’s true in UTF-32 too; you can have UTF-32 characters that take up two 32-bit words because the accent is in one word and the character is in the other; the idea that UTF-32 is fixed-width in general is actually rather suspect.

Strings with surrogate pairs are rare in the line-of-business programs that C# developers typically write, as are combining mark characters. If you have a string that is full of surrogate pairs or combining marks or any other such thing, C# doesn’t care one bit. If you ask for the length of the string you get the number of 16 bit words in the string, not the number of logical characters. If you need to deal with strings measured in terms of logical characters, not 16 bit words, you’ll have to call methods specifically designed to take these cases into account.
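To see this in action, here is a small sketch (my own example, not part of the original answer) contrasting String.Length, which counts 16 bit words, with the text-element count from System.Globalization.StringInfo, which counts logical characters:

```csharp
using System;
using System.Globalization;

class LogicalCharacters
{
    static void Main()
    {
        // U+1D11E MUSICAL SYMBOL G CLEF is outside the Basic Multilingual
        // Plane, so UTF-16 encodes it as a surrogate pair: two 16 bit words.
        string clef = "\U0001D11E";
        Console.WriteLine(clef.Length);                               // 2
        Console.WriteLine(new StringInfo(clef).LengthInTextElements); // 1

        // "e" followed by U+0301 COMBINING ACUTE ACCENT: two 16 bit words,
        // but one logical character as far as the reader is concerned.
        string accented = "e\u0301";
        Console.WriteLine(accented.Length);                               // 2
        Console.WriteLine(new StringInfo(accented).LengthInTextElements); // 1
    }
}
```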

By ignoring the problem, C# gets the best of both worlds: the string is still reasonably compact at 16 bits per character, and for practical purposes the width is fixed. The price you pay is of course that if you care about the problems that are ignored, the CLR and C# work against you to some extent.

Reordering optimizations

In a previous episode I discussed whether it is a good idea to remove a lock which protects an integer field. I said that without a lock, the read of a field can be moved arbitrarily far backwards in time on a thread, and that this can cause unexpected behaviour in a program. It is perhaps not clear why this can be a problem; it seems like a read shouldn’t have any side effects that can be noticed if it happens earlier. A number of readers asked whether making the read into a volatile read prevented the reordering problem.

The answer to the general question of whether a volatile read can be re-ordered is yes, yes it can. Though a volatile read in a loop cannot be cached once and elided, a volatile read can be moved backwards in time with respect to a volatile write.

The answer to the more specific question of whether making the field volatile makes the program I was given correct cannot really be answered because that was an oversimplified toy program. Deducing the correctness of a real program by analyzing a toy program is perhaps a bad idea.

Instead, I’ll give you an example that (1) can actually happen even on the more restrictive memory model imposed by x86 hardware, and (2) heck let’s make everything volatile while we’re at it; it won’t help! We’ll elide a bunch of locks and see what goes wrong. Check out this program fragment:

static volatile bool q = false;
static volatile bool r = false;
static volatile bool s = false;
static volatile bool t = false;
static object locker = new object();

static bool GetR() { return r; }  // No lock!
static void SetR() { lock(locker) { r = true; } }

static void MethodOne()
{
  q = true;
  if (!GetR())
    s = true;
}

static void MethodTwo()
{
  SetR();
  if (!q)
    t = true;
}

The rest of the program, which I have not shown, behaves as follows. First, the static initializers run normally, so the four Booleans and the monitor are all created and assigned to their given values. Then the program creates two threads. Thread one runs MethodOne and thread two runs MethodTwo. Both threads run to completion and these are the only invocations of the two methods. The question is: can the original thread now observe s and t to both be true?

Give that some thought before you read on.

.
.
.
.
.
.

It would appear that the answer is no, by the following logic.

  • If one thread runs q=true; before the other runs if(!q) then t remains false.
  • Otherwise, one thread runs q=true; after the other runs if(!q). Clearly that implies that if(!GetR()) must have run after SetR();. That implies that !GetR() is false. Therefore s remains false.
  • There are only these two possible cases and in each at least one of s and t remains false, therefore it is impossible that both become true.

This logic is wrong; specifically the second statement is completely bogus and therefore the conclusion does not follow. As is often the case with my puzzles, the portion of the analysis marked “clearly” is the flag that indicates the error.

The lack of a lock on the read of r means that the CPU may re-order the read by moving it arbitrarily far backwards in time even with respect to volatile writes, and an x86 will occasionally do so if the stars align correctly! An x86 will not reorder two reads with respect to each other and will not reorder two writes with respect to each other, but it has no problem reordering a read of one variable with respect to a write of another. The CPU is permitted to move the read backwards in time by pretending that you actually wrote:

static void MethodOne()
{
  bool temp = r;
  q = true;
  if (!temp)
    s = true;
}

Since this optimization is invisible to the code on the current thread, it is legal. But now there is an obvious interleaving in which both s and t become true. First we assign false to temp on one thread, then all of MethodTwo runs on the other thread, so t is true, and then the remainder of MethodOne sets s to true.

Now imagine that your program depends on s and t never being both true, and this situation happens let’s say one time out of every billion executions of this code. If it runs a thousand times a day on a thousand machines that’s one failure in every three years. How on earth would you debug it? You’d be likely to suspect hardware failure rather than a bug in the program, but there assuredly is a bug in the program. These are the nightmare scenarios you run into when you think you’re clever enough to elide locks. The moment you depart even slightly from one of the “blessed” low-lock patterns you are off in the weeds.

The right thing to do here is first, if you can avoid the shared memory situation in the first place, do so. If you cannot, lock everything all the time. In this particular case, locking the read of r or the write of q would likely be sufficient to ensure that s and t are never both true, but I discourage the attitude that walking as close as possible to the edge of a cliff is a great way to be safe. Walk as far away from that cliff as you possibly can! If you’ve got shared memory then put a lock around all usages of it. Simply making a shared variable volatile is insufficient to ensure thread safety in all cases.


Thanks to threading expert Joe Duffy for bringing this scenario to my attention many years ago.

Can I skip the lock when reading an integer?

Today, a question from a Coverity customer:

Here is a greatly simplified version of our code:

public class TestLock
{
  private object threadLock = new object();
  private int value = 0;
  public void Start()
  {
    lock (threadLock) { value = 100; }
  }
  public void Finish()
  {
    lock (threadLock)
    {
      if (value != 0 )
        value = 0;
    }
  }
  public void Increment()
  {
    lock (threadLock)
    {
      if (value != 0) 
        value += 10;
    }
  }
  public int GetValue()
  {
    return value; // No lock!
  }
}

The start, finish and increment operations are all either mutating or non-atomic operations, or both, so they are locked. But the get-value operation is guaranteed by the C# specification to be atomic, so we figure that we can safely remove the lock. Is this correct?

The short answer is: Please do not elide the lock regardless of its correctness. It’s dangerous and the benefit is absurdly tiny compared to the risk.

The slightly longer answer is: The fact that you’re asking the question in the first place means that you don’t have a strong enough understanding of the memory model of C# in order to correctly write low-lock code, so it should be avoided. In fact, I’m not a big fan of multithreaded code in the first place. If you can avoid writing multithreaded code, do so. If you cannot, then use the highest level of abstraction available to you: use tools like the Task Parallel Library to efficiently manage worker threads on your behalf. If you cannot do that, then at least avoid sharing memory across threads. If you cannot do that, then at least lock all the memory that you do share across threads. Writing low-lock code should be the last thing you consider doing, not the first.
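As an aside, here is a minimal sketch of the kind of high-level abstraction I mean; Parallel LINQ partitions the work across worker threads and combines the partial results itself, so there is no shared mutable state and no lock in our code at all. The computation is just an illustration:

```csharp
using System;
using System.Linq;

class HighLevelParallelism
{
    static void Main()
    {
        // PLINQ manages the worker threads and the aggregation on our
        // behalf; each partition computes a partial sum independently.
        long total = Enumerable.Range(1, 1000)
                               .AsParallel()
                               .Sum(n => (long)n);
        Console.WriteLine(total); // 500500
    }
}
```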

The attentive reader will note that neither of those two “answers” actually answered the question. It’s time for the verbose answer!

First off, let’s talk about atomicity.

You are correct that the C# specification promises that reads and writes of 32 bit integer fields are atomic. That is, they cannot be “torn”. A write to a 64 bit integer variable can be treated as two independent writes to two adjacent 32 bit variables, and therefore two threads that are both trying to assign values to those two adjacent variables at the same time can race. It is possible to end up with a value that is half from the first thread and half from the second thread, which is very confusing. The 32, 16, and 8 bit integer types and the bool and char types are all guaranteed to be atomic for individual reads and writes, provided that the developer has not done something silly like force the integer to be misaligned.

You are also correct that more complex operations, such as test-and-set or increment are not atomic, and therefore need to be protected against races.
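For what it's worth, when the only shared mutation really is an increment, the framework offers Interlocked.Increment, which performs the whole read-modify-write as a single atomic hardware operation. This sketch (my own, not from the customer's code) shows it surviving contention that would corrupt a plain `count++`:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class AtomicIncrement
{
    static int count = 0;

    static void Main()
    {
        // Ten tasks each increment the counter 100,000 times. A plain
        // count++ is three steps -- read, add, write -- that can interleave
        // across threads; Interlocked.Increment is one atomic step.
        Task[] tasks = new Task[10];
        for (int i = 0; i < tasks.Length; i++)
        {
            tasks[i] = Task.Run(() =>
            {
                for (int j = 0; j < 100000; j++)
                    Interlocked.Increment(ref count);
            });
        }
        Task.WaitAll(tasks);
        Console.WriteLine(count); // 1000000, every time
    }
}
```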

So the unlocked read will at least be atomic in the sense that we never get a “torn” value out. Now, if we had something like:

    lock (threadLock)
    {
        value += 5;
        DoSomething();
        value += 5;
    }

then the unlocked access is immediately suspicious because it can read a value after the first increment but before the second increment, and that might be unexpected. Let’s assume that you are not in that situation.

Now let’s talk about locks. Locks provide two guarantees in C#.

First, there is the obvious purpose of a lock. The locked object is a monitor; only one thread may have a given monitor locked at a time. A thread which attempts to take a monitor that has been acquired by another thread will block until it can obtain access.

There is a second guarantee of a lock that is more subtle. The C# compiler, JIT compiler and CPU are all permitted to make optimizations provided that the optimization is undetectable by the current thread. In particular, reads and writes of variables may be re-ordered or eliminated entirely provided that doing so is not observable to the current thread. For example, consider the program fragment:

someInteger = someOtherInteger;
someIntegerJustChanged = true;

Given that there is no code between the two statements there is no way for the thread that runs this code to detect what order the mutations happen in. It would be perfectly legal for these two mutations to be re-ordered if the CPU decided that it was more efficient to do so. The current thread cannot tell the difference, but another thread could observe the mutations to happen in the “wrong” order. (I note that on x86-based hardware this particular reordering of writes is actually never observed; those CPUs do not perform this optimization. There is no requirement that all C# programs be run only on x86-based hardware!)

A lock introduces a restriction on these optimizations. In particular: no read may be moved before the beginning of a lock statement, and no write may be moved after the end of a lock statement. Let’s compare these two pieces of code:

  public int GetValue()
  {
    return value; // No lock!
  }

vs

  public int GetValue()
  {
    lock(threadLock) { return value; }
  }

What’s the difference?

In the latter version the CPU may not move the read backwards past the beginning of the lock, and may not move any write that happened before the method was entered to a point after the end of the lock.

In the former, the CPU may as an optimization choose to move the read arbitrarily far “backwards in time” with respect to other reads and writes on this thread provided that doing so is not detectable by the current thread. But that fact could be observed by another thread, and since we know that this program definitely reads memory locations on multiple threads, I am worried. Perhaps some other thread is expecting that the read of this value will be observed to occur in a particular order with respect to the write of some other value. You said that this is a greatly simplified version of the real program, and therefore we cannot reason from the correctness of this program to the correctness of the original program! There are a number of scenarios:

Scenario one: The authors of the code do not know that their program is correct without the lock, and do not have a solid customer-impacting performance reason to elide the lock, but are doing so anyways just for the heck of it. Fortunately for them, by sheer luck their code does not actually have a customer-impacting bug, and equally fortunately, by sheer luck no future edit to the code will introduce such a bug. Also, no future update to the C# compiler, JIT compiler or CPU optimization will ever introduce a bug either.

Scenario two: The original program has a subtle bug, or a bug will be introduced in the future via a code change, compiler update, and so on. Some other thread assumes that the unlocked read will not be moved backwards in time with respect to some other read or write, but in fact this optimization is not only permitted, but will be performed. This fact becomes observable on a thread that then behaves incorrectly as a result. The code therefore has a bug that is difficult to detect and practically impossible to reproduce.

Scenario three: The authors of the original code, who we recall are asking the question about thread safety to begin with, are actually asking a rhetorical question. They fully understand all the ramifications of every possible CPU reordering and have analyzed all those possible reorderings to determine that there is never a bug in their program under any of those conditions. Moreover, they wish to avoid the roughly 10 or 20 nanosecond penalty of the uncontested lock for performance reasons, because that one-hundred-millionth of a second penalty is both observable by and unacceptable to their users. And finally, the authors will carefully review every single change ever made to this program in the future with an eye to the possible bugs introduced by reordering optimizations. Avoiding that 10 ns penalty is worth the enormous cost in expert code reviews and the high risk of getting it wrong.

My guess is that the developers are 99% likely to be in scenario one, 0.99% likely to be in scenario two, and 0.01% likely to be in scenario three. I also hazard a guess that the developers have no solid customer-impacting performance reason to elide that lock. Is the expected benefit of a 10 ns savings that no customer will notice really worth the cost, that by chance some impossible-to-detect-and-reproduce bug is in this program? I doubt it. Therefore my recommendation would be to always lock every access to a field that can be shared across threads, even if you think that eliding the lock is 99.99% likely to be harmless. The nano-scale benefit is simply not worth the macro-scale costs of verifying that the elision is correct, and certainly not worth the risk that the code is incorrect now or might become incorrect in the future. Code which avoids locks is extremely difficult to reason about, so make your life easier: if you can, don’t write shared memory code to begin with. If you have to, use locks consistently.


UPDATE: This article was originally posted on the Coverity Dev Testing Blog and has since been deleted; a number of commenters on the original article asked if marking the field volatile magically prevents reordering bugs.

The specific problem with that proposal is that volatile reads and locks do not have the same semantics. A volatile read can be moved backwards in time with respect to a volatile write, and the x86 processor will actually do so, but a read, volatile or otherwise, cannot be moved backwards in time past the beginning of a lock.

The more general problem is: we have a toy example that is greatly simplified from the real code, and therefore we don’t know what invariants the real code relies upon. Trying to deduce whether a real gun is safe by examining a toy gun is a dangerous proposition.

In an upcoming article I will give an example where every variable is volatile, but eliding locks on some of them can produce an unexpected violation of what appears to be a program invariant. 

 

ATBG: inconsistent equality

UPDATE from 2019: This post was originally on Coverity’s “Ask The Bug Guys” developer blog but that was taken down; an archived version is here. The formatting is still a bit messed up; I’ll fix it later.


Today’s episode of Ask The Bug Guys features a C# question from reader Jan:

Hi Bug Guys! We recently had a bug in our code comparing ints and shorts, where calling Equals() produced different results from using the == operator. Also, the behaviour of Equals() was not symmetric. Here’s a code snippet that reproduces the behaviour we observed:

int myInt = 1;
short myShort = 1;
Console.WriteLine(myInt == myShort);      // true
Console.WriteLine(myShort == myInt);      // true
Console.WriteLine(myInt.Equals(myShort)); // true
Console.WriteLine(myShort.Equals(myInt)); // false!

We were quite surprised when we found this. What explains this difference? Is it better to use == instead of Equals() when comparing integer types?

Hi Jan! Thanks for the great question.

C# was designed to be a “pit of success” language: that is, a language where the obvious technique and the correct technique are the same. And for the most part, that’s true. Unfortunately, equality is one of the areas where there are significant pits of failure, and you’ve fallen right into one of them.

I’m going to add some additional cases to your program to illustrate a number of different equalities.

int myInt = 1;
short myShort = 1;
object objInt1 = myInt;
object objInt2 = myInt;
object objShort = myShort;
Console.WriteLine(myInt == myShort);      // scenario 1 true
Console.WriteLine(myShort == myInt);      // scenario 2 true
Console.WriteLine(myInt.Equals(myShort)); // scenario 3 true
Console.WriteLine(myShort.Equals(myInt)); // scenario 4 false!
Console.WriteLine(objInt1 == objInt1);    // scenario 5 true
Console.WriteLine(objInt1 == objShort);   // scenario 6 false!!
Console.WriteLine(objInt1 == objInt2);    // scenario 7 false!!!
Console.WriteLine(Equals(objInt1, objInt2));// scenario 8 true
Console.WriteLine(Equals(objInt1, objShort));// scenario 9 false!?!

What the heck? As it turns out, we have many different kinds of equality demonstrated here.

In scenarios one and two we must first determine what the == operator means. C# defines over a dozen different built-in == operators:

object == object
string == string
int == int
uint == uint
long == long
ulong == ulong
...

There are no int == short or short == int operators, so the unique best match on the list of built-in operators must be determined. It turns out that the best match is int == int. So the short is converted to int and then the two values are compared as numbers. They are therefore equal.

In scenario three we must first solve an overload resolution problem to determine what Equals means. The receiver is of type int and it has three methods named Equals:

Equals(object, object) // static method from object
Equals(object)         // virtual method from object
Equals(int)            // Implements IEquatable<int>.Equals(int)

The first one we can eliminate because there are not enough arguments. Of the other two, the unique best method is the one that takes an int; it is always better to convert the short argument to int than to object. Therefore we call Equals(int), which then compares the two integers again using value equality, so this is true.

In scenario four we again must determine what Equals means. The receiver is of type short which again has three methods named Equals:

Equals(object, object) // static method from object
Equals(object)         // virtual method from object
Equals(short)          // Implements IEquatable<short>.Equals(short)

Overload resolution eliminates the first because there are too few arguments and eliminates the third because there is no implicit conversion from int to short. That leaves short.Equals(object), which has the moral equivalent of this implementation:

bool Equals(object z)
{
  return z is short && (short)z == this;
}

That is, for this method to return true the argument passed in must be a boxed short, and when unboxed it must be equal to the receiver. Since the argument is a boxed int, this returns false. There is no special gear in this implementation that says “well, what if I were to convert myself to the type of the argument and then compare?”

In scenarios five, six and seven operator overload resolution chooses the object == object form, which is equivalent to a call to Object.ReferenceEquals. Clearly the two references are equal in case five and unequal in cases six and seven. Whether the values of the objects when interpreted as numbers are equal does not come into play at all; only reference equality is relevant.

In scenarios eight and nine operator overload resolution chooses the static method Object.Equals, which you can think of as being implemented like this:

public static bool Equals(object x, object y)
{
    if (ReferenceEquals(x, y)) return true;
    if (ReferenceEquals(x, null)) return false;
    if (ReferenceEquals(y, null)) return false;
    return x.Equals(y);
}

In scenario eight we have two references that are unequal and not null; therefore we call int.Equals(object), which as you would expect from our previous discussion of short.Equals(object) is implemented as the moral equivalent of:

bool Equals(object z)
{
  return z is int && (int)z == this;
}

Since the argument is of type int it is unboxed and compared by value. In scenario nine the argument is a boxed short and so the type check fails and this is false.
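If Jan's code genuinely needs to compare an int to a short with Equals, the practical workaround is to convert both operands to the same type first, so that overload resolution picks a value-equality overload rather than falling through to Equals(object). A sketch:

```csharp
using System;

class EqualityFix
{
    static void Main()
    {
        int myInt = 1;
        short myShort = 1;

        // As-is: short.Equals(object) receives a boxed int;
        // the type check fails.
        Console.WriteLine(myShort.Equals(myInt));        // False

        // Widen the short to int so that int.Equals(int) is chosen.
        Console.WriteLine(((int)myShort).Equals(myInt)); // True

        // Or narrow the int (safe here, since we know the value fits
        // in a short) so that short.Equals(short) is chosen.
        Console.WriteLine(myShort.Equals((short)myInt)); // True
    }
}
```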

Summing up: I’ve shown nine different ways that two things can be compared for equality; despite the fact that clearly in every case we have the number one on both sides of the equality, equality is true in only half the cases. If you think this is crazy and confusing, you’re right! Equality is tricky in C#.

I’ve been looking a lot at confusing cases for equality over my last year at Coverity; two of our checkers are specifically designed to find situations where you used the wrong kind of equality. But this article is long enough already and I’ve answered your question (I hope!), so I’ll discuss the specific cases that our checkers look for in another posting.