ATBG: Reordering optimizations

Last time on the Coverity Development Testing Blog's continuing series Ask The Bug Guys I discussed whether it was a good idea to remove a lock which protects an integer field. My conclusion was that it is not, because the lock prevents many potentially confusing optimizations. This week I follow up on that episode with an example where eliding locks on volatile reads and writes permits a surprising result.


As always, if you have questions about a bug you've found in a C, C++, C# or Java program that you think would make a good episode of ATBG, please send your question along with a small reproducer of the problem to TheBugGuys@Coverity.com. We cannot promise to answer every question or solve every problem, but we’ll take a selection of the best questions that we can answer and address them on the dev testing blog every couple of weeks.

ATBG: Can I skip the lock when reading an integer?

Today on the Coverity Development Testing Blog's continuing series Ask The Bug Guys, I answer a question that made its way to me from a Coverity customer: is it a good idea to remove a lock which only protects the read of an integer field? After all, that read is guaranteed to be atomic, so it seems safe. As is usually the case, the situation is unexpectedly complicated!1


UPDATE: A number of commenters have asked if marking the field volatile magically prevents reordering bugs. The specific problem with that proposal is that volatile reads and locks do not have the same semantics. A volatile read can be moved backwards in time with respect to a volatile write, and the x86 processor will actually do so, but a read, volatile or otherwise, cannot be moved backwards in time past the beginning of a lock. The more general problem is: we have a toy example that is greatly simplified from the real code, and therefore we don't know what invariants the real code relies upon. Trying to deduce whether a real gun is safe by examining a toy gun is a dangerous proposition.

I have posted a follow-up article to address some of these concerns.



  1. Is "usually unexpected" an oxymoron?

ATBG: How do exceptions interact with the "using" statement?

Today on the Coverity Development Testing Blog's continuing series Ask The Bug Guys, I answer a question I get quite frequently: what guarantees do we have about object disposal when the body of a using block is interrupted by an exception? The situation is rather complicated, it turns out.


ATBG: How does a lock work?

Today on the Coverity Development Testing Blog's continuing series Ask The Bug Guys, I answer a question I've gotten a fair number of times recently: how is it possible to implement locking without already having locking? 

The answer I give is chock full of lies, misinformation and oversimplification, but hopefully it will get across the basic concept that you can build all the threading mechanisms out of simple parts like atomic-test-and-set.

In related news, here are a couple of other recent posts to ATBG. First, my colleague Tim explains that Java's hashCode has many of the same design guidelines as the equivalent method in C# that I discussed here. And my colleague Jon discusses the perils of accidentally performing overlapping reads and writes to the same buffer in C. 1


  1. My favourite quotation on the subject of poor handling for overlapping buffers has to be "there has yet to be an example of an application whose impressive performance can be credited to the absence of overlap-detection code in memcpy". True that! I note also that the C# equivalent Array.Copy method is documented as dealing correctly with overlapping reads and writes to the same array.

Construction destruction

Take a look at this little program skeleton:

class Foo 
{ 
    private int x;
    private int y;
    public Foo(int x, int y) 
    {
        this.x = x;
        this.y = y;
        SideEffects.Alpha(); // Notice: does not use "this"
    }
    ~Foo() 
    { 
        SideEffects.Charlie(); 
    }
}
static class SideEffects
{
    public static void Alpha() { ... }
    public static void Bravo() { ... }
    public static void Charlie() { ... }
    public static void M()
    {
        Foo foo = new Foo(1, 2); 
        Bravo();
    }
}

Let's suppose we have three side effects: Alpha, Bravo and Charlie. What precisely they are is not important.

The question is: what do we know about the order in which Alpha, Bravo and Charlie execute when a call to M() occurs?

The no-lock deadlock

People sometimes ask me if there is a cheap-and-easy way to guarantee thread safety. For example, "if my method only reads and writes local variables and parameters, can I guarantee that my method is threadsafe?" Questions like that are dangerous because they are predicated on an incorrect assumption: that if every method of a program is "threadsafe", whatever that means, then the entire program is "threadsafe". I might not be entirely clear on what "threadsafe" means, but I do know one thing about it: thread safety is a property of entire programs, not of individual methods.

To illustrate why these sorts of questions are non-starters, today I present to you the world's simplest deadlocking C# program:1

class C
{
  static C() 
  {
    // Let's run the initialization on another thread!
    var thread = new System.Threading.Thread(Initialize);
    thread.Start();
    thread.Join();
  }
  static void Initialize() { }
  static void Main() { }
}

At first glance, clearly every method of this incredibly simple program is "threadsafe". There is only a single variable anywhere in the program; it is local, is written once, is written before it is read, is read from the same thread it was written on, and is guaranteed to be atomic. There are apparently no locks anywhere in the program, and so there are no lock ordering inversions. Two of the three methods are empty. And yet this program deadlocks with 100% certainty; the program "globally" is clearly not threadsafe, despite all those nice "local" properties. You can build a hollow house out of solid bricks; so too you can build a deadlocking program out of threadsafe methods.

The reason this deadlocks is a consequence of the rules for static constructors in C#; the important rule is that a static constructor runs exactly zero or one times, and runs before any static method call or instance creation in its type. Therefore the static constructor of C must run to completion before Main starts. The CLR notes that C's static constructor has not yet run, marks it as "in flight" on the main thread, and calls it. The static constructor then starts up a new thread. When that thread starts, the CLR sees that a static method is about to be called on a type whose static constructor is "in flight" on another thread. It immediately blocks the new thread so that the Initialize method will not start until the main thread finishes running the class constructor. The main thread blocks itself waiting for the new thread to complete, and now we have two threads each waiting for the other to complete.


Next time on FAIC: We're opening up the new Coverity office in Seattle! After which, we'll take a closer look at the uses and abuses of the static constructor.

  1. Thanks to my erstwhile colleague Neal Gafter for this example.

Atomicity, volatility and immutability are different, part three

So what does the keyword "volatile" mean, anyway? Misinformation abounds on this subject.

First off, so as to not bury the lead: in C# the rules have been carefully designed so that every volatile field read and write is also atomic. (Of course the converse does not follow; it is perfectly legal for an operation to be atomic without it being volatile, whatever that means.)

The way this is achieved is simple; the rules of C# only permit you to annotate fields with volatile if the field has a type that is guaranteed to have atomic reads and writes.

There is no logical requirement for this property; logically, volatility and atomicity are orthogonal. One could have a volatile-but-not-atomic read, for example. The very idea of doing so gives me the heebie-jeebies! Getting an up-to-date value that has been possibly splinched through the middle seems like a horrid prospect. I am very glad that C# has ensured that any volatile read or write is also an atomic read or write.
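
To make that rule concrete, here is a minimal sketch (the field names are invented for illustration) of which declarations the compiler will and will not accept:

class Flags
{
  volatile int counter;   // allowed: int reads and writes are atomic
  volatile bool ready;    // allowed: bool is four bytes or smaller
  volatile object gate;   // allowed: reference reads and writes are atomic
  // volatile long total; // rejected by the compiler: reads and writes of a
  //                      // long are not guaranteed to be atomic
}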

Volatility and immutability are essentially opposites; as we'll see the whole point of volatility is to impose some kind of safety upon certain dangerous kinds of mutability.

But what does volatile mean anyway? To understand this we must first go back in time to the early days of the C language. Suppose you are writing a device driver in C for a temperature-recording device at a weather station:

int* currentBuf = bufferStart;
while(currentBuf < bufferEnd)
{
  int temperature = deviceRegister[0];
  *currentBuf = temperature;
  currentBuf++;
}

It is entirely possible that the optimizer reasons as follows: we know that bufferStart, bufferEnd and deviceRegister are initialized at the beginning of the program and never changed afterwards; they can be treated as constants. We know that the memory addresses mapped to deviceRegister do not overlap the buffer; there is no aliasing going on here. We see that there are never any writes whatsoever in this program to deviceRegister[0]. Therefore the optimizer can pretend that you wrote:

int* currentBuf = bufferStart;
int temperature = deviceRegister[0];
while(currentBuf < bufferEnd)
{
  *currentBuf = temperature;
  currentBuf++;
}

which obviously is completely different in our scenario. The optimizer makes the seemingly-reasonable assumption that if it can prove that a variable is never written to again then it need only read from it once. But if the variable in question is marked as volatile -- meaning it changes on its own, outside of the control of the program -- then the compiler cannot safely make this optimization.

That's what volatile is for in C. Any additional meanings of volatile are not required by the C standard, and are compiler-specific extensions.1

So now let's make an analogy.

Imagine that there is a three-ring binder with a thousand pages in it, called The Big Book of Memory. Each page has a thousand numbers on it, written in pencil so that they can be changed. You also have a "register page" which only has a dozen numbers on it, each with special meaning. When you need to do some operation on a number, first you flip to the right page, then you look up the right number on that page, then you copy it into the register page. You do your calculations only on the register page. When you are done with a calculation you might write a number back somewhere into the book, or you might do another read from the book to get another value to operate on.

Suppose you are doing some operation in a loop -- as with our temperature example above. You might decide as an optimization that you are pretty sure that a particular number location is never going to change. So instead of reading it out of the book every time you need it, you copy it onto your register page, and never read it again. That's the optimization we proposed above. If one of those numbers is changing constantly based on factors outside your control then making this optimization is not valid. You need to go back to the book every time.

You'll note that nothing in our C-style volatile story so far said anything at all about multithreading. C-style volatile is about telling the compiler to turn off optimizations because the compiler cannot make reasonable assumptions about whether a given variable is changing or not. It is not about making things threadsafe. Let's see why!2

Suppose you have one thread that is writing a variable and another thread that is reading the same variable. You might think that this is exactly the same as our "C-style volatile" scenario. Imagine, for example, that our deviceRegister[0] expression above was not reading from some hardware register changing based on outside factors, but rather was simply reading from a memory address that was potentially changing based on the operation of another thread that is feeding it temperature data from some other source. Does "C-style volatile" solve our threading problem?

Kinda, sorta... well, no. The assumption we've just made is that a memory address being changed by the temperature outside is logically the same as a memory address being changed by another thread. That assumption is not warranted in general. Let's continue with our analogy to see why that is, by first considering a model in which it is warranted.

Suppose that instead of a number being updated by magic because it somehow reflects the temperature, we have two people taking turns using the book. One is the Reader and one is the Writer. They only have one book and one register page between them, so they have to cooperate.

The Reader again might be reading the same book number slot over and over again. The Reader might decide that they can read the value just once, copy it to the register page, and then keep on using the copy. The Reader does this for a while. When it is time to give the Writer a turn, the Reader writes all of the current register page values to a special page in the book that is reserved for the Reader's use. The Reader then hands the book and the register page to the Writer.

The Writer fills in the register page from their personal special location in the book, and keeps on doing what they were doing, which is writing new values into the book. When the Writer wants to take a break, again, they write the contents of the register page into the book and hand the register page back to the Reader.

The problem should be obvious. If the Writer is writing to the location that the Reader previously cached to their register page then the Reader is making decisions based on an out-of-date value.

If this analogy is a good description of the memory model of the processor then marking the shared location as "C-style volatile" does the right thing. The Reader knows that they should not be caching the value; to get the up-to-date value they have to go back to the book every time because they don't know whether the Writer changed the value when the Writer last had control.3

Unfortunately, that is not the actual memory model of many modern multi-processor machines. The actual memory model goes more like this:

Suppose there are two people sharing one book -- again, the Reader and the Writer. Each has their own register page, plus a blank book page. The Reader goes to read a value from the book. But the book isn't there! Instead, there is the Librarian. The Reader asks the Librarian for the book and the Librarian says "you can't have the book itself; it is far too valuable to let you use it. But give me your blank page, and I'll copy the stuff from the book onto it". The Reader figures that is better than nothing. In fact, it's really good, now that we consider it! The Librarian hands the Reader back a copy of the entire page, not just the single number the Reader wanted. Now the Reader can go to town and really efficiently do calculations involving any number in that page without talking to the Librarian again. Only when the Reader wants something outside of the bounds of the copied page do they have to go back to the Librarian. The Reader's performance just went way up.

Similarly, the Writer goes to write a value in the book at the same time. (Remember, the Reader and Writer no longer have to take turns using the register page because they both have their own register page.) But the Librarian does not allow this. The Librarian says "here, let me make you a copy of the page you want to write to. You make your changes to the copy, and when you are done, let me know and I will update the entire page." The Writer thinks this is great! The Writer can write all kinds of crazy things and never talk to the Librarian again until the Writer needs to write to (or read from) a different page. When that happens the Writer hands the modified copy page to the Librarian, the Librarian copies the Writer's page back into the book, and gives the Writer a copy of the new page that the Writer wants.

Clearly this is awesome if the Reader and the Writer are not both reading and writing the same page of the book. But what if they are? C-style volatile does not help at all in this situation! Suppose the Reader decides, oh, this memory location is marked as volatile, so I will not cache the read of the value onto my register page. Does that help? Not a bit! Even if the reader always goes back to the page, they are going back to their copy of the page, the copy made for them by the Librarian. Suppose the Reader then says, "OK, this thing is volatile, so when I read it, heck, I'll just go back to the Librarian again and have the Librarian make me a new copy of this page". Does that help? No, because the Writer might not have submitted the changes to the Librarian yet! The Writer has been making changes to their local copy.

In order to solve this problem the Reader could have a way to tell the Librarian "Hey, Librarian! I need to read the most up-to-date version of this location from the Book". The Librarian then has to go find the Writer and ask the Writer to stop what they are doing and submit the changes right now. Both the Reader and the Writer come to a screeching halt and the Librarian then does the laborious work of ensuring that the Book is consistent.4 Alternatively, the Writer could tell the Librarian "hey, I'm about to update this value; go find anyone who is about to read it and tell them that they need to fetch a new copy of this page when I'm done". The exact strategy chosen doesn't really matter for the purposes of this analogy; the point is that everyone has to somehow cooperate to make sure that a consistent view of all the edits is achieved.

This strategy gives a massive performance increase in the common scenario where multiple readers and multiple writers are each working on data that is highly localized -- that is, each reader and each writer does almost all of their work on the one page they have copied locally, so that they don't have to go back to the Librarian. It gives massive performance penalties in scenarios where readers and writers are working on the same page and cannot tolerate inconsistencies or out-of-date values; the readers and writers are constantly going back to the Librarian, stopping everybody from doing work, and spending all their time copying stuff back into and out of the Book of Memory to ensure that the local caches are consistent.

Clearly we have a problem here. If C-style volatile doesn't solve this problem, what does solve this problem? C#-style volatile, that's what.

Sorta. Kinda. In a pretty bogus way, actually.

In C#, volatile means not only "make sure that the compiler and the jitter do not perform any code reordering or register caching optimizations on this variable". It also means "tell the processors to do whatever it is they need to do to ensure that I am reading the latest value, even if that means halting other processors and making them synchronize main memory with their caches".

Actually, that last bit is a lie. The true semantics of volatile reads and writes are considerably more complex than I've outlined here; in fact they do not actually guarantee that every processor stops what it is doing and updates caches to/from main memory. Rather, they provide weaker guarantees about how memory accesses before and after reads and writes may be observed to be ordered with respect to each other. Certain operations such as creating a new thread, entering a lock, or using one of the Interlocked family of methods introduce stronger guarantees about observation of ordering. If you want more details, read sections 3.10 and 10.5.3 of the C# 4.0 specification.
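
For illustration, here is a minimal sketch (the names are invented) of the kind of scenario C#-style volatile is aimed at: one thread polls a flag that another thread sets. Without the modifier, the jitter would be within its rights to cache the flag in a register and spin forever.

class Worker
{
  // Marked volatile so that the read in the polling loop is not hoisted into
  // a register, and so that the write made by Stop() on another thread is
  // eventually observed.
  private volatile bool stopRequested;

  public void Stop()
  {
    stopRequested = true;
  }

  public void Run()
  {
    while (!stopRequested)
    {
      // do one unit of work
    }
  }
}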

Frankly, I discourage you from ever making a volatile field. Volatile fields are a sign that you are doing something downright crazy: you're attempting to read and write the same value on two different threads without putting a lock in place. Locks guarantee that memory read or modified inside the lock is observed to be consistent, locks guarantee that only one thread accesses a given hunk of memory at a time, and so on. The number of situations in which a lock is too slow is very small, and the probability that you are going to get the code wrong because you don't understand the exact memory model is very large. I don't attempt to write any low-lock code except for the most trivial usages of Interlocked operations. I leave the usage of volatile to real experts.

For more information on this incredibly complex topic, see:

Why C-style volatile is almost useless for multi-threaded programming

Joe Duffy on why attempting to 'fix' volatile in C# is a waste of time

Vance Morrison on incoherent caches and other aspects of modern memory models

  1. And there are a few other standardized valid usages as well; it also prevents optimizations that would screw up non-local gotos and some other relatively obscure scenarios.
  2. Of course we already know one reason: volatile operations are not guaranteed to be atomic, and thread safety requires atomicity. But there is a deeper reason why C-style volatility does not create thread safety.
  3. This is particularly true in non-cooperative multithreading; perhaps the Reader and the Writer do not choose for themselves the schedule for taking turns!
  4. And of course we haven't even considered situations where there are multiple readers and multiple writers all partying on the same page.

Atomicity, volatility and immutability are different, part two

Last time we established that an "atomic" read or write of a variable means that in multithreaded scenarios, you never end up with "halfway mutated" values in the variable. The variable goes from unmutated to mutated directly, with no intervening state. I also talked a bit about how making fields of a struct readonly has no effect on atomicity; when the struct is copied around, it may be copied four bytes at a time regardless of whether its fields are marked as readonly or not.

There is a larger problem though with reasoning about readonly fields in a struct beyond their non-atomicity. Yes, when you read from readonly fields in a struct on multiple threads without any locking, you can get inconsistent results due to race conditions. But the situation is actually worse than that; readonly fields need not give you results that you think are consistent even on one thread! Basically, readonly fields in a struct are the moral equivalent of the struct author writing a cheque without having the funds to back it.

I originally meant to write a lengthy article about this fact, but then I discovered that Joe Duffy already had done a great job. Check out his article here.

Briefly summing up: since a struct does not "own" its storage, a readonly on a field of a struct only means that the compiler will prevent the author of the code from writing directly to the field outside of a constructor. The actual variable that the struct is borrowing storage from need not be readonly and is therefore free to change as the owner of that variable sees fit. Consider for example:

struct S
{
  readonly int x;
  public S(int x) { this.x = x; }
  public void M(ref S other)
  {
    int oldX = this.x;
    other = new S(456);
    int newX = this.x;
  }
}

Since this.x is a readonly field you might think that newX and oldX always have the same value. But if you say:

S s = new S(123);
s.M(ref s);

then this and other are both aliases for s, and s is free to change!

Returning now to atomicity, I said last time that I'd discuss ways in which you can get more or less atomicity than the C# language guarantees. Basically, the C# language guarantees that reads and writes of references are atomic, as are reads and writes of built-in value types of four bytes or smaller (int, uint, float, char, bool, and so on), and that's it.

The CLI specification actually makes stronger guarantees. The CLI guarantees that reads and writes of variables of value types that are the size (or smaller) of the processor's natural pointer size are atomic; if you are running C# code on a 64 bit operating system in a 64 bit version of the CLR then reads and writes of 64 bit doubles and long integers are also guaranteed to be atomic. The C# language does not guarantee that, but the runtime spec does. (If you are running C# code in some environment that is not implemented by some implementation of the CLI then of course you cannot rely upon that guarantee; contact the vendor who sold you the runtime if you want to know what guarantees they provide.)
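
If you need atomic reads and writes of a long even when the stronger CLI guarantee is not available to you, one option is to route every access through the Interlocked methods. Here is a hedged sketch (the type and member names are invented):

using System.Threading;

class Ticker
{
  private long ticks;

  // Interlocked.Read performs an atomic read of a 64 bit value even on a
  // 32 bit runtime, where an ordinary read of a long could be torn.
  public long Read()
  {
    return Interlocked.Read(ref ticks);
  }

  // Interlocked.Exchange performs an atomic write for the same reason.
  public void Write(long value)
  {
    Interlocked.Exchange(ref ticks, value);
  }
}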

Another subtle point about atomic access is that the underlying processor only guarantees atomicity when the variable being read or written is associated with storage that is aligned to the right location in memory. Ultimately the variable will be implemented as a pointer to memory somewhere. On a 32 bit operating system, that pointer has to be evenly divisible by 4 in order for the read or write to be guaranteed to be atomic, and on a 64 bit operating system it has to be evenly divisible by 8. If you do something crazy like:

int* pInt1 = &intArr[0];    // assume intArr is pinned and pInt1 is aligned
byte* pByte = (byte*)pInt1;
pByte += 6;                 // now two bytes into the storage of intArr[1]
int* pInt2 = (int*) pByte;
*pInt2 = 0x000DECAF;        // this write straddles intArr[1] and intArr[2]

then you are not guaranteed that the write which writes half the bytes to intArr[1] and the other half to intArr[2] is atomic. The underlying two array slots may be observed to be updated one after the other.

Now, the CLR does guarantee that all fields of all structs that are of the natural pointer size (or smaller) will by default be aligned in memory such that they do not cross a boundary like this, and therefore reads and writes of fields will be atomic. However, the C# language does permit you to tell the CLR "do not use the default structure packing rules; I will tell you how to lay out the bytes in a structure". (Perhaps because you have a byte buffer from some unmanaged code and you're using unsafe code to get a pointer to a struct that puts some structure onto those bytes.) If you tell the CLR to put an int field inside the structure such that it falls across an alignment boundary then the access to that field will not be atomic, and that's your problem to deal with.
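
Here is a hedged sketch (the type is invented for illustration) of what such a layout might look like, using the explicit-layout attributes from System.Runtime.InteropServices; the second field is placed so that it can fall across an alignment boundary:

using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
struct OddLayout
{
  [FieldOffset(0)]
  public int Aligned;     // starts on a four byte boundary; access is atomic

  [FieldOffset(6)]
  public int Misaligned;  // can straddle an alignment boundary; atomicity of
                          // access is no longer guaranteed
}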

Next time on FAIC: what is volatile, and how does it relate to atomicity in C#? The next episode will be somewhat delayed as I am in California this week attending yet another wedding. It seems like half my friends decided to get married in a row this year! The wedding I officiated this past weekend went off without a hitch. The highlight: I finally got an opportunity to say "Frodo, bring forth the ring!" in a context where it made sense.

Atomicity, volatility and immutability are different, part one

I get a fair number of questions about atomicity, volatility, thread safety, immutability and the like; the questions illustrate a lot of confusion on these topics. Let's take a step back and examine each of these ideas to see what the differences are between them.

First off, what do we mean by "atomic"? From the Greek ἄτομος, meaning "not divisible into smaller parts", an "atomic" operation is one which is always observed to be done or not done, but never halfway done. The C# specification clearly defines what operations are atomic in section 5.5. The atomic operations are: reads and writes of variables of any reference type, or, effectively, any built-in value type that takes up four bytes or less, like int, short and so on. Reads and writes of variables of value types that take more than four bytes, like double, long and decimal, are not guaranteed to be atomic by the C# language. 1

What does it mean for a read and write of an int to be atomic? Suppose you have three static variables of type int: X is 2, Y is 1 and Z is 0. Then on one thread we say:

Z = X;

and on another thread:

X = Y;

Each thread does one read and one write. Each read and write is itself atomic. What is the value of Z? Without any synchronization, the threads will race. If the first thread wins then Z will be 2. If the second thread wins then Z will be 1. But Z will definitely be one of those two values, you just can't say which.

Now consider an immutable struct:

struct MyLong
{
  public readonly int low;
  public readonly int high;
  public MyLong(int low, int high)
  {
    this.low = low;
    this.high = high;
  }
}

Ignore for the moment the evil that is public fields. Suppose we have fields Q, R and S of type MyLong initialized to (0x01234567, 0x0BADF00D), (0x0DEDBEEF, 0x0B0B0B0B) and (0, 0), respectively. On two threads we say:

S = Q;

and

Q = R;

We have two threads. Each thread does one read and one write, but the reads and writes are not atomic. They can be divided! This program is actually the same as if the two threads were:

S.low = Q.low;
S.high = Q.high;

and

Q.low = R.low;
Q.high = R.high;

Now, you can't do this because that's writing to a readonly field outside a constructor. But the CLR is the one enforcing that rule; it can break it! (We'll come back to this in the next episode; things are even weirder than you might think.) Value types are copied by value; that's why they're called value types. When copying a value type, the CLR doesn't call a constructor, it just moves the bytes over one atomic chunk at a time. In practice, maybe the jitter has special registers available that allow it to move bigger chunks around, but that's not a guarantee of the C# language. The C# language only goes so far as to guarantee that the chunks are not smaller than four bytes.

Now the threads can race such that perhaps first Q.low = R.low runs, then S.low = Q.low runs, then S.high = Q.high runs, and then Q.high = R.high runs, and hey, S is now (0x0DEDBEEF, 0x0BADF00D), even though that was neither of the original values. The values have been splinched, as Hermione Granger would say.2

And of course, the ordering above is not guaranteed either. The CLR is permitted to copy the chunks over in any order it chooses; it could be copying high before low, for example.

The name "MyLong" was of course no accident; in effect, a two-int readonly struct is how longs are implemented on 32 bit chips. Each operation on the long is done in two parts, on each 32 bit chunk. The same goes for doubles, the same goes for anything larger than 32 bits. If you try reading and writing longs or doubles in multiple threads on 32 bit operating systems without adding some sort of locking around it to make the operation atomic, your data are highly likely to get splinched.

The only operations that are guaranteed by the C# language to be atomic without some sort of help from a lock or other synchronization magic are those listed above: individual reads and writes of variables of the right size. In particular, operations like "increment" and "decrement" are not atomic. When you say

i++;

that's syntactic sugar for "read i, increment the read value, write the incremented value back to i". The read and write operations are guaranteed to be atomic, but the entire operation is not; it consists of multiple atomic operations and therefore is not itself atomic. Two attempts to increment i on two different threads could interleave such that one of the increments is "lost".

There are many techniques for making non-atomic operations into atomic operations; the easiest is simply to wrap every access to the variable in question with a lock, so that it is never the case that two threads are messing with the variable at the same time. You can also use the Interlocked family of helper methods which provide atomic increment, atomic compare-and-exchange, and so on.
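
As a hedged sketch (the names are invented), here are those two approaches side by side: the same increment made atomic once with a lock and once with Interlocked.Increment.

using System.Threading;

class Counter
{
  private readonly object sync = new object();
  private int count;

  // Wrapping the read-modify-write in a lock makes the whole operation atomic
  // with respect to every other thread that takes the same lock.
  public void IncrementWithLock()
  {
    lock (sync)
    {
      count++;
    }
  }

  // Interlocked.Increment performs the same read-modify-write as a single
  // atomic operation, with no explicit lock.
  public void IncrementInterlocked()
  {
    Interlocked.Increment(ref count);
  }
}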

Have a lovely Memorial Day weekend, American readers. I'm spending my Memorial Day weekend marrying a close personal friend.3 Should be fun!

Next time on FAIC: readonly inside a struct is the moral equivalent of cheque kiting, plus ways you can make the atomicity guarantees stronger or weaker.

  1. There is no guarantee that they are not atomic! They might in practice be atomic on some hardware. Or they might not.
  2. were she a computer programmer.
  3. Actually, I am marrying two close personal friends. To each other, even!

Locks and exceptions do not mix

A couple years ago I wrote a bit about how our codegen for the lock statement could sometimes lead to situations in which an unoptimized build had different potential deadlocks than an optimized build of the same source code. This is unfortunate, so we've fixed that for C# 4.0. However, all is still not rainbows, unicorns and Obama, as we'll see.

Recall that lock(obj){body} was syntactic sugar for

var temp = obj;
Monitor.Enter(temp);
try { body }
finally { Monitor.Exit(temp); }

The problem here is that if the compiler generates a no-op instruction between the monitor enter and the try-protected region then it is possible for the runtime to throw a thread abort exception after the monitor enter but before the try. In that scenario, the finally never runs so the lock leaks, probably eventually deadlocking the program. It would be nice if this were impossible in unoptimized and optimized builds.

In C# 4.0 we've changed lock so that it now generates code as if it were

bool lockWasTaken = false;
var temp = obj;
try { Monitor.Enter(temp, ref lockWasTaken); { body } }
finally { if (lockWasTaken) Monitor.Exit(temp); }

The problem now becomes someone else's problem; the implementation of Monitor.Enter takes on responsibility for atomically setting the flag in a manner that is immune to thread abort exceptions messing it up.

So everything is good now, right?

Sadly, no. It's consistently bad now, which is better than being inconsistently bad. But there's an enormous problem with this approach. By choosing these semantics for "lock" we have made a potentially unwarranted and dangerous choice on your behalf; implicit in this codegen is the belief that a deadlocked program is the worst thing that can happen. That's not necessarily true! Sometimes deadlocking the program is the better thing to do -- the lesser of two evils. 

The purpose of the lock statement is to help you protect the integrity of a mutable resource that is shared by multiple threads. But suppose an exception is thrown halfway through a mutation of the locked resource. Our implementation of lock does not magically roll back the mutation to its pristine state, and it does not complete the mutation. Rather, control immediately branches to the finally, releasing the lock and allowing every other thread that is patiently waiting to immediately view the messed-up partially mutated state! If that state has privacy, security, or human life and safety implications, the result could be very bad indeed. In that case it is possibly better to deadlock the program and protect the messed-up resource by denying access to it entirely. But that's obviously not good either.

Basically, we all have a difficult choice to make whenever doing multi-threaded programming. We can (1) automatically release locks upon exceptions, exposing inconsistent state and living with the resulting bugs (bad), (2) maintain locks upon exceptions, deadlocking the program (arguably often worse), or (3) carefully implement the bodies of locks that do mutations so that in the event of an exception, the mutated resource is rolled back to a pristine state before the lock is released (good, but hard).
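
Here is a hedged sketch of option (3); the type and member names are invented. The lock body captures enough state to restore the invariant if the mutation throws partway through, so the lock is never released over a half-mutated resource.

using System;
using System.Collections.Generic;

class Ledger
{
  private readonly object sync = new object();
  private readonly List<int> entries = new List<int>();

  // mutate stands in for whatever multi-step mutation the lock body performs;
  // it may throw partway through.
  public void Update(Action<List<int>> mutate)
  {
    lock (sync)
    {
      var snapshot = new List<int>(entries); // enough state to roll back
      try
      {
        mutate(entries);
      }
      catch
      {
        // Restore the pristine state before the lock is released so that no
        // other thread ever observes the partially mutated resource.
        entries.Clear();
        entries.AddRange(snapshot);
        throw;
      }
    }
  }
}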

This is yet another reason why the body of a lock should do as little as possible. Usually the rationale for having small lock bodies is to get in and get out quickly, so that anyone waiting on the lock does not have to wait long. But an even better reason is because small, simple lock bodies minimize the chance that the thing in there is going to throw an exception. It's also easier to rewrite mutating lock bodies to have rollback behaviour if they don't do very much to begin with.

And of course, this is yet another reason why aborting a thread is pure evil. Try to never do so!