Construction destruction

Take a look at this little program skeleton:

class Foo 
{ 
    private int x;
    private int y;
    public Foo(int x, int y) 
    {
        this.x = x;
        this.y = y;
        SideEffects.Alpha(); // Notice: does not use "this"
    }
    ~Foo() 
    { 
        SideEffects.Charlie(); 
    }
}
static class SideEffects
{
    public static void Alpha() { ... }
    public static void Bravo() { ... }
    public static void Charlie() { ... }
    public static void M()
    {
        Foo foo = new Foo(1, 2); 
        Bravo();
    }
}

Let’s suppose we have three side effects: Alpha, Bravo and Charlie. What precisely they are is not important. The question is: what do we know about the order in which Alpha, Bravo and Charlie execute when a call to M() occurs? Continue reading

Advertisements

The no-lock deadlock

People sometimes ask me if there is a cheap-and-easy way to guarantee thread safety. For example, “if my method only reads and writes local variables and parameters, can I guarantee that my method is threadsafe?” Questions like that are dangerous because they are predicated on an incorrect assumption: that if every method of a program is “threadsafe”, whatever that means, then the entire program is “threadsafe”. I might not be entirely clear on what “threadsafe” means, but I do know one thing about it: thread safety is a property of entire programs, not of individual methods.

To illustrate why these sorts of questions are non-starters, today I present to you the world’s simplest deadlocking C# program:

class C
{
  static C() 
  {
    // Let's run the initialization on another thread!
    var thread = new System.Threading.Thread(Initialize);
    thread.Start();
    thread.Join();
  }
  static void Initialize() { }
  static void Main() { }
}

(Thanks to my Roslyn compiler team colleague Neal Gafter for this example, which was adapted from his book Java Puzzlers.)

At first glance clearly ever method of this incredibly simple program is “threadsafe”. There is only a single variable anywhere in the program; it is local, is written once, is written before it is read, is read from the same thread it was written on, and is guaranteed to be atomic. There are apparently no locks anywhere in the program, and so there are no lock ordering inversions. Two of the three methods are empty. And yet this program deadlocks with 100% certainty; the program “globally” is clearly not threadsafe, despite all those nice “local” properties. You can build a hollow house out of solid bricks; so too you can build a deadlocking program out of threadsafe methods.

The reason why this deadlocks is a consequence of the rules for static constructors in C#; the important rule is that a static constructor runs exactly zero or one times, and runs before a static method call or instance creation in its type. Therefore the static constructor of C must run to completion before Main starts. The CLR notes that C‘s static constructor is “in flight” on the main thread and calls it. The static constructor then starts up a new thread. When that thread starts, the CLR sees that a static method is about to be called on a type whose static constructor is “in flight” another thread. It immediately blocks the new thread so that the Initialize method will not start until the main thread finishes running the class constructor. The main thread blocks itself waiting for the new thread to complete, and now we have two threads each waiting for the other to complete.


Next time on FAIC: We’re opening up the new Coverity office in Seattle! After which, we’ll take a closer look at the uses and abuses of the static constructor.

What is the defining characteristic of a local variable?

If you ask a dozen C# developers what a “local variable” is, you might get a dozen different answers. A common answer is of course that a local is “a storage location on the stack”. But that is describing a local in terms of its implementation details; there is nothing in the C# language that requires that locals be stored on a data structure called “the stack”, or that there be one stack per thread. (And of course, locals are often stored in registers, and registers are not the stack.)

A less implementation-detail-laden answer might be that a local variable is a variable whose storage location is “allocated from the temporary store”. That is, a local variable is a variable whose lifetime is known to be short; the local’s lifetime ends when control leaves the code associated with the local’s declaration space.

That too, however, is a lie. The C# specification is surprisingly vague about the lifetime of an “ordinary” local variable, noting that its lifetime is only kinda-sorta that length. The jitter’s optimizer is permitted broad latitude in its determination of local lifetime; a local can be cleaned up early or late. The specification also notes that the lifetimes of some local variables are necessarily extended beyond the point where control leaves the method body containing the local declaration. Locals declared in an iterator block, for instance, live on even after control has left the iterator block; they might die only when the iterator is itself collected. Locals that are closed-over outer variables of a lambda are the same way; they live at least as long as the delegate that closes over them. And in the upcoming version of C#, locals declared in async blocks will also have extended lifetimes; when the async method returns to its caller upon encountering an “await”, the locals live on and are still there when the method resumes. (And since it might not resume on the same thread, in some bizarre situations, the locals had better not be stored on the stack!)

So if locals are not “variables on the stack” and locals are not “short lifetime variables” then what are locals?

The answer is of course staring you in the face. The defining characteristic of a local is that it can only be accessed by name in the block which declares it; it is local to a block. What makes a local truly unique is that it can only be a private implementation detail of a method body. The name of that local is never of any use to code lexically outside of the method body.

Atomicity, volatility and immutability are different, part three

So what does the keyword “volatile” mean, anyway? Misinformation abounds on this subject.

First off, so as to not bury the lead: in C# the rules have been carefully designed so that every volatile field read and write is also atomic. (Of course the converse does not follow; it is perfectly legal for an operation to be atomic without it being volatile, whatever that means.)

The way this is achieved is simple; the rules of C# only permit you to annotate fields with volatile if the field has a type that is guaranteed to have atomic reads and writes.

There is no logical requirement for this property; logically, volatility and atomicity are orthogonal. One could have a volatile-but-not-atomic read, for example. The very idea of doing so gives me the heebie-jeebies! Getting an up-to-date value that has been possibly splinched through the middle seems like a horrid prospect. I am very glad that C# has ensured that any volatile read or write is also an atomic read or write.

Volatility and immutability are essentially opposites; as we’ll see the whole point of volatility is to impose some kind of safety upon certain dangerous kinds of mutability.

But what does volatile mean anyway? To understand this we must first go back in time to the early days of the C language. Suppose you are writing a device driver in C for a temperature-recording device at a weather station:

int* currentBuf = bufferStart;
while(currentBuf < bufferEnd)
{
  int temperature = deviceRegister[0];
  *currentBuf = temperature;
  currentBuf++;
}

It is entirely possible that the optimizer reasons as follows: we know that bufferStart, bufferEnd and deviceRegister are initialized at the beginning of the program and never changed afterwards; they can be treated as constants. We know that the memory addresses mapped to deviceRegister do not overlap the buffer; there is no aliasing going on here. We see that there are never any writes whatsoever in this program to deviceRegister[0]. Therefore the optimizer can pretend that you wrote:

int* currentBuf = bufferStart;
int temperature = deviceRegister[0];
while(currentBuf < bufferEnd)
{
  *currentBuf = temperature;
  currentBuf++;
}

which obviously is completely different in our scenario. The optimizer makes the seemingly-reasonable assumption that if it can prove that a variable is never written to again then it need only read from it once. But if the variable in question is marked as volatile — meaning it changes on its own, outside of the control of the program — then the compiler cannot safely make this optimization.

That’s what volatile is for in C. Any additional meanings of volatile are not required by the C standard, and are compiler-specific extensions.[1. And there are a few other standardized valid usages as well; it also prevents optimizations that would screw up non-local gotos and some other relatively obscure scenarios.]

So now let’s make an analogy.

Imagine that there is a three-ring binder with a thousand pages in it, called The Big Book of Memory.  Each page has a thousand numbers on it, written in pencil so that they can be changed. You also have a “register page” which only has a dozen numbers on it, each with special meaning. When you need to do some operation on a number, first you flip to the right page, then you look up the right number on that page, then you copy it into the register page. You do your calculations only on the register page. When you are done a calculation you might write a number back somewhere into the book, or you might do another read from the book to get another value to operate on.

Suppose you are doing some operation in a loop — as with our temperature example above. You might decide as an optimization that you are pretty sure that a particular number location is never going to change. So instead of reading it out of the book every time you need it, you copy it onto your register page, and never read it again. That’s the optimization we proposed above. If one of those numbers is changing constantly based on factors outside your control then making this optimization is not valid. You need to go back to the book every time.

You’ll note that nothing in our C-style volatile story so far said anything at all about multithreading. C-style volatile is about telling the compiler to turn off optimizations because the compiler cannot make reasonable assumptions about whether a given variable is changing or not. It is not about making things threadsafe. Let’s see why![1. Of course we already know one reason: volatile operations are not guaranteed to be atomic, and thread safety requires atomicity. But there is a deeper reason why C-style volatility does not create thread safety.]

Suppose you have one thread that is writing a variable and another thread that is reading the same variable. You might think that this is exactly the same as our “C-style volatile” scenario. Imagine, for example, that our deviceRegister[0] expression above was not reading from some hardware register changing based on outside factors, but rather was simply reading from a memory address that was potentially changing based on the operation of another thread that is feeding it temperature data from some other source. Does “C-style volatile” solve our threading problem?

Kinda, sorta… well, no. The assumption we’ve just made is that a memory address being changed by the temperature outside is logically the same as a memory address being changed by another threadThat assumption is not warranted in general. Let’s continue with our analogy to see why that is by first considering a model in which it is warranted.

Suppose instead of a number being updated by magic because it is somehow reflecting the temperature, suppose instead we have two people taking turns using the book. One is the Reader and one is the Writer. They only have one book and one register page between them, so they have to cooperate.

The Reader again might be reading the same book number slot over and over again. The Reader might decide that they can read the value just once, copy it to the register page, and then keep on using the copy. The Reader does this for a while. When it is time to give the Writer a turn, the Reader writes all of the current register page values to a special page in the book that is reserved for the Reader’s use. The Reader then hands the book and the register page to the Writer.

The Writer fills in the register page from their personal special location in the book, and keeps on doing what they were doing, which is writing new values into the book. When the Writer wants to take a break, again, they write the contents of the register page into the book and hand the register page back to the Reader.

The problem should be obvious. If the Writer is writing to the location that the Reader previously cached to their register page then the Reader is making decisions based on an out-of-date value.

If this analogy is a good description of the memory model of the processor then marking the shared location as “C-style volatile” does the right thing. The Reader knows that they should not be caching the value; to get the up-to-date value they have to go back to the book every time because they don’t know whether the Writer changed the value when the Writer last had control.[1. This is particularly true in non-cooperative multithreading; perhaps the Reader and the Writer do not choose for themselves the schedule for taking turns!]

Unfortunately, that is not the actual memory model of many modern multi-processor machines. The actual memory model goes more like this:

Suppose there are two people sharing one book — again, the Reader and the Writer. Each has their own register page,plus a blank book page. The Reader goes to read a value from the book. But the book isn’t there! Instead, there is the Librarian. The Reader asks the Librarian for the book and the Librarian says “you can’t have the book itself; it is far too valuable to let you use it. But give me your blank page, and I’ll copy the stuff from the book onto it“. The Reader figures that is better than nothing. In fact, it’s really good, now that we consider it! The Librarian hands the Reader back a copy of the entire page, not just the single number the Reader wanted. Now the Reader can now go to town and really efficiently do calculations involving any number in that page without talking to the Librarian again. Only when the Reader wants something outside of the bounds of the copied page do they have to go back to the Librarian. The Reader’s performance just went way up.

Similarly, the Writer goes to write a value in the book at the same time. (Remember, the Reader and Writer no longer have to take turns using the register page because they both have their own register page.) But the Librarian does not allow this. The Librarian says “here, let me make you a copy of the page you want to write to. You make your changes to the copy, and when you are done, let me know and I will update the entire page.” The Writer thinks this is great! The Writer can write all kinds of crazy things and never talk to the Librarian again until the Writer needs to write to (or read from) a different page. When that happens the Writer hands the modified copy page to the Librarian, the Librarian copies the Writer’s page back into the book, and gives the Writer a copy of the new page that the Writer wants.

Clearly this is awesome if the Reader and the Writer are not both reading and writing the same page of the book. But what if they are? C-style volatile does not help at all in this situation! Suppose the Reader decides, oh, this memory location is marked as volatile, so I will not cache the read of the value onto my register page. Does that help? Not a bit! Even if the reader always goes back to the page, they are going back to their copy of the page, the copy made for them by the Librarian. Suppose the Reader then says, “OK, this thing is volatile, so when I read it, heck, I’ll just go back to the Librarian again and have the Librarian make me a new copy of this page”. Does that help? No, because the Writer might not have submitted the changes to the Librarian yet! The Writer has been making changes to their local copy.

In order to solve this problem the Reader could have a way to tell the Librarian “Hey, Librarian! I need to read the most up-to-date version of this location from the Book”. The Librarian then has to go find the Writer and ask the Writer to stop what they are doing and submit the changes right now. Both the Reader and the Writer come to a screeching halt and the Librarian then does the laborious work of ensuring that the Book is consistent.[1. And of course we haven’t even considered situations where there are multiple readers and multiple writers all partying on the same page.] Alternatively, the Writer could tell the Librarian “hey, I’m about to update this value; go find anyone who is about to read it and tell them that they need to fetch a new copy of this page when I’m done”. The exact strategy chosen doesn’t really matter for the purposes of this analogy; the point is that everyone has to somehow cooperate to make sure that a consistent view of all the edits is achieved.

This strategy gives a massive performance increases in the common scenario where multiple readers and multiple writers are each working on data that is highly contiguous — that is, each reader and each writer does almost all of their work on the one page they have copied locally, so that they don’t have to go back to the Librarian. It gives massive performance penalties in scenarios where readers and writers are working on the same page and cannot tolerate inconsistencies or out-of-date values; the readers and writers are constantly going back to the Librarian, stopping everybody from doing work, and spending all their time copying stuff back into and out of the Book of Memory to ensure that the local caches are consistent.

Clearly we have a problem here. If C-style volatile doesn’t solve this problem, what does solve this problem? C#-style volatile, that’s what.

Sorta. Kinda. In a pretty bogus way, actually.

In C#, volatile means not only “make sure that the compiler and the jitter do not perform any code reordering or register caching optimizations on this variable“. It also means “tell the processors to do whatever it is they need to do to ensure that I am reading the latest value, even if that means halting other processors and making them synchronize main memory with their caches“.

Actually, that last bit is a lie. The true semantics of volatile reads and writes are considerably more complex than I’ve outlined here; in fact they do not actually guarantee that every processor stops what it is doing and updates caches to/from main memory. Rather, they provide weaker guarantees about how memory accesses before and after reads and writes may be observed to be ordered with respect to each other. Certain operations such as creating a new thread, entering a lock, or using one of the Interlocked family of methods introduce stronger guarantees about observation of ordering. If you want more details, read sections 3.10 and 10.5.3 of the C# 4.0 specification.

Frankly, I discourage you from ever making a volatile field. Volatile fields are a sign that you are doing something downright crazy: you’re attempting to read and write the same value on two different threads without putting a lock in place. Locks guarantee that memory read or modified inside the lock is observed to be consistent, locks guarantee that only one thread accesses a given hunk of memory at a time, and so on. The number of situations in which a lock is too slow is very small, and the probability that you are going to get the code wrong because you don’t understand the exact memory model is very large. I don’t attempt to write any low-lock code except for the most trivial usages of Interlocked operations. I leave the usage of volatile to real experts.

For more information on this incredibly complex topic, see:

Why C-style volatile is almost useless for multi-threaded programming

Joe Duffy on why attempting to ‘fix’ volatile in C# is a waste of time

Vance Morrison on incoherent caches and other aspects of modern memory models

Atomicity, volatility and immutability are different, part two

Last time we established that an “atomic” read or write of a variable means that in multithreaded scenarios, you never end up with “halfway mutated” values in the variable. The variable goes from unmutated to mutated directly, with no intervening state. I also talked a bit about how making fields of a struct readonly has no effect on atomicity; when the struct is copied around, it may be copied around four bytes at a time regardless of whether its fields are marked as readonly or not.

There is a larger problem though with reasoning about readonly fields in a struct beyond their non-atomicity. Yes, when you read from readonly fields in a struct on multiple threads without any locking, you can get inconsistent results due to race conditions. But the situation is actually worse than that; readonly fields need not give you results that you think are consistent even on one thread! Basically, readonly fields in a struct are the moral equivalent of the struct author writing a cheque without having the funds to back it.

I originally meant to write a lengthy article about this fact, but then I discovered that Joe Duffy already had done a great job. Check out his article here.

Briefly summing up: since a struct does not “own” its storage, a readonly on a field of a struct only means that the compiler will prevent the author of the code from writing directly to the field outside of a constructor. The actual variable that the struct is borrowing storage from need not be readonly and is therefore free to change as the owner of that variable sees fit. Consider for example:

struct S
{
  readonly int x;
  public S(int x) { this.x = x; }
  public void M(ref S other)
  {
    int oldX = this.x;
    other = new S(456);
    int newX = this.x;
  }
}

Since this.x is a readonly field you might think that newX and oldX always have the same value. But if you say:

S s = new S(123);
s.M(ref s);

then this and other are both aliases for s, and s is free to change!

Returning now to atomicity, I said last time that I’d discuss ways in which you can get more or less atomicity than the C# language guarantees. Basically the C# language guarantees that reads and writes of references are atomic, and reads and writes of built-in value types of four bytes or smaller (int, uint, float, char, bool, and so on) are guaranteed to be atomic, and that’s it.

The CLI specification actually makes stronger guarantees. The CLI guarantees that reads and writes of variables of value types that are the size (or smaller) of the processor’s natural pointer size are atomic; if you are running C# code on a 64 bit operating system in a 64 bit version of the CLR then reads and writes of 64 bit doubles and long integers are also guaranteed to be atomic. The C# language does not guarantee that, but the runtime spec does. (If you are running C# code in some environment that is not implemented by some implementation of the CLI then of course you cannot rely upon that guarantee; contact the vendor who sold you the runtime if you want to know what guarantees they provide.)

Another subtle point about atomic access is that the underlying processor only guarantees atomicity when the variable being read or written is associated with storage that is aligned to the right location in memory. Ultimately the variable will be implemented as a pointer to memory somewhere. On a 32 bit operating system, that pointer has to be evenly divisible by 4 in order for the read or write to be guaranteed to be atomic, and on a 64 bit operating system it has to be evenly divisible by 8. If you do something crazy like:

int* pInt1 = &intArr[0];
byte* pByte = (byte*)pInt;
pByte += 6;
int* pInt2 = (int*) pByte;
*pInt2 = 0x000DECAF;

then you are not guaranteed that the write which writes half the bytes to intArr[1] and the other half to intArr[2] is atomic. The underlying two array slots may be observed to be updated one after the other.

Now, the CLR does guarantee that all fields of all structs that are of the natural pointer size (or smaller) will by default be aligned in memory such that they do not cross a boundary like this, and therefore reads and writes of fields will be atomic. However, the C# language does permit you to tell the CLR “do not use the default structure packing rules; I will tell you how to lay out the bytes in a structure“. (Perhaps because you have a byte buffer from some unmanaged code and you’re using unsafe code to get a pointer to a struct that puts some structure onto those bytes.) If you tell the CLR to put an int field inside the structure such that it falls across an alignment boundary then the access to that field will not be atomic, and that’s your problem to deal with.

Next time on FAIC: what is volatile, and how does it relate to atomicity in C#? The next episode will be somewhat delayed as I am in California this week attending yet another wedding. It seems like half my friends decided to get married in a row this year! The wedding I officiated this past weekend went off without a hitch. The highlight: I finally got an opportunity to say “Frodo, bring forth the ring!” in a context where it made sense.

Atomicity, volatility and immutability are different, part one

I get a fair number of questions about atomicity, volatility, thread safety, immutability and the like; the questions illustrate a lot of confusion on these topics. Let’s take a step back and examine each of these ideas to see what the differences are between them.

First off, what do we mean by “atomic”? From the Greek ἄτομος, meaning “not divisible into smaller parts”, an “atomic” operation is one which is always observed to be done or not done, but never halfway done. The C# specification clearly defines what operations are atomic in section 5.5. The atomic operations are: reads and writes of variables of any reference type, or, effectively, any built-in value type that takes up four bytes or less, like int, short and so on. Reads and writes of variables of value types that take more than four bytes, like double, long and decimal, are not guaranteed to be atomic by the C# language. [1. There is no guarantee that they are not atomic! They might in practice be atomic on some hardware. Or they might not.]

What does it mean for a read and write of an int to be atomic?  Suppose you have static variables of type int. X is 2, Y is 1, Z is 0. Then on one thread we say:

Z = X;

and on another thread:

X = Y

Each thread does one read and one write. Each read and write is itself atomic. What is the value of Z? Without any synchronization, the threads will race. If the first thread wins then Z will be 2. If the second thread wins then Z will be 1. But Z will definitely be one of those two values, you just can’t say which.

Now consider an immutable struct:

struct MyLong
{
  public readonly int low;
  public readonly int high;
  public MyLong(low, high)
  {
    this.low = low;
    this.high = high;
  }
}

Ignore for the moment the evil that is public fields. Suppose we have a fields Q, R and S of type MyLong initialized to (0x01234567, 0x0BADF00D), (0x0DEDBEEF, 0x0B0B0B0B) and (0, 0),  respectively. On two threads we say:

S = Q;

and

Q = R;

We have two threads. Each thread does one read and one write, but the reads and writes are not atomic. They can be divided! This program is actually the same as if the two threads were:

S.low = Q.low;
S.high = Q.high;

and

Q.low = R.low;
Q.high = R.high;

Now, you can’t do this because that’s writing to a readonly field outside a constructor. But the CLR is the one enforcing that rule; it can break it! (We’ll come back to this in the next episode; things are even weirder than you might think.) Value types are copied by value; that’s why they’re called value types. When copying a value type, the CLR doesn’t call a constructor, it just moves the bytes over one atomic chunk at a time. In practice, maybe the jitter has special registers available that allow it to move bigger chunks around, but that’s not a guarantee of the C# language. The C# language only goes so far as to guarantee that the chunks are not smaller than four bytes.

Now the threads can race such that perhaps first S.low = Q.low runs, then Q.low = R.low runs, then Q.high = R.high runs, and then S.high = Q.high runs, and hey, S is now (0x0DEDBEEF, 0x0BADF00D), even though that was neither of the original values. The values have been splinched, as Hermione Granger would say.[1. were she a computer programmer.]

And of course, the ordering above is not guaranteed either. The CLR is permitted to copy the chunks over in any order it chooses; it could be copying high before low, for example.

The name “MyLong” was of course no accident; in effect, a two-int readonly struct is how longs are implemented on 32 bit chips. Each operation on the long is done in two parts, on each 32 bit chunk. The same goes for doubles, the same goes for anything larger than 32 bits. If you try reading and writing longs or doubles in multiple threads on 32 bit operating systems without adding some sort of locking around it to make the operation atomic, your data are highly likely to get splinched.

The only operations that are guaranteed by the C# language to be atomic without some sort of help from a lock or other synchronization magic are those listed above: individual reads and writes of variables of the right size. In particular, operations like “increment” and “decrement” are not atomic. When you say

i++;

that’s a syntactic sugar for “read i, increment the read value, write the incremented value back to i“. The read and write operations are guaranteed to be atomic, but the entire operation is not; it consists of multiple atomic operations and therefore is not itself atomic. Two attempts to increment i on two different threads could interleave such that one of the increments is “lost”.

There are many techniques for making non-atomic operations into atomic operations; the easiest is simply to wrap every access to the variable in question with a lock, so that it is never the case that two threads are messing with the variable at the same time. You can also use the Interlocked family of helper methods which provide atomic increment, atomic compare-and-exchange, and so on.

Have a lovely Memorial Day weekend, American readers. I’m spending my Memorial Day weekend marrying a close personal friend.[1. Actually, I am marrying two close personal friends. To each other, even!] Should be fun!

Next time on FAIC: readonly inside a struct is the moral equivalent of cheque kiting, plus ways you can make the atomicity guarantees stronger or weaker.

Locks and exceptions do not mix

A couple years ago I wrote a bit about how our codegen for the lock statement could sometimes lead to situations in which an unoptimized build had different potential deadlocks than an optimized build of the same source code. This is unfortunate, so we’ve fixed that for C# 4.0. However, all is still not rainbows, unicorns and Obama, as we’ll see.
Continue reading