Following the pattern

Here’s a question I got from a user recently:

The foreach loop in C# uses a pattern-based approach to semantic analysis. LINQ in C# also uses a pattern-based approach. Why don’t other features, such as the using statement, also use a pattern-based approach?

Great question. First off, what do we mean by a “pattern-based approach”?


Ref returns and ref locals

“Ref returns” are the subject of another great question from StackOverflow that I thought I might share with a larger audience.

Ever since C# 1.0 you’ve been able to create an “alias” to a variable by passing a “ref to a variable” to certain methods:

static void M(ref int x)
{
  x = 123;
}
  ...
  int y = 456;
  M(ref y);

Despite their different names, x and y are now aliases for each other; they both refer to the same storage location. When x is changed, y changes too because they are the same thing. Basically, “ref” parameters allow you to pass around variables as variables rather than as values. This is a sometimes-confusing feature (because it is easy to confuse “reference types” with “ref” aliases to variables), but it is generally a pretty well-understood and frequently-used feature.

However, it is a little-known fact that the CLR type system supports additional usages of ref, though C# does not. The CLR type system also allows methods to return refs to variables, and allows local variables to be aliases for other variables. The CLR type system, however, does not allow for fields that are aliases to other variables. Similarly, arrays may not contain managed references to other variables. Both fields and arrays containing refs are illegal because making them legal would overly complicate the garbage collection story.[1. I also note that the "managed reference to variable" types are not convertible to object, and therefore may not be used as type arguments to generic types or methods. For details, see the CLI specification Partition I Section 8.2.1.1, "Managed pointers and related types" for information about this feature. See also my numerous articles on memory management for more discussion of why C# and the CLR do not allow long-term storage of refs.]

As you might expect, it is entirely possible to create a version of C# which supports both these features. You could then do things like

static ref int Max(ref int x, ref int y)
{
  if (x > y)
    return ref x;
  else
    return ref y;
}

Why do this? It is quite different than a conventional “Max” which returns the larger of two values. This returns the larger variable itself, which can then be modified:

int a = 123;
int b = 456;
ref int c = ref Max(ref a, ref b);
c += 100;
Console.WriteLine(b); // 556!

Kinda neat! This would also mean that ref-returning methods could be the left-hand side of an assignment — we don’t need the local “c”:

int a = 123;
int b = 456;
Max(ref a, ref b) += 100;
Console.WriteLine(b); // 556!

Syntactically, ref is a strong marker that something weird is going on. Every time the keyword ref appears before a variable usage, it means “I am now making some other thing an alias for this variable”. Every time it appears before a declaration, it means “this thing must be initialized with a variable marked with ref”.

I know empirically that it is possible to build a version of C# that supports these features because I have done so in order to test-drive the possible feature. Advanced programmers (particularly people porting unmanaged C++ code) often ask us for more C++-like ability to do things with references without having to get out the big hammer of actually using pointers and pinning memory all over the place. By using managed references you get these benefits without paying the cost of screwing up your garbage collection performance.
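For instance, a capability that such programmers frequently ask for is handing out a reference directly into an array, so that the caller can update the element in place. Here is a sketch in the hypothetical syntax above; Find and its caller are purely illustrative, not any real API:

```csharp
static ref int Find(int[] items, int value)
{
  for (int i = 0; i < items.Length; i++)
    if (items[i] == value)
      return ref items[i]; // an alias to the array element itself
  throw new ArgumentException("value not found");
}
  ...
  int[] data = { 10, 20, 30 };
  Find(data, 20) = 25; // data is now { 10, 25, 30 }
```

No copy of the element is ever made and no unsafe code or pinning is needed; the returned ref is a managed reference that the garbage collector fully understands.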

We have considered this feature, and actually implemented enough of it to show to other internal teams to get their feedback. However at this time based on our research we believe that the feature does not have broad enough appeal or compelling usage cases to make it into a real supported mainstream language feature. We have other higher priorities and a limited amount of time and effort available, so we’re not going to do this feature any time soon.

Also, doing it properly would require some changes to the CLR. Right now the CLR treats ref-returning methods as legal but unverifiable because we do not have a detector that detects and outlaws this situation:

static ref int M1(ref int x)
{
  return ref x;
}
static ref int M2()
{
  int y = 123;
  return ref M1(ref y); // Trouble!
}
static int M3()
{
  ref int z = ref M2();
  return z;
}

M3 returns the contents of M2’s local variable, but the lifetime of that variable has ended! It is possible to write a detector that determines uses of ref-returns that clearly do not violate stack safety. We could write such a detector, and if the detector could not prove that lifetime safety rules were met then we would not allow the usage of ref returns in that part of the program. It is not a huge amount of dev work to do so, but it is a lot of burden on the testing teams to make sure that we’ve really got all the cases. It’s just another thing that increases the cost of the feature to the point where right now the benefits do not outweigh the costs.

If we implemented this feature some day, would you use it? For what? Do you have a really good usage case that could not easily be done some other way? If so, please leave a comment. The more information we have from real customers about why they want features like this, the more likely it will make it into the product someday. It’s a cute little feature and I’d like to be able to get it to customers somehow if there is sufficient interest. However, we also know that “ref parameters” is one of the most misunderstood and confusing features, particularly for novice programmers, so we don’t necessarily want to add more confusing features to the language unless they really pay their own way.

Atomicity, volatility and immutability are different, part three

So what does the keyword “volatile” mean, anyway? Misinformation abounds on this subject.

First off, so as to not bury the lead: in C# the rules have been carefully designed so that every volatile field read and write is also atomic. (Of course the converse does not follow; it is perfectly legal for an operation to be atomic without it being volatile, whatever that means.)

The way this is achieved is simple; the rules of C# only permit you to annotate fields with volatile if the field has a type that is guaranteed to have atomic reads and writes.
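Concretely, the rule plays out like this in field declarations (a sketch; the Sensor class and its field names are invented for illustration):

```csharp
class Sensor
{
  volatile int currentTemp;     // legal: int reads and writes are atomic
  volatile bool stopRequested;  // legal: bool is atomic too

  // volatile long totalReadings;  // illegal: compile error CS0677, because
  // a 64-bit read or write need not be atomic on a 32-bit machine
}
```

The compiler simply rejects volatile on any field whose type is not guaranteed atomic, such as long, ulong, double, decimal, or an arbitrary struct.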

There is no logical requirement for this property; logically, volatility and atomicity are orthogonal. One could have a volatile-but-not-atomic read, for example. The very idea of doing so gives me the heebie-jeebies! Getting an up-to-date value that has been possibly splinched through the middle seems like a horrid prospect. I am very glad that C# has ensured that any volatile read or write is also an atomic read or write.

Volatility and immutability are essentially opposites; as we’ll see the whole point of volatility is to impose some kind of safety upon certain dangerous kinds of mutability.

But what does volatile mean anyway? To understand this we must first go back in time to the early days of the C language. Suppose you are writing a device driver in C for a temperature-recording device at a weather station:

int* currentBuf = bufferStart;
while(currentBuf < bufferEnd)
{
  int temperature = deviceRegister[0];
  *currentBuf = temperature;
  currentBuf++;
}

It is entirely possible that the optimizer reasons as follows: we know that bufferStart, bufferEnd and deviceRegister are initialized at the beginning of the program and never changed afterwards; they can be treated as constants. We know that the memory addresses mapped to deviceRegister do not overlap the buffer; there is no aliasing going on here. We see that there are never any writes whatsoever in this program to deviceRegister[0]. Therefore the optimizer can pretend that you wrote:

int* currentBuf = bufferStart;
int temperature = deviceRegister[0];
while(currentBuf < bufferEnd)
{
  *currentBuf = temperature;
  currentBuf++;
}

which obviously is completely different in our scenario. The optimizer makes the seemingly-reasonable assumption that if it can prove that a variable is never written to again then it need only read from it once. But if the variable in question is marked as volatile — meaning it changes on its own, outside of the control of the program — then the compiler cannot safely make this optimization.

That’s what volatile is for in C. Any additional meanings of volatile are not required by the C standard, and are compiler-specific extensions.[1. And there are a few other standardized valid usages as well; it also prevents optimizations that would screw up non-local gotos and some other relatively obscure scenarios.]

So now let’s make an analogy.

Imagine that there is a three-ring binder with a thousand pages in it, called The Big Book of Memory. Each page has a thousand numbers on it, written in pencil so that they can be changed. You also have a “register page” which only has a dozen numbers on it, each with special meaning. When you need to do some operation on a number, first you flip to the right page, then you look up the right number on that page, then you copy it into the register page. You do your calculations only on the register page. When you have finished a calculation you might write a number back somewhere into the book, or you might do another read from the book to get another value to operate on.

Suppose you are doing some operation in a loop — as with our temperature example above. You might decide as an optimization that you are pretty sure that a particular number location is never going to change. So instead of reading it out of the book every time you need it, you copy it onto your register page, and never read it again. That’s the optimization we proposed above. If one of those numbers is changing constantly based on factors outside your control then making this optimization is not valid. You need to go back to the book every time.

You’ll note that nothing in our C-style volatile story so far said anything at all about multithreading. C-style volatile is about telling the compiler to turn off optimizations because the compiler cannot make reasonable assumptions about whether a given variable is changing or not. It is not about making things threadsafe. Let’s see why![1. Of course we already know one reason: volatile operations are not guaranteed to be atomic, and thread safety requires atomicity. But there is a deeper reason why C-style volatility does not create thread safety.]

Suppose you have one thread that is writing a variable and another thread that is reading the same variable. You might think that this is exactly the same as our “C-style volatile” scenario. Imagine, for example, that our deviceRegister[0] expression above was not reading from some hardware register changing based on outside factors, but rather was simply reading from a memory address that was potentially changing based on the operation of another thread that is feeding it temperature data from some other source. Does “C-style volatile” solve our threading problem?

Kinda, sorta… well, no. The assumption we’ve just made is that a memory address being changed by the temperature outside is logically the same as a memory address being changed by another thread. That assumption is not warranted in general. Let’s continue with our analogy to see why that is by first considering a model in which it is warranted.

Suppose instead of a number being updated by magic because it is somehow reflecting the temperature, suppose instead we have two people taking turns using the book. One is the Reader and one is the Writer. They only have one book and one register page between them, so they have to cooperate.

The Reader again might be reading the same book number slot over and over again. The Reader might decide that they can read the value just once, copy it to the register page, and then keep on using the copy. The Reader does this for a while. When it is time to give the Writer a turn, the Reader writes all of the current register page values to a special page in the book that is reserved for the Reader’s use. The Reader then hands the book and the register page to the Writer.

The Writer fills in the register page from their personal special location in the book, and keeps on doing what they were doing, which is writing new values into the book. When the Writer wants to take a break, again, they write the contents of the register page into the book and hand the register page back to the Reader.

The problem should be obvious. If the Writer is writing to the location that the Reader previously cached to their register page then the Reader is making decisions based on an out-of-date value.

If this analogy is a good description of the memory model of the processor then marking the shared location as “C-style volatile” does the right thing. The Reader knows that they should not be caching the value; to get the up-to-date value they have to go back to the book every time because they don’t know whether the Writer changed the value when the Writer last had control.[1. This is particularly true in non-cooperative multithreading; perhaps the Reader and the Writer do not choose for themselves the schedule for taking turns!]

Unfortunately, that is not the actual memory model of many modern multi-processor machines. The actual memory model goes more like this:

Suppose there are two people sharing one book — again, the Reader and the Writer. Each has their own register page, plus a blank book page. The Reader goes to read a value from the book. But the book isn’t there! Instead, there is the Librarian. The Reader asks the Librarian for the book and the Librarian says “you can’t have the book itself; it is far too valuable to let you use it. But give me your blank page, and I’ll copy the stuff from the book onto it”. The Reader figures that is better than nothing. In fact, it’s really good, now that we consider it! The Librarian hands the Reader back a copy of the entire page, not just the single number the Reader wanted. Now the Reader can go to town and really efficiently do calculations involving any number in that page without talking to the Librarian again. Only when the Reader wants something outside of the bounds of the copied page do they have to go back to the Librarian. The Reader’s performance just went way up.

Similarly, the Writer goes to write a value in the book at the same time. (Remember, the Reader and Writer no longer have to take turns using the register page because they both have their own register page.) But the Librarian does not allow this. The Librarian says “here, let me make you a copy of the page you want to write to. You make your changes to the copy, and when you are done, let me know and I will update the entire page.” The Writer thinks this is great! The Writer can write all kinds of crazy things and never talk to the Librarian again until the Writer needs to write to (or read from) a different page. When that happens the Writer hands the modified copy page to the Librarian, the Librarian copies the Writer’s page back into the book, and gives the Writer a copy of the new page that the Writer wants.

Clearly this is awesome if the Reader and the Writer are not both reading and writing the same page of the book. But what if they are? C-style volatile does not help at all in this situation! Suppose the Reader decides, oh, this memory location is marked as volatile, so I will not cache the read of the value onto my register page. Does that help? Not a bit! Even if the Reader always goes back to the page, they are going back to their copy of the page, the copy made for them by the Librarian. Suppose the Reader then says, “OK, this thing is volatile, so when I read it, heck, I’ll just go back to the Librarian again and have the Librarian make me a new copy of this page”. Does that help? No, because the Writer might not have submitted the changes to the Librarian yet! The Writer has been making changes to their local copy.

In order to solve this problem the Reader could have a way to tell the Librarian “Hey, Librarian! I need to read the most up-to-date version of this location from the Book”. The Librarian then has to go find the Writer and ask the Writer to stop what they are doing and submit the changes right now. Both the Reader and the Writer come to a screeching halt and the Librarian then does the laborious work of ensuring that the Book is consistent.[1. And of course we haven't even considered situations where there are multiple readers and multiple writers all partying on the same page.] Alternatively, the Writer could tell the Librarian “hey, I’m about to update this value; go find anyone who is about to read it and tell them that they need to fetch a new copy of this page when I’m done”. The exact strategy chosen doesn’t really matter for the purposes of this analogy; the point is that everyone has to somehow cooperate to make sure that a consistent view of all the edits is achieved.

This strategy gives massive performance increases in the common scenario where multiple readers and multiple writers are each working on data that is highly localized — that is, each reader and each writer does almost all of their work on the one page they have copied locally, so that they don’t have to go back to the Librarian. It gives massive performance penalties in scenarios where readers and writers are working on the same page and cannot tolerate inconsistencies or out-of-date values; the readers and writers are constantly going back to the Librarian, stopping everybody from doing work, and spending all their time copying stuff back into and out of the Book of Memory to ensure that the local caches are consistent.

Clearly we have a problem here. If C-style volatile doesn’t solve this problem, what does solve this problem? C#-style volatile, that’s what.

Sorta. Kinda. In a pretty bogus way, actually.

In C#, volatile means not only “make sure that the compiler and the jitter do not perform any code reordering or register caching optimizations on this variable”. It also means “tell the processors to do whatever it is they need to do to ensure that I am reading the latest value, even if that means halting other processors and making them synchronize main memory with their caches”.

Actually, that last bit is a lie. The true semantics of volatile reads and writes are considerably more complex than I’ve outlined here; in fact they do not actually guarantee that every processor stops what it is doing and updates caches to/from main memory. Rather, they provide weaker guarantees about how memory accesses before and after reads and writes may be observed to be ordered with respect to each other. Certain operations such as creating a new thread, entering a lock, or using one of the Interlocked family of methods introduce stronger guarantees about observation of ordering. If you want more details, read sections 3.10 and 10.5.3 of the C# 4.0 specification.

Frankly, I discourage you from ever making a volatile field. Volatile fields are a sign that you are doing something downright crazy: you’re attempting to read and write the same value on two different threads without putting a lock in place. Locks guarantee that memory read or modified inside the lock is observed to be consistent, locks guarantee that only one thread accesses a given hunk of memory at a time, and so on. The number of situations in which a lock is too slow is very small, and the probability that you are going to get the code wrong because you don’t understand the exact memory model is very large. I don’t attempt to write any low-lock code except for the most trivial usages of Interlocked operations. I leave the usage of volatile to real experts.
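To be concrete about the alternative, here is a minimal sketch of the conventional lock-based pattern; the class and member names are mine, not from any particular API. Note that it happily protects a double, a type that volatile may not even be applied to:

```csharp
class TemperatureStore
{
  private readonly object sync = new object();
  private double current; // a double read or write is not atomic, but the lock protects it

  public void Write(double value)
  {
    lock (sync) { current = value; } // only one thread at a time touches current
  }

  public double Read()
  {
    lock (sync) { return current; } // observes the latest committed write
  }
}
```

Entering and leaving the lock provides exactly the memory-ordering guarantees that people mistakenly hope volatile will give them, and does so in a way that is far easier to reason about.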

For more information on this incredibly complex topic, see:

Why C-style volatile is almost useless for multi-threaded programming

Joe Duffy on why attempting to ‘fix’ volatile in C# is a waste of time

Vance Morrison on incoherent caches and other aspects of modern memory models