Atomicity, volatility and immutability are different, part two

Last time we established that an “atomic” read or write of a variable means that in multithreaded scenarios, you never end up with “halfway mutated” values in the variable. The variable goes from unmutated to mutated directly, with no intervening state. I also talked a bit about how making fields of a struct readonly has no effect on atomicity; when the struct is copied around, it may be copied around four bytes at a time regardless of whether its fields are marked as readonly or not.

Continue reading

Atomicity, volatility and immutability are different, part one

I get a fair number of questions about atomicity, volatility, thread safety, immutability and the like; the questions illustrate a lot of confusion on these topics. Let’s take a step back and examine each of these ideas to see what the differences are between them.

First off, what do we mean by “atomic”? From the Greek ἄτομος, meaning “not divisible into smaller parts”, an “atomic” operation is one which is always observed to be done or not done, but never halfway done. The C# specification clearly defines what operations are atomic in section 5.5. The atomic operations are: reads and writes of variables of any reference type, or, effectively, any built-in value type that takes up four bytes or less, like int, short and so on. Reads and writes of variables of value types that take more than four bytes, like double, long and decimal, are not guaranteed to be atomic by the C# language. (There is no guarantee that they are not atomic! They might in practice be atomic on some hardware. Or they might not.)

Continue reading

Read-only and threadsafe are different

Here’s a common problem that we face in the compiler realm all the time: you want to make an efficient immutable lookup table for mapping names to “symbols”. This is in a sense the primary problem that the compiler has to solve; someone says x = y + z; and we have to figure out what x, y and z mean before we can do any more analysis. An obvious way to do that is to figure out all the name-to-symbol mappings for a particular declaration space once, ahead of time, stuff the results into a lookup table, and then use that lookup table during the analysis.

The lookup table can be immutable because once it is built, it’s built; no more symbols are going to be added to it. It’s going to be used solely as a lookup mechanism. A common and cheap way of doing that is to use what I whimsically call “popsicle” immutability: the data structure is a fully mutable structure until you freeze it, at which point it becomes an error to attempt to change it. This technique differs markedly from the sort of “persistently re-usable data structure” immutability we’ve talked about in the past, where you want to reuse existing bits of the data structure as you build new, different data structures out of it.

For example, we might write up a really simple little hash table that supports “freezing”. Something like this:

abstract class Symbol
{
public string Name { get; protected set; }
}
sealed class SymbolTable
{
private bool frozen = false;
private class BucketListNode
{
   public Symbol Symbol { get; set; }
   public BucketListNode Next { get; set; }
   public BucketListNode Prev { get; set; }
}
private BucketListNode[] buckets = new BucketListNode[17];
private int GetBucket(string s)
{
   return Math.Abs(s.GetHashCode()) % buckets.Length;
}
public void Add(Symbol symbol)
{
   // Omitted: error checking for null symbol, null name,
// duplicated entry, and so on.
   if (this.frozen)
     throw new InvalidOperationException();
   int bucket = GetBucket(symbol.Name);
   var node = new BucketListNode();
   node.Symbol = symbol;
   node.Prev = null;
   node.Next = buckets[bucket];
   if (node.Next != null)
   node.Next.Prev = node;
   buckets[bucket] = node;
 }
public void Freeze()
 {
   this.frozen = true;
 }
 public Symbol Lookup(string name)
 {
   // Omitted: error checking
   int bucket = GetBucket(name);
   for (var node = buckets[bucket]; node != null; node = node.Next)
   {
     if (node.Symbol.Name == name)
       return node.Symbol;
   }
   return null;
 }
}
sealed class SymbolTable
{
private bool frozen = false;
private class BucketListNode
{
   public Symbol Symbol { get; set; }
   public BucketListNode Next { get; set; }
   public BucketListNode Prev { get; set; }
}
private BucketListNode[] buckets = new BucketListNode[17];
private int GetBucket(string s)
{
   return Math.Abs(s.GetHashCode()) % buckets.Length;
}
public void Add(Symbol symbol)
{
   // Omitted: error checking
   if (this.frozen)
     throw new InvalidOperationException();
   int bucket = GetBucket(symbol.Name);
   var node = new BucketListNode();
   node.Symbol = symbol;
   node.Prev = null;
   node.Next = buckets[bucket];
   if (node.Next != null)
     node.Next.Prev = node;
   buckets[bucket] = node;
}
public void Freeze()
{
  this.frozen = true;
}
public Symbol Lookup(string name)
{
   // Omitted: error checking
   int bucket = GetBucket(name);
   for (var node = buckets[bucket]; node != null; node = node.Next)
   {
     if (node.Symbol.Name == name)
       return node.Symbol;
   }
   return null;
}
}

Very simple. We have an array of seventeen doubly-linked lists. We balance the table by choosing which of the seventeen linked lists to go with by hashing the name of the symbol. That way our lookup cost is one-seventeeth the cost of doing a straight lookup in a complete list.

Now of course there are other ways to do this efficiently. We could be sorting the list of symbols and doing a binary search. We could be re-hashing into a larger array of bucket lists if it turns out that there are thousands of symbols in this table. For the sake of argument, let’s suppose that we’ve decided to go with a fixed-number-of-buckets hashing approach, as we’ve done here.

One of the much-touted benefits of immutable data structures is that they are “threadsafe”. Since they cannot be written to, you’ll never run into a situation where a write operation is interrupted halfway by a thread switch, causing another thread to read an inconsistent data structure. However, it is a fallacy to believe that just because a data structure does not admit any way for you change its contents, its implementation must be threadsafe!

Suppose for example we do a real-world performance analysis of our little table here and discover that in practice, this sort of pattern comes up a lot in our customers’ code:

    x.Foo();
    x.Bar();
    x.Blah();
    y.Abc();
    y.Def();
    y.Ghi();

That is, the same identifier is looked up multiple times in a row in the context of a particular symbol table. We might decide to add a little optimization to our table’s lookup method:

for (var node = buckets[bucket]; node != null; node = node.Next)
{
  if (node.Symbol.Name == name)
  {
    MoveToFront(bucket, node);
    return node.Symbol;
  }
}
...
private void MoveToFront(int bucket, BucketListNode node)
{
  if (buckets[bucket] == node)
    return;
  node.Prev.Next = node.Next;
  if (node.Next != null)
    node.Next.Prev = node.Prev;
  node.Prev = null;
  node.Next = buckets[bucket];
  node.Next.Prev = node;
  buckets[bucket] = node;
}

We have essentially tweaked our hash table to have an array of Most Recently Used Lists, rather than merely an array of Doubly Linked Lists. That might be a performance win in the case where the lists get long and the same identifiers tend to be looked up in clusters. (It is enough of a real-world win that in the VBScript engine, the lookup tables use a variant of this optimization.)

But you see what we’ve done here? Reading the table now sometimes mutates its interior structure, and that mutation is not threadsafe! Even though there is no way for the user to logically change a frozen table, we do not ensure that reading the data is threadsafe.

In practice, most data structures do not do mutations on reads, but you cannot rely upon that unless it is documented. For example, the documentation for the Dictionary class notes that a Dictionary is threadsafe for multiple readers so long as there are no writers (though of course the actual enumerators are not threadsafe objects, as they are constantly mutating; rather, the collection being enumerated is threadsafe). Unless the author of a particular object makes a guarantee to you like that, you should assume that none of the operations are threadsafe, even if they appear to be read-only.


Original:

https://web.archive.org/web/20150523082529/http://blogs.msdn.com/b/ericlippert/archive/2011/05/23/read-only-and-threadsafe-are-different.aspx

Optional argument corner cases, part four

Last time we discussed how some people think that an optional argument generates a bunch of overloads that call each other. People also sometimes incorrectly think that

void M(string format, bool b = false) 
{ 
  Console.WriteLine(format, b); 
}

is actually a syntactic sugar for something morally like:

void M(string format, bool? b) 
{ 
  bool realB = b ?? false; 
  Console.WriteLine(format, realB); 
}

Continue reading

Optional argument corner cases, part three

A lot of people seem to think that this:

void M(string x, bool y = false) 
{ 
  ... whatever ... 
}

is actually a syntactic sugar for the way you used to have to write this in C#, which is:

void M(string x) 
{ 
  M(x, false); 
} 
void M(string x, bool y) 
{ 
  ... whatever ... 
}

But it is not. Continue reading

Optional argument corner cases, part two

Last time we saw that the declared optional arguments of an interface method need not be optional arguments of an implementing class method. That seems potentially confusing; why not require that an implementing method on a class exactly repeat the optional arguments of the declaration?

Because the cure is worse than the disease, that’s why.

Continue reading

Optional argument corner cases, part one

In C# 4.0 we added “optional arguments”; that is, you can state in the declaration of a method’s parameter that if certain arguments are omitted, then constants can be substituted for them:

void M(int x = 123, int y = 456) { }

can be called as M(), M(0) and M(0, 1). The first two cases are treated as though you’d said M(123, 456) and M(0, 456) respectively.

This was a controversial feature for the design team, which had resisted adding this feature for almost ten years despite numerous requests for it and the example of similar features in languages like C++ and Visual Basic. Though obviously convenient, the convenience comes at a pretty high price of bizarre corner cases that the compiler team definitely needs to deal with, and customers occasionally run into by accident. I thought I might talk a bit about a few of those bizarre corner cases.

Continue reading