The no-lock deadlock

People sometimes ask me if there is a cheap-and-easy way to guarantee thread safety. For example, “if my method only reads and writes local variables and parameters, can I guarantee that my method is threadsafe?” Questions like that are dangerous because they are predicated on an incorrect assumption: that if every method of a program is “threadsafe”, whatever that means, then the entire program is “threadsafe”. I might not be entirely clear on what “threadsafe” means, but I do know one thing about it: thread safety is a property of entire programs, not of individual methods.

To illustrate why these sorts of questions are non-starters, today I present to you the world’s simplest deadlocking C# program:

class C
{
  static C()
  {
    // Let's run the initialization on another thread!
    var thread = new System.Threading.Thread(Initialize);
    thread.Start();
    thread.Join();
  }
  static void Initialize() { }
  static void Main() { }
}

(Thanks to my Roslyn compiler team colleague Neal Gafter for this example, which was adapted from his book Java Puzzlers.)

At first glance, clearly every method of this incredibly simple program is “threadsafe”. There is only a single variable anywhere in the program; it is local, is written once, is written before it is read, is read from the same thread it was written on, and is guaranteed to be atomic. There are apparently no locks anywhere in the program, and so there are no lock ordering inversions. Two of the three methods are empty. And yet this program deadlocks with 100% certainty; the program “globally” is clearly not threadsafe, despite all those nice “local” properties. You can build a hollow house out of solid bricks; so too you can build a deadlocking program out of threadsafe methods.

The reason why this deadlocks is a consequence of the rules for static constructors in C#; the important rule is that a static constructor runs exactly zero or one times, and runs before a static method call or instance creation in its type. Therefore the static constructor of C must run to completion before Main starts. The CLR notes that C’s static constructor is “in flight” on the main thread and calls it. The static constructor then starts up a new thread. When that thread starts, the CLR sees that a static method is about to be called on a type whose static constructor is “in flight” on another thread. It immediately blocks the new thread so that the Initialize method will not start until the main thread finishes running the class constructor. The main thread blocks itself waiting for the new thread to complete, and now we have two threads each waiting for the other to complete.
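
To watch the blocking happen without hanging the process, here is a minimal sketch that swaps the unbounded wait for a bounded one. The names Probe and WorkerFinishedInTime and the 500 ms timeout are hypothetical, chosen only for illustration; the static constructor lives in a separate class so that Main can report the outcome.

```csharp
using System;
using System.Threading;

static class Probe
{
    // Hypothetical flag: did the worker finish before the timeout expired?
    public static readonly bool WorkerFinishedInTime;

    static Probe()
    {
        var worker = new Thread(Initialize);
        worker.Start();
        // Bounded wait (an assumption of this sketch) instead of an
        // unbounded Join(), so the static constructor eventually returns.
        WorkerFinishedInTime = worker.Join(TimeSpan.FromMilliseconds(500));
    }

    // Empty on purpose: any delay comes from the runtime, not from this body.
    static void Initialize() { }
}

static class Program
{
    static void Main()
    {
        // The first access to Probe runs its static constructor on this thread.
        Console.WriteLine(Probe.WorkerFinishedInTime);
    }
}
```

If the runtime behaves as described above, the worker cannot enter Initialize while the static constructor is still in flight, so the bounded wait should time out and the flag should be false. Only once the constructor returns is the worker released to run.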

Next time on FAIC: We’re opening up the new Coverity office in Seattle! After which, we’ll take a closer look at the uses and abuses of the static constructor.


39 thoughts on “The no-lock deadlock”

  1. I think your example is not the greatest. There is clearly a lock somewhere in the program: in the runtime. So not all methods are threadsafe, just the ones in the visible part of the program. Though I guess that’s the point you’re making 😉

    • The example is excellent. It means that though there is no lock anywhere in the code you can get your code out of sync (Initialize will finish before Join is called).

  2. Mmmm… I don’t like the example very much. While the main point of the article is obviously true, the example does not explain that really well, IMHO. (As much as it is _interesting_ and surprising!) The deadlock here is a result of outside-of-the-program behavior (sure, defined behavior of CLR is obviously “a part of the program”, but… you know what I mean). It’s like saying something like
    Main() { Thread.Sleep(Timeout.Infinite); }
    “magically” deadlocks, even though the method is “obviously thread-safe” (_no_ variables at all). While (in some sense) true, it (IMHO) does not point to the essence of the problem. Or something.

    • A deadlock involves two tasks waiting for each other, which isn’t the case in your program. It hangs, but it’s not deadlocked per se.

      • The program is deadlocked, since there is a newly created thread which is waiting for the static constructor to finish before it can proceed, while the thread with the static constructor is blocked waiting for the new thread to finish. That having been said, I would regard the “thread.Join()” as being a rather overt blocking statement; a more interesting example might have been to use two static classes, since such a program could deadlock without any visible blocking statements.

        Eric Lippert: Out of curiosity, since the locking behavior for static class initialization is not needed once the static constructor has finished, do you know if the vtable gets patched at that point to bypass any blocking primitives?

        • I am not an expert on the inner workings of the CLR, but I don’t think you’d have to patch the vtable. If a vtable exists and can be used then an instance must exist, and if an instance exists then the static initializer has already been executed.

        • With relaxed type construction (with beforefieldinit), the JIT will ensure the static constructor is called before generating any native code using the type, so there are never any checks pertaining to the static constructor in the generated code.

          With strict construction, the JIT checks to see if the static constructor has already been called, and will emit code to do the check immediately before the type is accessed. If the static constructor has already been called when a method is being JITted, it just emits code without the check. Unfortunately, because .NET doesn’t re-JIT methods, that means that the first method JITted that accesses a type with a static constructor will always have the overhead of performing the check. The overhead isn’t that bad though, since I believe the CLR uses a check-lock-recheck pattern, so once the constructor has been called and once the caches are coherent, there’s never a possibility of blocking again.

  3. The MSDN documentation probably isn’t helping much with such confusion. The doc pages for a large number of classes state “Any public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe” in “Thread Safety” section. Corollary to what you said, thread safety is not a property of types.

    For the life of me, I can’t discern what is being communicated there. I actually took that from the documentation page for System.Random, and static instances of those are definitely not safe to share across threads.

    • There is no such thing as “static instance”. You can only have a static field pointing to an instance.

      `System.Random` has no static members (unless you count constructors or those inherited from `System.Object`), so there is nothing thread-safe in that class.

    • Many people find that boilerplate text confusing. They *think* it’s a comment about members that *they* define that are “of that type”, whereas I believe that it’s a statement about the members *belonging* to the type. I.e. It’s saying that (if it has any) any static members belonging to System.Random may be safely accessed from multiple threads. It’s saying nothing about any field or property that you create inside your own classes.

      It all boils down to “of this type” being interpretable in two different manners.

      • W’s comment made me realize exactly what you said. It’s not particularly clear what “public static members of this type” means. That is further exacerbated by applying that text to types which define no static members outside of those inherited from base types. I’m also not sure why the text appears in the documentation for the type instead of the documentation for the member. It further seems that the statement can only apply to the member access itself, but say nothing of the returned result of such an access. I could see that being a further source of confusion.

  4. Metacomment, it looks like CSS received a minor hosing. The comments are bleeding over into the sidebar. A brief inspection in chrome shows that the commentlist class has a width of 120%. Reducing it to 100 seems to correct it.

    • I know; people were complaining that the comments were too narrow when set to 100%. I haven’t figured out all the CSS tricks yet to make them look good with this theme. I am a WordPress newb.

  5. It is a bit disingenuous to call this a “no-lock” deadlock. The deadlock is due to the run-time’s type initializer lock. Which is a lock, albeit an implicit one. By definition, one cannot have a deadlock without a blocking operation involving multiple threads, after all.

    • Sure, it *is* a lock, but why does it *need* to be a lock? That’s an implementation detail. One could design a runtime in which a thread that was blocked waiting for a static constructor was then scheduled to service other work items until the static constructor completed asynchronously.

      Moreover, the notion that you need to have two threads “by definition” in order to have a deadlock seems unwarranted as well. Suppose you were given two tasks by your manager, and each task depended on the successful completion of the other, and your remaining work depended upon the completion of both. You’re telling me that you’re not deadlocked, just because there isn’t a second employee whose work you’re depending on?

      Many people think of threads as units of work, but they are not. Threads are *workers*. Most of the problems you see in multithreaded systems have analogous problems in single-threaded systems. People are just not yet in the habit of mentally separating workers from work.

      • I think the reason “deadlock” is often taken to imply the existence of two workers is that it’s usually restricted to cases where someone is waiting for something that *they reasonably expect will happen*, but which can’t happen while they’re waiting for it. One could have a deadlock with a single worker, but only if the worker in question either “believed” in the existence of other independent workers or entities that could cause the awaited condition to occur, or else kept bouncing from task to task without noticing that nothing was actually getting accomplished.

        • Odd…I am on the same version of Chrome, and I’ve got purple….

          Perhaps you have a stale copy of the CSS in your cache?

          • Curious. I took a screenshot of the page and pasted into Paint.NET. If I crop out just the text (text on white, no grays), I can clearly see that the text is purple. But within the context of the whole page, I cannot.

            Maybe I have some kind of color blindness?

          • …It’s not just context.

            If I take a screenshot, I can clearly see purple when that screenshot is displayed in Paint.NET, or I save it as an image and view it in Windows image viewer.

            But displayed in Chrome, I cannot see purple, only black.

            This is not possible. The pixels have the same values!

            Just weird.

  6. Your description of the static constructor rules is not entirely complete. There is an additional rule that says that static constructors by themselves never deadlock, so it is in fact possible to run a static method before the static constructor has run to completion.
    This is in contrast with Java, where they can deadlock.

  7. Shame I didn’t take many notes while developing my parallel game engine. Ran with a “lock-less” approach to maximize concurrency. Had so many deadlocks occur at various points of development that never really made any sense even after digging into the IL and reflected code from .Net.

    • You should go state-less to be lock-less. You don’t need to lock constants, and immutable state is basically data-race free. There’s an entire academic discipline dedicated to creating programs that way, and there are even languages for it, like F#. Also, if everything is constant and can only be changed by returning new values, you can get more concurrency.

  8. Hmm, I (personally) prefer to use “thread-safe code” for “whatever threading shenanigans happen, this code produces a consistent result”. So this program *is* threadsafe under this definition: it always deadlocks. Obviously, it’s not what it was *supposed* to do, but that’s another topic.

  9. Like a lot of the other comments, I don’t like your example. But I’m going to go a step further, and say your whole idea is wrong.

    First of all, a static constructor is a static constructor, not a method. It has special rules that don’t apply to methods.

    Second, the thread safety of a method depends on the thread safety of the methods it calls. Creating, starting and joining a thread are all obvious places where threading errors could happen. “Join” in particular puts the current thread to sleep until some other thread finishes, and should be treated with almost as much caution as lock.

    I think the rule that you attempt to disprove is actually true. If your method reads and writes only local variables, and only calls other methods that follow these rules, then it is thread safe.

    But thread.join() reads global state, the state of another thread.

    But you could also deadlock or livelock yourself in a single threaded application. Imagine a program that reads a byte from a file, and if that byte isn’t what it’s expecting, rereads that byte until it is.

    But the contents of that file count as a global variable, and so the method I described would not fit into the “automatically thread safe” category.

  10. Pingback: The Daily Six Pack: February 6, 2013 | Dirk Strauss

  11. Here’s another way to cause the same problem that’s less obvious, and quite possible to do accidentally:

    class MainClass {
      public static void Main () {}
      static MainClass () {
        System.Threading.Tasks.Task.Factory.StartNew (() => 0).ContinueWith (t => t.Result).Wait ();
      }
    }

  12. Pingback: Insights on Passing by Reference in C# – Insights on a complex world
