Can I skip the lock when reading an integer?

Today, a question from a Coverity customer:

Here is a greatly simplified version of our code:

public class TestLock
{
  private object threadLock = new object();
  private int value = 0;
  public void Start()
  {
    lock (threadLock) { value = 100; }
  }
  public void Finish()
  {
    lock (threadLock)
    {
      if (value != 0 )
        value = 0;
    }
  }
  public void Increment()
  {
    lock (threadLock)
    {
      if (value != 0) 
        value += 10;
    }
  }
  public int GetValue()
  {
    return value; // No lock!
  }
}

The start, finish and increment operations are all either mutating or non-atomic operations, or both, so they are locked. But the get-value operation is guaranteed by the C# specification to be atomic, so we figure that we can safely remove the lock. Is this correct?

The short answer is: Please do not elide the lock regardless of its correctness. It’s dangerous and the benefit is absurdly tiny compared to the risk.

The slightly longer answer is: The fact that you’re asking the question in the first place means that you don’t have a strong enough understanding of the memory model of C# in order to correctly write low-lock code, so it should be avoided. In fact, I’m not a big fan of multithreaded code in the first place. If you can avoid writing multithreaded code, do so. If you cannot, then use the highest level of abstraction available to you: use tools like the Task Parallel Library to efficiently manage worker threads on your behalf. If you cannot do that, then at least avoid sharing memory across threads. If you cannot do that, then at least lock all the memory that you do share across threads. Writing low-lock code should be the last thing you consider doing, not the first.

The attentive reader will note that neither of those two “answers” actually answered the question. It’s time for the verbose answer!

First off, let’s talk about atomicity.

You are correct that the C# specification promises that reads and writes of 32 bit integer fields are atomic. That is, they cannot be “torn”. A write to a 64 bit integer variable can be treated as two independent writes to two adjacent 32 bit variables, and therefore two threads that are both trying to assign values to those two adjacent variables at the same time can race. It is possible to end up with a value that is half from the first thread and half from the second thread, which is very confusing. The 32, 16, and 8 bit integer types and the bool and char types are all guaranteed to be atomic for individual reads and writes, provided that the developer has not done something silly like force the integer to be misaligned.
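To make the guarantee concrete, here is a sketch of my own (the class and names are hypothetical, not from the customer's code): a reader that omits the lock on an aligned 32 bit int may see a stale value, but every value it sees is a value some thread actually wrote whole. Substituting a long for the int would void that guarantee on 32 bit hardware.

```csharp
using System;
using System.Threading;

class AtomicIntDemo
{
    // An aligned 32-bit field: individual reads and writes are atomic per the C# spec.
    static int shared;

    // Returns true if the reader ever observed a value no thread wrote (a "torn" read).
    public static bool HasTornRead()
    {
        bool torn = false;
        var writer = new Thread(() =>
        {
            for (int i = 0; i < 500_000; i++)
                shared = (i & 1) == 0 ? 0 : -1;   // alternate two full-width bit patterns
        });
        var reader = new Thread(() =>
        {
            for (int i = 0; i < 500_000; i++)
            {
                int v = shared;                    // no lock: possibly stale, never torn
                if (v != 0 && v != -1) torn = true;
            }
        });
        writer.Start(); reader.Start();
        writer.Join(); reader.Join();
        return torn;
    }

    static void Main() =>
        Console.WriteLine(HasTornRead() ? "torn read!" : "no torn reads"); // prints "no torn reads"
}
```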

You are also correct that more complex operations, such as test-and-set or increment are not atomic, and therefore need to be protected against races.
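To see why the increment needs protection: value += 10 is really a read, an add, and a write, and two unlocked threads can interleave those three steps and lose an update. A sketch with hypothetical names, showing the locked version keeping the count exact:

```csharp
using System;
using System.Threading;

class IncrementDemo
{
    static readonly object gate = new object();
    static int counter;

    // Runs `threads` workers, each incrementing `perThread` times under the lock.
    public static int CountWithLock(int threads, int perThread)
    {
        counter = 0;
        var workers = new Thread[threads];
        for (int t = 0; t < threads; t++)
        {
            workers[t] = new Thread(() =>
            {
                for (int i = 0; i < perThread; i++)
                    lock (gate) { counter++; }   // read-modify-write made atomic by the lock
            });
            workers[t].Start();
        }
        foreach (var w in workers) w.Join();
        return counter;   // always threads * perThread; without the lock, often less
    }

    static void Main() =>
        Console.WriteLine(CountWithLock(4, 100_000)); // prints 400000
}
```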

So the unlocked read will at least be atomic in the sense that we never get a “torn” value out. Now, if we had something like:

    lock (threadLock)
    {
        value += 5;
        DoSomething();
        value += 5;
    }

then the unlocked access is immediately suspicious because it can read a value after the first increment but before the second increment, and that might be unexpected. Let’s assume that you are not in that situation.

Now let’s talk about locks. Locks provide two guarantees in C#.

First, there is the obvious purpose of a lock. The locked object is a monitor; only one thread may have a given monitor locked at a time. A thread which attempts to take a monitor that has been acquired by another thread will block until it can obtain access.

There is a second guarantee of a lock that is more subtle. The C# compiler, JIT compiler and CPU are all permitted to make optimizations provided that the optimization is undetectable by the current thread. In particular, reads and writes of variables may be re-ordered or eliminated entirely provided that doing so is not observable to the current thread. For example, consider the program fragment:

someInteger = someOtherInteger;
someIntegerJustChanged = true;

Given that there is no code between the two statements there is no way for the thread that runs this code to detect what order the mutations happen in. It would be perfectly legal for these two mutations to be re-ordered if the CPU decided that it was more efficient to do so. The current thread cannot tell the difference, but another thread could observe the mutations to happen in the “wrong” order. (I note that on x86-based hardware this particular reordering of writes is actually never observed; those CPUs do not perform this optimization. There is no requirement that all C# programs be run only on x86-based hardware!)
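When the order of those two writes matters to another thread, the standard library's way of pinning it down short of a full lock is Volatile.Write paired with Volatile.Read, which have release and acquire semantics respectively. A sketch of that publication pattern with names of my own invention:

```csharp
using System;
using System.Threading;

class PublishDemo
{
    static int someInteger;
    static bool someIntegerJustChanged;   // the flag that "publishes" the write

    public static void Publish(int newValue)
    {
        someInteger = newValue;
        // Release semantics: the write to someInteger cannot be observed to move
        // after this write by a thread that reads the flag with Volatile.Read.
        Volatile.Write(ref someIntegerJustChanged, true);
    }

    public static bool TryConsume(out int value)
    {
        // Acquire semantics: if we see the flag set, we also see the integer.
        if (Volatile.Read(ref someIntegerJustChanged))
        {
            value = someInteger;
            return true;
        }
        value = 0;
        return false;
    }

    static void Main()
    {
        Publish(42);
        Console.WriteLine(TryConsume(out int v) ? v : -1);
    }
}
```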

A lock introduces a restriction on these optimizations. In particular: no read may be moved before the beginning of a lock statement, and no write may be moved after the end of a lock statement. Let’s compare these two pieces of code:

  public int GetValue()
  {
    return value; // No lock!
  }

vs

  public int GetValue()
  {
    lock(threadLock) { return value; }
  }

What’s the difference?

In the latter version the CPU may not move the read backwards past the beginning of the lock, and may not move any write that happened before the method was entered to a point after the end of the lock.

In the former, the CPU may as an optimization choose to move the read arbitrarily far “backwards in time” with respect to other reads and writes on this thread provided that doing so is not detectable by the current thread. But that fact could be observed by another thread, and since we know that this program definitely reads memory locations on multiple threads, I am worried. Perhaps some other thread is expecting that the read of this value will be observed to occur in a particular order with respect to the write of some other value. You said that this is a greatly simplified version of the real program, and therefore we cannot reason from the correctness of this program to the correctness of the original program! There are a number of scenarios:

Scenario one: The authors of the code do not know that their program is correct without the lock, and do not have a solid customer-impacting performance reason to elide the lock, but are doing so anyways just for the heck of it. Fortunately for them, by sheer luck their code does not actually have a customer-impacting bug, and equally fortunately, by sheer luck no future edit to the code will introduce such a bug. Also, no future update to the C# compiler, JIT compiler or CPU optimization will ever introduce a bug either.

Scenario two: The original program has a subtle bug, or a bug will be introduced in the future via a code change, compiler update, and so on. Some other thread assumes that the unlocked read will not be moved backwards in time with respect to some other read or write, but in fact this optimization is not only permitted, but will be performed. This fact becomes observable on a thread that then behaves incorrectly as a result. The code therefore has a bug that is difficult to detect and practically impossible to reproduce.

Scenario three: The authors of the original code, who we recall are asking the question about thread safety to begin with, are actually asking a rhetorical question. They fully understand all the ramifications of every possible CPU reordering and have analyzed all those possible reorderings to determine that there is never a bug in their program under any of those conditions. Moreover, they wish to avoid the roughly 10 or 20 nanosecond penalty of the uncontested lock for performance reasons, because that one-hundred-millionth of a second penalty is both observable by and unacceptable to their users. And finally, the authors will carefully review every single change ever made to this program in the future with an eye to the possible bugs introduced by reordering optimizations. Avoiding that 10 ns penalty is worth the enormous cost in expert code reviews and the high risk of getting it wrong.

My guess is that the developers are 99% likely to be in scenario one, 0.99% likely to be in scenario two, and 0.01% likely to be in scenario three. I also hazard a guess that the developers have no solid customer-impacting performance reason to elide that lock. Is the expected benefit of a 10 ns savings that no customer will notice really worth the risk that some impossible-to-detect-and-reproduce bug has crept into this program? I doubt it. Therefore my recommendation would be to always lock every access to a field that can be shared across threads, even if you think that eliding the lock is 99.99% likely to be harmless. The nano-scale benefit is simply not worth the macro-scale costs of verifying that the elision is correct, and certainly not worth the risk that the code is incorrect now or might become incorrect in the future. Code which avoids locks is extremely difficult to reason about, so make your life easier: if you can, don’t write shared memory code to begin with. If you have to, use locks consistently.


UPDATE: This article was originally posted on the Coverity Dev Testing Blog and has since been deleted; a number of commenters on the original article asked if marking the field volatile magically prevents reordering bugs.

The specific problem with that proposal is that volatile reads and locks do not have the same semantics. A volatile read can be moved backwards in time with respect to a volatile write, and the x86 processor will actually do so, but a read, volatile or otherwise, cannot be moved backwards in time past the beginning of a lock.
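Concretely, here is the classic shape where volatile is insufficient; this is a sketch with names of my own invention, not code from the article. Each thread performs a volatile write followed by a volatile read of the other thread's flag; because the read may be moved before the write, both threads can observe false in the same run, an outcome that wrapping both bodies in a common lock would forbid.

```csharp
using System;

class StoreLoadDemo
{
    static volatile bool flagA, flagB;

    // Intended to run on two threads: thread 1 calls A(), thread 2 calls B().
    // With volatile alone, both calls can return true in the same run, because
    // each volatile read may be moved before the preceding volatile write.
    // Taking a common lock around each method body rules that outcome out.
    public static bool A()
    {
        flagA = true;     // volatile write
        return !flagB;    // volatile read; may effectively execute first
    }

    public static bool B()
    {
        flagB = true;
        return !flagA;
    }

    static void Main()
    {
        // Single-threaded sanity run; the interesting interleaving needs two threads.
        Console.WriteLine(A() + " " + B());
    }
}
```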

The more general problem is: we have a toy example that is greatly simplified from the real code, and therefore we don’t know what invariants the real code relies upon. Trying to deduce whether a real gun is safe by examining a toy gun is a dangerous proposition.

In an upcoming article I will give an example where every variable is volatile, but eliding locks on some of them can produce an unexpected violation of what appears to be a program invariant. 

 

25 thoughts on “Can I skip the lock when reading an integer?”

  1. I think the point you make that you should always take the safe route and use the lock is a good one. However, I think you go a bit overboard on the rhetoric implying that there is no possible performance reason for not using locks.

    Use of locks serializes access to a resource, so if you have some resource being used by multiple threads and there is a lot of lock contention, then you’ll effectively get multi-threaded code that provides no performance benefit over a single-threaded version. I think this is a much more likely reason that developers try to avoid using locks, not because they think the 10ns cost of taking an uncontested lock is significant.

    I think addressing that scenario in particular (contested locks) would be an interesting blog post.

    • I hear that response a lot, and my response to that is: if you have a problem with contention on a lock that reads a single field then you are highly likely to have some really bad architectural flaw in your application. Find that flaw and fix it!

      If the problem is “I’m spending too much time at stoplights” then the solution is not to start running red lights and hoping for the best; it’s to build a freeway that doesn’t have stoplights in the first place.

      • A reasonable reaction to “I’m spending too much time waiting at red stop signs” may be to look, before crossing the last road prior to an intersection, to see whether there seems to be significant cross traffic, and to try an alternate route if the cross traffic looks excessive. Observations of cross traffic while one is a block away will have both false positives and false negatives, but if one stops before actually proceeding into an intersection even if one didn’t notice cross traffic on the approach, false negatives won’t pose any danger. False positives and false negatives will degrade efficiency, but if both true positives and true negatives are sufficiently common, they may improve efficiency more than the false observations degrade it.

      • Using a monitor to protect a single field is truly the equivalent of using a cannon to kill a fly. Other, lighter solutions exist that are better suited, but most of this won’t actually matter, as the lock hold time is very short (in fact, in the uncontended case no condition variable is used; just an interlocked operation).

        So if one wants something lighter, memory barriers or a simple interlocked operation can be used, but this is much more complicated and error-prone; and if such performance is needed, why is this being done on a stop-the-world GC platform? :).

  2. Actually there’s the not-particularly-rare example of a statistical counter – but there the advantage is removing the lock on the writer side, not the reader side (however, in the example you posted the writers are already serialised, so it won’t change anything).

    Also, how do you have 20ns locks when an atomic op takes 40?

    • I was not intending to give an exact cost, which depends on all kinds of factors and implementation details of the chip, operating system and runtime.

      My point is that the cost is low enough that in a world where our apps routinely download billion-byte files over transcontinental cables, a penalty on the order of magnitude of nanoseconds is not what stands between your application and success in the marketplace.

    • For statistical counters you can always use a volatile int, with Interlocked.Increment and/or Interlocked.Decrement to modify it. Reads of such a volatile int are okay “as-is”, as long as you don’t make any executive decisions based on it.
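A sketch of the counter this commenter describes (my code, with hypothetical names, using a plain int): writers use Interlocked.Increment, so no lock is needed on the write side, and an unlocked read gives a statistically useful but possibly slightly stale snapshot.

```csharp
using System;
using System.Threading;

class StatCounter
{
    static int hits;   // modified only via Interlocked; reads are atomic but may be stale

    public static int CountHits(int threads, int perThread)
    {
        hits = 0;
        var workers = new Thread[threads];
        for (int t = 0; t < threads; t++)
        {
            workers[t] = new Thread(() =>
            {
                for (int i = 0; i < perThread; i++)
                    Interlocked.Increment(ref hits);   // lock-free atomic read-modify-write
            });
            workers[t].Start();
        }
        foreach (var w in workers) w.Join();
        return hits;   // after Join, all increments are visible: threads * perThread
    }

    static void Main() => Console.WriteLine(CountHits(4, 100_000)); // prints 400000
}
```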

    • That’s what pipelining is for. It goes down to the circuit level. If your logic gates are arranged just right, you can begin feeding the next op in before the previous op finishes.

      I have an adder in my personal Minecraft server into which I can pump pairs of numbers at 2-tick intervals, but it takes 6 ticks to complete any given one of them. The inputs are buffered and the outputs cached. The pipeline tying it into everything else is built with this in mind for every component it interfaces with.

    • I think “usually unexpected” isn’t an oxymoron because then “usually” is describing the frequency at which it is unexpected, as in it is some times expected and some times unexpected. There is no contradiction (to me at least) between the description of frequency and its unexpectedness.

      In the context itself, “usually” is describing the frequency when it is *the case* of being unexpectedly complicated, so in that case (heh) the two are not even directly connected.

      But I think that’s enough grammar nazism for today.

  3. I have another question regarding guarantees, if a method contains a lock, may I assume no value read before the method call will be cached and assume no access will be reordered across that method call?

    In a practical example, does ReaderWriterLockSlim have the same guarantees as the lock statement?

    If “yes”, are those rules valid for every method call, or does the compiler/runtime “mark” methods that contain locks?

    • The only guarantee you get about the lock is the one I said: reads cannot move backwards in time from any point after the lock is entered to before the lock is entered. Writes cannot move forwards in time from any point before the lock is exited to any point after the lock is exited. That’s the guarantee you get.

      ReaderWriterLockSlim does have the same effect: a read is not moved backwards in time to before the lock being entered, and a write does not move forward in time to after the lock is exited. This is not guaranteed by the C# language of course, but it would be crazy for the implementers of the lock to not have the same semantics.

      These semantics are provided by the implementation of the method.

      • Thanks, I made a small test program:

            class Program
            {
                static void Main(string[] args)
                {
                    int x = 0;
                    int? y = 0;
                    int z;
                    ThreadPool.QueueUserWorkItem(o => { Thread.Sleep(1000); x = 1; });
        
                    while (x == 0)
                        z = y.GetValueOrDefault();
                }
            }
        

        Compiling with “Release” and executing without debugging, it never ends: infinite loop. But if I replace GetValueOrDefault() with Value, then it exits after the 1 second wait.

        About the current implementation, is it correct to assume that it will:
        1) Inline the GetValueOrDefault() method call;
        2) Not inline the Value property;
        3) Not cache values over non-inlined method calls;
        4) None of the above is actually guaranteed by the specification.
        ?

        ps: I am not a native English speaker and not very communicative even in my native language.

        • You can’t make those assumptions. The relevant part of Eric’s answer is regarding reads/writes being moved around as long as they appear to behave the same from the perspective of a single thread.

          The reason you are potentially seeing an infinite loop is that your loop MIGHT be optimized into something akin to a “while(true)” because, as far as a single thread can tell, that value is never directly changed and so using the equivalent to “while(0 == 0)” would appear to behave the same regardless of whether it was cached or retrieved fresh every time.

          Even if that variable is changed by another thread, the original thread is free to continue referring to a cached value instead of going back and getting the ‘real’ value.

        • First of all: the behaviour of looping forever is permitted because as I said before, the optimizer can choose to re-order or eliminate reads and writes if doing so cannot make a difference in a single-threaded program, which is clearly the case here.

          To address your specific questions: it is in theory never valid to assume that the optimizer will or will not do any particular thing; that’s an implementation detail of the runtime subject to change at any time.

          In practice, the answers to your questions are:

          1) z = y.GetValueOrDefault() is equivalent to z = y.hasValue ? y.value : 0 which is simple enough to inline.

          2) z = y.Value is equivalent to if (!y.hasValue) throw ...; z = y.value. Because there is a throw in there, the jitter is far less likely to try to inline it.

          3) This is a good guess; I don’t know the answer one way or the other.

          4) None of this is guaranteed in any way. This is all implementation detail, as I said.

          PS: Your English is fine!

          • Thanks Chris Hannon and Eric Lippert for the answers; don’t worry, I am not going to make nuclear power plant control software based on those assumptions, I just wanted to learn more about how the current implementation of the CLR actually works.

  4. There are some contexts in which reading a field without a lock may be somewhat analogous to the IsAlive property of a WeakReference, which can’t really say that a reference is alive, but it might be able to say if it’s dead. If a field reports that data “probably” isn’t ready, and if code can do something more useful than wait for the data to become ready, checking the field without the lock may be helpful, especially if the lock might be held for a while by whatever is trying to get the data.

  5. Some additional info regarding the C# memory model and optimization can be found here: http://igoro.com/archive/volatile-keyword-in-c-memory-model-explained/ (does this guy still work for Microsoft? If yes, why no more recent blog posts? But that’s BTW).
    Also, is ‘usually unexpected’ an oxymoron? Well, I’m not much of a linguist, but I’d say that if it can be substituted by ‘normally not expected’ or some such, then it should be OK. 🙂

    • And in the article, it’s not ‘usually unexpected’, but rather, ‘usually complicated’. That’s definitely fine, even if the complications are unexpected! 🙂

  6. Pingback: Dew Drop – March 13, 2014 (#1742) | Morning Dew

  8. Pingback: Reordering optimizations | Fabulous adventures in coding
