In the previous exciting episode I ended on a cliffhanger: why did I put a loop around each wait? In the consumer, for example, I said:
while (myQueue.IsEmpty) Monitor.Wait(myLock);
It seems like I could replace that “while” with an “if”. Let’s consider some scenarios. I’ll consider just the scenario for the loop in the consumer, but of course similar scenarios apply mutatis mutandis for the producer.
Scenario one: Everything is awesome, everything is cool when you’re part of a team. Suppose the consumer is moved from its wait state to the ready state because the producer has put something on the queue. Now the queue is definitely no longer empty, and we are ready to enter the monitor again. Suppose we fail to do so right away due to a race with the producer. The producer might enter the monitor again and put more stuff on the queue, but eventually the queue will fill up, the producer will put itself into the wait state, and the consumer is then the only thread left attempting to get into the monitor. Success is guaranteed, and there seems to be no need to check to see if the queue is empty; if we managed to re-enter the monitor it was because something was put on the queue. The loop is unnecessary.
Scenario two: Some other thread got ahold of myLock and for reasons of its own decided to pulse the monitor. That thread is not the producer, so it did not ensure that the queue was non-empty. The consumer must be defensive and say “re-entering the monitor is not a guarantee that my desired condition was met, therefore I must check again.” If it is by design that a third thread can pulse the monitor then there needs to be a loop; if it is not by design then the existence of such a third thread is a bug in the program. If we can assume that there is no such third thread then we don’t need a loop.
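Scenario two can be reproduced on demand. What follows is a minimal sketch, not code from this series: the class StrayPulseDemo, the consumerWaiting flag and the PulseWaitingConsumer helper are hypothetical names, and a plain Queue<int> stands in for the queue. The calling thread plays both the misbehaving third thread and the real producer.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical demo: a "third thread" pulses the lock without enqueuing anything.
public static class StrayPulseDemo
{
    // Returns how many times the consumer woke up and found the queue still empty.
    public static int Run()
    {
        object myLock = new object();
        var myQueue = new Queue<int>();
        bool consumerWaiting = false; // guarded by myLock
        int emptyWakeups = 0;         // guarded by myLock

        var consumer = new Thread(() =>
        {
            lock (myLock)
            {
                // With "if" instead of "while", the stray pulse would let us fall
                // through to Dequeue on an empty queue, which throws
                // InvalidOperationException.
                while (myQueue.Count == 0)
                {
                    consumerWaiting = true;
                    Monitor.Wait(myLock); // releases myLock while waiting
                    if (myQueue.Count == 0) emptyWakeups++;
                }
                myQueue.Dequeue();
            }
        });
        consumer.Start();

        // Pulses only once the consumer is known to be inside Monitor.Wait.
        // Clearing the flag here ensures the next call waits for the consumer
        // to go back to sleep before pulsing again.
        void PulseWaitingConsumer(bool enqueueFirst)
        {
            while (true)
            {
                lock (myLock)
                {
                    if (consumerWaiting)
                    {
                        consumerWaiting = false;
                        if (enqueueFirst) myQueue.Enqueue(42);
                        Monitor.Pulse(myLock);
                        return;
                    }
                }
                Thread.Sleep(1);
            }
        }

        PulseWaitingConsumer(enqueueFirst: false); // the misbehaving third thread
        PulseWaitingConsumer(enqueueFirst: true);  // the real producer
        consumer.Join();
        return emptyWakeups;
    }
}
```

Run() should report at least one wakeup where the queue was still empty, and the loop is the only thing that keeps the consumer from dequeuing at that point.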
Scenario three: The producer genuinely did put something on the queue, and at some time after that, the consumer re-entered the monitor. But between those two events, a third thread won the race and correctly removed the item from the queue for reasons of its own. Again, if that’s a by-design scenario then the consumer has to be willing to check the condition again. If it’s not a by-design scenario then there’s no need for a loop.
So let’s suppose there are only two threads, guaranteed, producer and consumer, that access this lock object and party on this queue. Our second and third scenarios do not apply, so the loop is unnecessary, right? Unfortunately there is a fourth scenario:
Scenario four: Everything is terrible! One time in a hundred billion runs a waiting thread wakes up and goes to the ready state even if it was never pulsed. Suppose we have no loop, and this rare event happens. A possible ordering of events is:
- The consumer enters the monitor, checks the queue, finds it empty, and puts itself to bed.
- While the producer is running around looking for work, not touching the queue, the consumer thread spuriously wakes up, re-enters the monitor, and without looping, continues running, assuming the queue is non-empty. The queue code produces an unhandled exception and the consumer thread dies a horrible death.
In a world where spurious wakeups are a possibility, you have to always check your conditions in a loop. See, the loop mitigates the terrible scenario; if a thread wakes up spuriously then it checks its condition again, and goes back to sleep if it is not met.
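For concreteness, here is a sketch of what "always check your conditions in a loop" looks like for both the producer and the consumer. It is an illustration under assumptions, not the exact code from the previous episode: the class name BoundedQueue, the capacity field and the use of Queue<T>.Count (rather than an IsEmpty property) are all my own stand-ins.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical bounded queue; every wait re-checks its condition in a loop.
public sealed class BoundedQueue<T>
{
    private readonly object myLock = new object();
    private readonly Queue<T> myQueue = new Queue<T>();
    private readonly int capacity;

    public BoundedQueue(int capacity)
    {
        this.capacity = capacity;
    }

    public void Enqueue(T item)
    {
        lock (myLock)
        {
            // Producer: sleep until there is room, re-checking after every wakeup.
            while (myQueue.Count >= capacity)
                Monitor.Wait(myLock);
            myQueue.Enqueue(item);
            // Producers and consumers wait on the same lock, so wake everyone;
            // the loops make any unneeded wakeup harmless.
            Monitor.PulseAll(myLock);
        }
    }

    public T Dequeue()
    {
        lock (myLock)
        {
            // Consumer: sleep until there is an item, re-checking after every wakeup.
            while (myQueue.Count == 0)
                Monitor.Wait(myLock);
            T item = myQueue.Dequeue();
            Monitor.PulseAll(myLock);
            return item;
        }
    }
}
```

Whether a wakeup was caused by a real pulse, a stray pulse, or something spurious, the thread only proceeds once its own condition actually holds while it owns the lock.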
Are spurious wakeups a possibility in C#? This is a surprisingly hard question. Let me list some facts.
Fact one: Spurious wakeups are known to be a rare but observable possibility when using condition variables (a locking mechanism very similar to what we’ve been discussing in this series) on operating systems that use POSIX threads. In particular, on Linux there is a race condition when a process is signaled. The choices faced by the designers of Linux were, I gather, (1) allow the race to cause spurious wakeups, (2) allow the race to cause some wakeups to be lost, which is clearly unacceptable because the consumer would never come back and eventually the queue would fill up, or (3) create an implementation with unacceptably high performance costs.
Fact two: Spurious wakeups are similarly documented as being a problem with Windows condition variables. “Condition variables are subject to spurious wakeups […] you should recheck a predicate (typically in a while loop) after a sleep operation returns.”
Fact three: The Java documentation states:
“A thread can also wake up without being notified, interrupted, or timing out, a so-called spurious wakeup. While this will rarely occur in practice, applications must guard against it […] waits should always occur in loops.”
Apparently the designers of Java explicitly endorse the theory that spurious wakeups are a real thing.
Fact four: Joe Duffy notes in “Concurrent Programming on Windows” that the claim that Windows suffers from spurious wakeups is somewhat histrionic:
“[…] threads must be resilient to something called spurious wake-ups […] This is not because the implementation will actually do such things […] but rather due to the fact that there is no guarantee around when a thread that has been awakened will become scheduled. Condition variables are not fair. It’s possible – and even likely – that another thread will acquire the associated lock and make the condition false again before the awakened thread has a chance to reacquire the lock and return to the critical region.”
Basically, Joe is saying here that in many situations our “scenario three” is likely.
Fact five: The documentation for Monitor.Wait() says nothing about spurious wakeups or always waiting in a loop.
Fact six: Apparently the CLR does not actually use condition variables as its mechanism for implementing monitors, and therefore reasoning from the shortcomings of condition variables to the shortcomings of C# locks is poor reasoning. We really ought to examine the mechanisms the CLR actually uses if we want to know if they are subject to this problem. No, I’m not going to; see Stephen Cleary’s comment below for some links.
Fact seven: Many expert C# programmers like Jon Skeet (UPDATE: see comments!) and Joseph Albahari recommend always waiting in a loop. And some static analyzers look for missing loops around waits and flag them as a bad code smell; using a loop is a cheap and safe way to make such analyzers stop complaining.
Spurious wakeups in C# seem to be somewhat mythical beasts; people are afraid of them without ever having encountered one in the wild.
So what would I do here?
Well, the first thing I would do is of course not write programs that share memory across threads! It’s a terrible thing to do! Look at me; I’m a pretty smart guy and I cannot tell you whether to write “if” or “while” without writing a seven-item list of pros and cons that thoroughly contradicts itself, makes false analogies and rests upon appeals to authority and the absence of warnings in documentation! This would be a pretty weak foundation upon which to base a coding decision that has real consequences.
If I had to write a program that shared memory across threads then I would use the highest-level tool in my toolbox. In this case I would use a thread-safe collection written by experts. (Of course that simply begs the question; the expert must know how to do so safely using lower-level mechanisms! I presume they know better than I do.) If for some reason that was unavailable then I would use a higher-level construct for signaling, like an auto reset event, or a reader-writer lock, or whatever.
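As one example of reaching for the higher-level tool, the sketch below uses BlockingCollection<T> from System.Collections.Concurrent as the bounded producer-consumer queue. The class name BlockingCollectionSketch, the method ProduceAndConsume and the capacity of four are hypothetical choices for illustration; this is one possible option, not necessarily the collection I had in mind when writing the paragraph above.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical sketch: the library does the waiting and signaling for us.
public static class BlockingCollectionSketch
{
    // Produces the integers 1..itemCount on one task and sums them on the caller.
    public static int ProduceAndConsume(int itemCount)
    {
        using (var queue = new BlockingCollection<int>(boundedCapacity: 4))
        {
            Task producer = Task.Run(() =>
            {
                for (int i = 1; i <= itemCount; i++)
                    queue.Add(i);       // blocks while the collection is full
                queue.CompleteAdding(); // tells the consumer nothing more is coming
            });

            int sum = 0;
            // Blocks while the collection is empty; ends once adding is complete
            // and every item has been taken.
            foreach (int item in queue.GetConsumingEnumerable())
                sum += item;

            producer.Wait();
            return sum;
        }
    }
}
```

There is no lock, no Wait and no Pulse in sight, so the if-versus-while question never comes up in my code; it becomes the library author's problem.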
Were I forced to write code like this that uses monitors at a low level then I would grit my teeth, embrace cargo-cultism, put a banana in my ear, and write the loop even without being able to give a solid justification for why doing so keeps the alligators away.