ATBG: How does a lock work?

Today on the Coverity Development Testing Blog’s continuing series Ask The Bug Guys, I answer a question I’ve gotten a fair number of times recently: how is it possible to implement locking without already having locking?

The answer I give is chock full of lies, misinformation and oversimplification, but hopefully it will get across the basic concept that you can build all the threading mechanisms out of simple parts like atomic-test-and-set.
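
To give the flavour of that without spoiling the episode: here is a minimal sketch, mine rather than the article’s actual code, of a lock built from nothing but an atomic test-and-set, which in C# is spelled Interlocked.Exchange (the class and member names are invented for illustration):

```csharp
using System.Threading;

class TestAndSetSpinLock
{
    private int _taken; // 0 = free, 1 = held

    public void Enter()
    {
        // Atomically set _taken to 1 and test what it was before.
        // If it was already 1, another thread holds the lock, so spin.
        while (Interlocked.Exchange(ref _taken, 1) != 0)
            Thread.Yield(); // let other threads run while we wait
    }

    public void Exit()
    {
        // A volatile write releases the lock and publishes that fact.
        Volatile.Write(ref _taken, 0);
    }
}
```

The chicken-and-egg problem dissolves because the one indivisible step is supplied by the hardware, not by the software.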

In related news, here are a couple of other recent posts to ATBG. First, my colleague Tim explains that Java’s hashCode has many of the same design guidelines as the equivalent method in C# that I discussed here. And my colleague Jon discusses the perils of accidentally performing overlapping reads and writes to the same buffer in C. [1. My favourite quotation on the subject of poor handling for overlapping buffers has to be “there has yet to be an example of an application whose impressive performance can be credited to the absence of overlap-detection code in memcpy.” True that! I note also that the C# equivalent Array.Copy method is documented as dealing correctly with overlapping reads and writes to the same array.]
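
As a small aside, that documented Array.Copy behaviour is easy to see for yourself; this snippet (my own, not from the linked post) shifts a range one slot to the right within a single array:

```csharp
int[] a = { 1, 2, 3, 4, 5 };
// Source and destination ranges overlap; Array.Copy behaves as if the
// source range were buffered before any destination write happens.
Array.Copy(a, 0, a, 1, 4);
// a is now { 1, 1, 2, 3, 4 }; a naive forward element-by-element copy
// would have produced { 1, 1, 1, 1, 1 }.
```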

As always, if you have questions about a bug you’ve found in a C, C++, C# or Java program that you think would make a good episode of ATBG, please send your question along with a small reproducer of the problem to TheBugGuys@Coverity.com. We cannot promise to answer every question or solve every problem, but we’ll take a selection of the best questions that we can answer and address them on the dev testing blog every couple of weeks.

23 thoughts on “ATBG: How does a lock work?”

  1. I like the article linked in your footnote, though I’m not sure I quite agree with the quote itself. On many platforms, memcpy has historically been implemented as a compiler intrinsic, while memmove was a function call. On platforms which do a good job of optimizing memcpy operations, it may perform more than twice as fast as memmove – a difference which can be significant when it’s used within an inner loop.

    Still, the general point is a good one. A certain level of complexity will be required in any system; failing to include a small amount of complexity in the place where it could most easily be accommodated will often force other parts of the system to add much more complexity to deal with it. Part of the art of good system design is recognizing when a small amount of complexity in one case can eliminate a large amount elsewhere.

  2. A monitor in its simplest form (where there are no other waiters) simply executes a compare-and-set; only when there are multiple threads waiting on the monitor does it move to contention resolution, creating a monitor wait event that each waiter blocks on, decrementing the waiter count, signalling the next waiter, and so on (see the sketch after this comment).

    So there is a considerable chance that the simple monitor implementation might just be better in some cases, or the performance might even out, especially in short, fast code that is unlikely to see contention. But without wait-free guarantees or proper back-off, the lock-free monitor will not scale well compared to the standard monitor.

    Since the article is good at its core (if a bit misleading), wouldn’t it be better to expand it further and work out all of the details? This would give others a much better understanding of how to build a proper CAS lock or a proper Monitor implementation (in case something is missing in the current one).

    Thanks 🙂
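
    To make the two-path description above concrete, here is a rough sketch, heavily simplified and nothing like the real CLR implementation, of a lock with a CAS fast path that falls back to a kernel wait event under contention (TinyMonitor and its members are invented names):

    ```csharp
    using System.Threading;

    class TinyMonitor
    {
        private int _state; // 0 = free, 1 = held, 2 = held and contended
        private readonly AutoResetEvent _waiters = new AutoResetEvent(false);

        public void Enter()
        {
            // Fast path: an uncontended acquire is a single CAS.
            if (Interlocked.CompareExchange(ref _state, 1, 0) == 0)
                return;

            // Slow path: mark the lock contended, then block until the
            // exchange observes the lock free (old value 0).
            while (Interlocked.Exchange(ref _state, 2) != 0)
                _waiters.WaitOne();
        }

        public void Exit()
        {
            // Release; if the lock was marked contended, wake one waiter.
            if (Interlocked.Exchange(ref _state, 0) == 2)
                _waiters.Set();
        }
    }
    ```

    Real monitors add recursion, spinning before blocking, fairness, and Wait/Pulse support, which is where most of the complexity (and most of the performance tuning) lives.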

  3. Pingback: Dew Drop – February 13, 2014 (#1722) | Morning Dew

  4. I guess presenting Dekker’s algorithm as one possibility for locking that doesn’t have to rely on CAS (or ll/sc) would have been out of scope. (A sketch of it appears below.)

    I’m not sure how I feel about such overview articles though – they don’t help people write better software and can mislead people into a false sense of security (“oh that’s not so complicated after all”). For example, there is no mention of the ordering guarantees that locks provide – so better hope nobody uses a weak CAS to implement their own version.
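
    For concreteness, here is roughly what Dekker’s algorithm looks like transliterated into C# (my sketch, not anyone’s production code; note the explicit full fences, without which the write of one flag could be reordered past the read of the other on modern hardware):

    ```csharp
    using System.Threading;

    class DekkerLock // exactly two threads, ids 0 and 1
    {
        private readonly bool[] _wants = new bool[2];
        private int _turn;

        public void Enter(int id)
        {
            int other = 1 - id;
            Volatile.Write(ref _wants[id], true);
            Thread.MemoryBarrier(); // store-load fence, dearer than one CAS
            while (Volatile.Read(ref _wants[other]))
            {
                if (Volatile.Read(ref _turn) != id)
                {
                    // Not our turn: withdraw our claim until it is.
                    Volatile.Write(ref _wants[id], false);
                    while (Volatile.Read(ref _turn) != id) { /* spin */ }
                    Volatile.Write(ref _wants[id], true);
                    Thread.MemoryBarrier();
                }
            }
        }

        public void Exit(int id)
        {
            Volatile.Write(ref _turn, 1 - id);
            Volatile.Write(ref _wants[id], false);
        }
    }
    ```

    As the replies below observe, the fences this needs under modern memory models tend to cost more than the single interlocked operation they replace.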

    • It is impossible to implement locking via any algorithm without some sort of sequenced atomic primitives. On many historical single-processor machines, and even some multi-processor ones, the reads and writes of word-sized variables were sequenced and atomic (so that if one thread writes x and y, in that order, and another thread reads them in the opposite order, the latter thread can guarantee that if it sees the new x it will also see the new y, and if it saw the old y it will also have seen the old x). Even on a machine with no fancier atomic primitives, sequenced atomic reads and writes are sufficient to implement a mutex. Given that many machines which lacked fancier primitives still had sequenced atomic reads and writes, mutex implementations for such machines were useful.

      Today’s general-purpose processors invariably include atomic primitives which are more powerful than sequenced atomic reads and writes. Further, unlike historical machines where an atomic sequenced read or write would cost the same as any other “real” access to a variable, today’s multi-core machines generally have a heavy execution-time penalty for all forms of sequenced atomic access. If multiple atomic accesses can be replaced with a single compare-and-swap, that’s a major performance win. While Dekker’s algorithm was useful on hardware platforms where the best atomic algorithms were sequenced atomic loads and stores, it is not appropriate for use on modern machines.

      • I’m not sure where you disagree with anything I said? Obviously nobody would use Dekker’s algorithm to implement locks these days, but the only requirement it has is that you have memory orderings or guaranteed sequential consistency to begin with (both of which have nothing to do with atomic instructions themselves).

        And since you only have to read/write a single bit for Dekker it’s rather unimportant whether reads/writes of words or any other unit are atomic or not (you can’t read only half of a bit after all).

        So for me that counts as being free of atomic instructions, even though that’s a much stronger statement than “doesn’t need CAS or ll/sc, etc”.

      • I drifted a bit from what I’d intended to say, which was to agree that Dekker’s algorithm would have been out of scope; it’s an interesting historical curiosity, but doesn’t fit well with modern memory models. Dekker’s algorithm may only look like it uses three bits, but on some machines it will actually use six: processor A’s “flag0”, “flag1”, and “turn”, and processor B’s “flag0”, “flag1”, and “turn”. When processor A executes `flag0=true`, that statement will set processor A’s `flag0`, and it will set processor B’s `flag0`, but on many machines, other actions might occur between the writes to flag0. Whether one wishes to describe the statement as writing two separate variables, or to describe the sequential inconsistency in some other way, my point is that on many machines the only way to make Dekker’s algorithm work would be to impose memory barriers that would cost more than just using CompareExchange or load-linked-store-conditional.

    • I did mention ordering requirements; I called out specifically that they were out of scope for the purposes of that article.

      I also mentioned that implementing locks on architectures that lack an interlocked compare-and-swap is a tricky but solved problem.

      My intention in this article was neither to give a history of concurrency primitives nor to describe how to write a monitor with all the features you’d need for a real-world implementation. Rather, I called out that my intention was to show that it is possible to build a working lock out of parts that are far simpler than a lock. Many programmers who don’t have a background in low-level OS programming are curious as to how it’s possible to solve the apparent chicken-and-egg problem of building a lock without already having a lock.

    • I’m not sure how I feel about such overview articles though – they don’t help people write better software and can mislead people into a false sense of security (“oh that’s not so complicated after all”).

      I disagree, myself. Firstly, for someone who does want to get a good understanding of how real locks work, they have to start somewhere. The first introduction to a complex topic necessarily has to be an overview. One of my favourite computer science lecturers at university was blunt on that point – “Teaching is a process of lying to you.”

      Secondly, overview articles like this can help ordinary programmers who don’t want to specialise in the esoteric area. It can demystify the area and reveal that it’s all just code, or hardware in this case, at the end of the day and not magic. Some guy once wrote a blog post about the unhelpfulness of magic thinking in software.

      Lastly, I think this particular article was extremely clear that what was being presented is a simplification of the topic, so it’s unlikely to provoke an “oh that’s not so complicated after all” reaction from anyone.

  5. Article “Locks and exceptions don’t mix” says (about the pre-C# 4.0 lock statement) that

    “The problem here is that if the compiler generates a no-op instruction between the monitor enter and the try-protected region then it is possible for the runtime to throw a thread abort exception after the monitor enter but before the try. In that scenario, the finally never runs so the lock leaks, probably eventually deadlocking the program. It would be nice if this were impossible in unoptimized and optimized builds.”

    which naturally raises a question about thread aborts: does the runtime “inject” a thread abort exception into the running thread rather than using TerminateThread? (The two expansions being contrasted are sketched below.)
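
    For reference, the two expansions look roughly like this (sketched from the description in that article; obj stands in for the lock object and Body() for the guarded code):

    ```csharp
    // Pre-C# 4.0 expansion of lock (obj) { Body(); }: a thread abort
    // delivered between Enter and try leaks the lock.
    Monitor.Enter(obj);
    try
    {
        Body();
    }
    finally
    {
        Monitor.Exit(obj);
    }

    // C# 4.0+ expansion: Enter runs inside the protected region and
    // records in lockTaken whether the lock was actually acquired.
    bool lockTaken = false;
    try
    {
        Monitor.Enter(obj, ref lockTaken);
        Body();
    }
    finally
    {
        if (lockTaken)
            Monitor.Exit(obj);
    }
    ```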

  6. How complex is the “real” locking story? I thought that ATBG was very interesting but I’m also curious about how the real code works with all those bits you deemed out of scope (e.g. ordering, how it actually waits, etc.). Is it the sort of thing you could do an article on?

