Here at last is the conclusion to Friday Q&A's three-part series on C type specifiers. The first week I discussed const and restrict. Last week I discussed the basics of volatile and why it's not very useful. This week I'm going to finish up by discussing the use of volatile in a multithreaded context.
Synchronization
Shared data is a big part of what makes multithreaded code so difficult to write. In standard multithreaded code, you must synchronize access to any pieces of shared memory, usually by using a lock.
A key piece of advice: if you're using locks on all of your shared data, you do not need to use volatile for anything.
This should give you a big clue as to when volatile is useful.
Lockless Shared Data
What happens when you access shared data without a lock? Unless your data access is carefully constructed, you can easily end up with inconsistent data. For example, imagine a simple shared counter built like so:
int gCounter;
void Increment(void) { gCounter++; }
int GetCurrent(void) { return gCounter; }
If more than one thread calls Increment then this code is not safe! On most systems, gCounter++ will break down into multiple steps:
- load current value into register
- increment register
- store register into current value
And those individual steps can be interleaved, so you can end up losing increments due to the overlap.
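Note that making gCounter be volatile does not help in any way. It forces each load and store to actually happen, but it does not make the load-increment-store sequence atomic.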
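One way to make this counter safe without a lock is an atomic read-modify-write. The following is only a minimal sketch, and it assumes a C11 compiler with stdatomic.h, which postdates this article. On Mac OS X the rough equivalent would be OSAtomicIncrement32 from libkern/OSAtomic.h.

```c
#include <stdatomic.h>

// Assumption: a C11 atomic_int stands in for the plain int used above.
static atomic_int gCounter;

// The load, increment, and store happen as one indivisible step,
// so concurrent callers cannot lose increments.
void Increment(void) { atomic_fetch_add(&gCounter, 1); }

// An atomic load always returns a value that was actually stored.
int GetCurrent(void) { return atomic_load(&gCounter); }
```

Note that there is no volatile anywhere in this version. The atomic operations are what provide the guarantee.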
(If you want to learn more about lockless thread safe data structures, you may want to listen to my podcast with the Mac Developer Network.)
The problems with shared data run deeper than simple interleaving of more basic steps. Let's say that you're allocating a data structure on a thread, and want to flag when it's ready:
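struct SharedDataStructure gSharedStructure;
int gFlag;
// Writer thread: fill in the structure, then flag it as ready.
gSharedStructure.foo = ...;
gSharedStructure.bar = ...;
gSharedStructure.baz = ...;
gFlag = 1;
// Reader thread: use the structure once the flag is set.
if(gFlag)
    UseSharedStructure(&gSharedStructure);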
Since gFlag shares no dependencies with gSharedStructure, the compiler is free to reorder all of those assignments. It could assign gFlag first, then fill out the structure. Your other thread will then see the flag as set before the structure is initialized, leading to chaos.
Easy to fix! We'll just make gFlag be volatile. This will force the compiler to make the store happen right on that line. Not so fast! This doesn't fix the problem at all. Yes, it forces the compiler to make the store happen right on that line, but it doesn't force the compiler to do anything about the struct. The C language requires that stores to volatile variables happen in order with respect to other volatile accesses, but it does not require anything with respect to non-volatile variables. Thus the compiler is free to reorder the gSharedStructure stores as it wishes across the volatile boundary.
Still easy to fix! Just make both of them be volatile. This does indeed fix the problem with respect to the C compiler. The stores will be guaranteed to be generated in the proper order....
But no, this doesn't work, you're still doomed!
CPU Memory Reordering
It turns out that your CPU is playing its own version of the C "as if" game. Your CPU sees a list of instructions but is only required to execute them "as if" they occurred in the proper order. Internally, your CPU will aggressively re-order things to run faster. This could result in some loads or stores occurring in a different order from what the machine code indicates. Normally this is not a problem, because the CPU guarantees that the end result is still "as if" they all happened in the original order.
It becomes a problem when you have multiple CPUs sharing the same memory, as happens when you access shared data without locks on a multi-CPU system (which is most PC-class systems these days). Although everything happens "as if" it were in the original order on one CPU, another CPU will see the true out-of-order memory accesses. If your two threads are running on two different CPUs, then you still have the potential that the reader thread will see gFlag as set and not see gSharedStructure as initialized, even with volatile on both of them.
Easy to fix! I'm actually serious this time. You just insert a call to the OSMemoryBarrier function (from libkern/OSAtomic.h) in between the two sections to enforce ordering at the hardware level:
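// Writer thread
gSharedStructure.foo = ...;
gSharedStructure.bar = ...;
gSharedStructure.baz = ...;
OSMemoryBarrier(); // structure stores complete before the flag store
gFlag = 1;
// Reader thread
if(gFlag) {
    OSMemoryBarrier(); // flag load completes before the structure loads
    UseSharedStructure(&gSharedStructure);
}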
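For readers without libkern/OSAtomic.h, the same writer/reader pattern can be sketched with C11 atomics. This is a stand-in under stated assumptions rather than the API used in this article: atomic_thread_fence plays the role of OSMemoryBarrier, the int fields are invented for illustration, and Publish and TryConsume are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

// Assumption: three int fields, chosen only for illustration.
struct SharedDataStructure { int foo, bar, baz; };

static struct SharedDataStructure gSharedStructure;
static atomic_int gFlag;

// Writer: fill in the structure, fence, then set the flag.
void Publish(int foo, int bar, int baz) {
    gSharedStructure.foo = foo;
    gSharedStructure.bar = bar;
    gSharedStructure.baz = baz;
    // Release fence: the stores above are ordered before the flag store.
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&gFlag, 1, memory_order_relaxed);
}

// Reader: copies the structure into *out only if the flag is set.
bool TryConsume(struct SharedDataStructure *out) {
    if (atomic_load_explicit(&gFlag, memory_order_relaxed)) {
        // Acquire fence: the flag load is ordered before the loads below.
        atomic_thread_fence(memory_order_acquire);
        *out = gSharedStructure;
        return true;
    }
    return false;
}
```

The fences sit in the same two places as the OSMemoryBarrier calls: after the structure stores in the writer, and after the flag check in the reader.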
But there's a twist. At this point, you don't need the volatile qualifiers anymore! At least, you probably don't....
As If!
Once again, the "as if" rule is key to understanding all of this. The compiler has to generate code that works "as if" everything happened as written. The trick is that the compiler can't see all of the code that's running in your process. Much of it is in system libraries. The moment that control jumps to code that the compiler can't see, it must make sure that all of its virtual ducks are in a row, and get all of those values to actually be in memory. For all the compiler knows, that external code may access your data, so it has to ensure that it's stored.
This means that volatile is definitely not needed on gSharedStructure. The call to OSMemoryBarrier in the writer forces the compiler to commit all of those stores to memory before making the call, to ensure that OSMemoryBarrier can see the correct values. (It won't look at them, but the compiler cannot know this.) Likewise, the call to OSMemoryBarrier in the reader forces the compiler to re-fetch everything from memory, because for all it knows OSMemoryBarrier could have modified the values.
What about gFlag? This is a bit more complex. Here's an example where it would definitely need to be volatile:
while(1) {
    if(gFlag) {
        OSMemoryBarrier();
        UseSharedStructure(&gSharedStructure);
    }
}
Here nothing forces the compiler to re-read gFlag as long as it's false, because you never hit any external code. If gFlag is not volatile, the compiler is free to read gFlag once, then use the cached value each time through the loop, so it would never see any change to it. Thus it must be made volatile for this code to be correct.
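As a minimal sketch of the declaration this calls for, the polling side might look like the following. WaitForFlag and SetFlag are hypothetical helpers, and the spin limit is an assumption added only so that the sketch terminates. It shows only the compiler-side effect of volatile and leaves out the memory barrier discussed above.

```c
// volatile forces a real load of gFlag every time it is evaluated,
// so the compiler cannot hoist the read out of the loop.
static volatile int gFlag;

// Hypothetical helper: polls gFlag up to maxSpins times.
// Returns 1 if the flag was seen set, 0 if the limit was reached.
int WaitForFlag(int maxSpins) {
    for (int i = 0; i < maxSpins; i++) {
        if (gFlag)
            return 1;
    }
    return 0;
}

// Hypothetical helper for the thread that sets the flag.
void SetFlag(void) { gFlag = 1; }
```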
Now here's an example where it has no need to be volatile:
- (void)method {
    if(gFlag) {
        OSMemoryBarrier();
        UseSharedStructure(&gSharedStructure);
    }
}
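Because the caller is code that the compiler can't see, nothing can be cached across calls to this method, and the compiler must load gFlag every time.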
(Note: this is largely true even if this is a function instead of an Objective-C method, but it can be false in the face of inlining and whole-program optimization such as is performed by llvm-gcc and clang. Be careful!)
Here is another example where volatile is useful:
int gCount;
...
while(!done) {
    work();
    gCount++;
}
...
while(gCount < total)
    ;
This code only works if gCount is marked as volatile.
At this point it's important to note that even marking it as volatile won't be enough if gCount cannot be read or written atomically by your CPU. Whether this is true depends on your individual CPU. General guidelines are that the data must be aligned to a multiple of its size (true of a global, but not always true if you're doing skanky stuff) and that for integers it must be equal to or smaller than the CPU's native size. In other words, if you declared volatile int64_t gCount on a 32-bit CPU this code would not necessarily work. Your program could see half-written values, with the top and bottom mismatched, which would not be a good thing.
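The two guidelines above can be written down as a quick sanity check. This is a rough sketch and not a guarantee of atomicity: the function names are hypothetical, and using sizeof(void *) as the CPU's native size is an assumption that does not hold on every platform.

```c
#include <stddef.h>
#include <stdint.h>

// Guideline 1: the address is a multiple of the object's size.
int IsNaturallyAligned(const void *p, size_t size) {
    return ((uintptr_t)p % size) == 0;
}

// Guideline 2: the object is no larger than the native word.
// Assumption: sizeof(void *) approximates the CPU's native size.
int FitsNativeWord(size_t size) {
    return size <= sizeof(void *);
}

// Both guidelines together. A true result means "probably atomic
// on common CPUs", nothing stronger.
int LikelyAtomic(const void *p, size_t size) {
    return IsNaturallyAligned(p, size) && FitsNativeWord(size);
}
```

With these definitions, an int64_t on a CPU with 4-byte pointers fails the second check, which matches the int64_t example above.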
Finally, you need to be careful with volatile because it's a common place to find compiler bugs. The fact that volatile is used so rarely and is in conflict with things like the compiler's optimizer makes it a good place for bugs to flourish. A survey of many popular compilers found that many of them miscompiled volatile code a large percentage of the time.
Conclusion
To break down what we've learned:
- volatile is necessary when reading or writing a shared value in a loop whose body does not touch "foreign" code.
- volatile is not sufficient when doing this on multiple pieces of dependent data when ordering is important. In order for this to work, OSMemoryBarrier must be used. Since this is foreign code, this may remove the requirement to use volatile, depending on the exact structure of your code.
- volatile does not help in a multithreading context with variables that cannot be atomically written or read by your CPU.
- volatile is neither necessary nor helpful when working with complex shared data protected by locks or using atomic operations.
- Be wary of using volatile even where it's perfectly correct, as you stand a decent chance of encountering a compiler bug which defeats your correct code.
volatile can be occasionally useful for certain types of shared data access in a lockless context. However, when in doubt, use locks! Lockless shared data is extremely difficult to get right. I hope that this guide gives you some idea of how volatile can help you get it right, and more importantly how it can't help you get it right, but unless you absolutely must not use locks in any way, it's much better to protect your shared data with a lock instead. (And if you can, avoid shared data altogether! Message passing is usually a much nicer way to do multithreading.)
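For comparison, here is a minimal sketch of the shared counter from earlier protected by a lock. It assumes POSIX threads, and gCounterLock is a name invented for this example.

```c
#include <pthread.h>

// Assumption: a statically initialized POSIX mutex guards the counter.
static pthread_mutex_t gCounterLock = PTHREAD_MUTEX_INITIALIZER;
static int gCounter; // plain int: no volatile needed under the lock

void Increment(void) {
    pthread_mutex_lock(&gCounterLock);
    gCounter++; // the whole load-increment-store runs under the lock
    pthread_mutex_unlock(&gCounterLock);
}

int GetCurrent(void) {
    pthread_mutex_lock(&gCounterLock);
    int value = gCounter;
    pthread_mutex_unlock(&gCounterLock);
    return value;
}
```

Every access to gCounter happens with the lock held, so there is nothing for volatile or a memory barrier to add here.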
That wraps up this week's Friday Q&A, and also wraps up the three-part series on type qualifiers. Come back next week for another exciting edition. As always, Friday Q&A is powered by your ideas, so please send them in or post them in the comments below.
Comments:
@Steve Madsen:
I suspect you might be right, the combination of new language syntax and a helpful runtime in GCD may have a profound influence on many programming tasks. There's a "Technology Brief" linked from the bottom right of the Grand Central marketing blurb here, if you hadn't seen it:
http://www.apple.com/macosx/technology/#grandcentral
gSharedStructure.foo = ...;
gSharedStructure.bar = ...;
gSharedStructure.baz = ...;
gFlag = 1;
would changing the gFlag variable to be part of the structure be enough to make gFlag have the "dependency" on the structure, and thus solve the problem?
I don't have the Developer Preview, but I'd put money down that Grand Central in Snow Leopard is going to turn me into an old man ("In my day...").