mikeash.com: just this guy, you know?

Posted at 2009-01-09 21:38
Tags: cocoa fridayqna threading
Friday Q&A 2009-01-09
by Mike Ash  

Greetings one and all. I caught my mistaken writing of "2008" in this blog post title almost instantly instead of only noticing after I'd already posted it like I did last week, so the year must be coming along. Welcome to the second Friday Q&A of 2009 (and only the fourth in all human history!) where I'll be taking Ed Wynne's suggestion and talking about the various meanings and implications of thread safety as they apply to Mac OS X system frameworks.

Thread Safety? What's That?
Hopefully not too many readers are actually asking the above questions, but just as a quick refresher, thread safety is about whether it's safe to access a particular module, API, or data structure from multiple threads. Code is typically unsafe because it assumes it's only ever running on a single thread: for example, it updates several related pieces of data non-atomically, so another thread can see them in an inconsistent, half-updated state. There's the classic example:

    x++;

Which is not thread safe (assuming x is globally accessible) because down at the very bottom it's really multiple operations:

    int tmp = x;    // get x
    tmp = tmp + 1;  // increment
    x = tmp;        // store x

And if multiple threads are doing this at once, they interleave and you miss increments. Not too dire here, but apply it to pointers and objects and you can hopefully see why you'll crash at best, and silently corrupt data if you're unlucky.
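To make the lost-update problem concrete, here's a minimal C sketch using POSIX threads (the function and constant names are made up for illustration). Several threads bump a shared counter, once with no synchronization and once holding a mutex around the increment.

```c
#include <assert.h>
#include <pthread.h>

enum { NUM_THREADS = 4, INCREMENTS = 100000 };

// Globally accessible, like x in the text.
static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

// Deliberately racy: counter++ is a separate get, increment, and store,
// so two threads can read the same old value and one update is lost.
static void *increment_unsafe(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++)
        counter++;
    return NULL;
}

// Fixed: holding the mutex makes the get/increment/store sequence
// exclusive, so no increments can be lost.
static void *increment_locked(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

// Resets the counter, runs NUM_THREADS copies of body, and returns the
// final value. The correct total is NUM_THREADS * INCREMENTS.
static long run_counter_test(void *(*body)(void *)) {
    pthread_t threads[NUM_THREADS];
    counter = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        int err = pthread_create(&threads[i], NULL, body, NULL);
        assert(err == 0);
        (void)err;
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    return counter;
}
```

run_counter_test(increment_locked) should always come back with 400000. run_counter_test(increment_unsafe) will often come back short, but by how much varies from run to run and from compiler to compiler. The unsynchronized version is a data race, which C11 treats as undefined behavior, so it's only there to illustrate the problem.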

So to start off with there are two kinds of thread safety in the world:

  1. Not thread safe. The normal state. Code is not thread safe by default. Special effort needs to be taken to make it thread safe, and if you haven't done it, your code falls into this category.
  2. Thread safe. Can be called from any thread without a care or worry. Nice to have, often painful to make.
Three Kinds
But what does this really mean? Well, thread safe is easy enough to understand. But not thread safe can't really mean it can't be called from any thread, because all code runs from some thread.

Of course what it really means is that this code can't be run from more than one thread at the same time.

But that doesn't really do it either. For example, NSMutableArray is not thread safe. But you can call NSMutableArray from multiple threads simultaneously, as long as each thread is working on a different array. So maybe we should say that thread unsafe means that the code can't be run on the same data from more than one thread at the same time.

Well, that's better, but still not there. Take the atoi() function. Not thread safe, says so in the man page. But it only ever reads the string you pass it (the parameter is a const char *), and it's unsafe even if you feed it completely different data on each of your threads. What's the deal? Simple: behind the scenes, it has some shared data.
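The standard C library has well-known examples of this pattern, such as strtok(), which remembers its position between calls in hidden internal state; POSIX added reentrant variants like strtok_r() that make the caller hold that state instead. Here's a hypothetical function with the same problem, next to a reentrant version. Neither format_int nor format_int_r is a real library function; they're invented for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical: returns a pointer to a static buffer. Every caller on
// every thread shares that one buffer, so two threads formatting
// *different* numbers can still clobber each other's result. This is
// the hidden shared data that makes such a function thread unsafe.
static const char *format_int(int value) {
    static char buffer[32];
    snprintf(buffer, sizeof buffer, "%d", value);
    return buffer;
}

// Reentrant variant in the style of strtok_r(): the caller supplies the
// storage, so there is no hidden global state. Calls using distinct
// buffers can run on different threads at the same time (category #2).
static const char *format_int_r(int value, char *buffer, size_t size) {
    snprintf(buffer, size, "%d", value);
    return buffer;
}
```

You don't even need threads to see the sharing: call format_int(1) and then format_int(2), and both calls return the same pointer, which now holds "2". Add a second thread and that overwrite can land in the middle of someone else's use of the result.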

How can you tell the one from the other? We'll need another classification:

  1. Never thread safe. The normal state. Code is not thread safe by default. Special effort needs to be taken to make it thread safe, and if you haven't done it, your code falls into this category.
  2. Not thread safe with shared data. Can safely be called from multiple threads simultaneously as long as each thread is dealing with a distinct set of data.
  3. Thread safe. Can be called from any thread without a care or worry. Nice to have, often painful to make.

It's actually really easy to write code that falls into category #2. All you have to do is not have any global state, which is pretty common anyway. If you're writing an array class, your method for adding a new object to the array isn't going to deal with global state, it's going to deal with that one array. So while #1 may be the "normal state", #2 is actually really easy to come by, and most code falls into that category.
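As a sketch of how little it takes, here's a tiny growable array in C. The IntArray type is made up, standing in for something like NSMutableArray. Every bit of state lives in the struct you pass in, so two threads can each fill their own array with no locks at all.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

// All state lives in the struct and there are no globals, so this type
// is "not thread safe" (category #2): threads may use it concurrently
// as long as each one works on a different array.
typedef struct {
    int *items;
    size_t count;
    size_t capacity;
} IntArray;

// Appends value; returns 0 on success, -1 if allocation fails.
static int int_array_append(IntArray *array, int value) {
    if (array->count == array->capacity) {
        size_t new_capacity = array->capacity ? array->capacity * 2 : 8;
        int *new_items = realloc(array->items, new_capacity * sizeof *new_items);
        if (new_items == NULL)
            return -1;
        array->items = new_items;
        array->capacity = new_capacity;
    }
    array->items[array->count++] = value;
    return 0;
}

static void int_array_free(IntArray *array) {
    free(array->items);
    array->items = NULL;
    array->count = array->capacity = 0;
}

// Thread body: fills the array it was handed. No other thread touches
// that array, so no locking is needed.
static void *fill_array(void *arg) {
    IntArray *array = arg;
    for (int i = 0; i < 1000; i++)
        if (int_array_append(array, i) != 0)
            break;
    return NULL;
}
```

What you can't do with this type is have two threads call int_array_append on the same array without a lock; that's the "not thread safe" part.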

The System Screws It All Up
These categories are sufficient in a relatively simplistic program which controls every action taking place and for which all the code is known. It gets more complicated when you start pulling in a ton of big, complex external frameworks such as AppKit and Foundation. Take NSView as an example. It can fall into category #1 or #3 depending on what you're doing with it. (Drawing is safe, creation/resizing/etc. is unsafe.) But that #1 is complicated by the fact that the shared global data which makes NSView unsafe can be accessed by code that isn't yours.

NSView isn't just unsafe from multiple threads, it's main thread only. This is because your NSView doesn't just belong to you, it belongs to the framework. And this means that you can't synchronize all accesses to it, because some of those accesses come from code that does not belong to you! Let's put this in its own paragraph, because it's important:

If an API is never thread safe and you do not absolutely control every access to this API, then you can only call it from the main thread.
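In Cocoa, the usual way to obey this rule from a background thread is to hand the work off to the main thread, for example with -performSelectorOnMainThread:withObject:waitUntilDone:. The C sketch below shows the same idea with a hypothetical work queue: run_on_main_thread, drain_main_queue, and update_label are invented names, and drain_main_queue stands in for the main run loop. Background threads only ever touch the locked queue, and the main-thread-only state is touched by the main thread alone.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct Job {
    void (*fn)(void *);
    void *context;
    struct Job *next;
} Job;

static Job *queue_head = NULL, *queue_tail = NULL;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t queue_cond = PTHREAD_COND_INITIALIZER;

// Safe from any thread: the only shared data it touches is the queue,
// which queue_lock protects.
static void run_on_main_thread(void (*fn)(void *), void *context) {
    Job *job = malloc(sizeof *job);
    assert(job != NULL);
    job->fn = fn;
    job->context = context;
    job->next = NULL;
    pthread_mutex_lock(&queue_lock);
    if (queue_tail != NULL)
        queue_tail->next = job;
    else
        queue_head = job;
    queue_tail = job;
    pthread_cond_signal(&queue_cond);
    pthread_mutex_unlock(&queue_lock);
}

// Main thread only: waits for and runs `limit` jobs. Callbacks run
// outside the lock, on the calling (main) thread.
static void drain_main_queue(int limit) {
    for (int done = 0; done < limit; done++) {
        pthread_mutex_lock(&queue_lock);
        while (queue_head == NULL)
            pthread_cond_wait(&queue_cond, &queue_lock);
        Job *job = queue_head;
        queue_head = job->next;
        if (queue_head == NULL)
            queue_tail = NULL;
        pthread_mutex_unlock(&queue_lock);
        job->fn(job->context);
        free(job);
    }
}

// Stand-in for a main-thread-only API: it has no lock, so it must only
// ever run on one thread.
static int label_updates = 0;
static void update_label(void *context) {
    (void)context;
    label_updates++;
}

// Worker thread: instead of calling update_label directly, it asks the
// main thread to do it.
static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100; i++)
        run_on_main_thread(update_label, NULL);
    return NULL;
}
```

For example, the main thread could start two worker threads and then call drain_main_queue(200). Every update_label call happens on the main thread, so label_updates needs no lock of its own and ends up at exactly 200.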

And since virtually every system API is going to be, at least potentially, called by other system APIs, we can rewrite our three types of thread safety:

  1. Main thread only. The normal state. Code is not thread safe by default. Special effort needs to be taken to make it thread safe, and if you haven't done it, your code falls into this category.
  2. Not thread safe. Can safely be called from multiple threads simultaneously as long as each thread is dealing with a distinct set of data.
  3. Thread safe. Can be called from any thread without a care or worry. Nice to have, often painful to make.
Singletons
Keep in mind that singletons qualify as global shared data. This has an important impact on their thread safety. Practically speaking, it means that singletons provided by system frameworks only ever fall into category #1 or #3. Take NSFileManager as an example. It's listed as not being thread safe. What this really means is that [NSFileManager defaultManager] can only be safely used from the main thread, because you can't control what other code might access it. (On 10.5 and above you can alloc/init your own private instances which then fall into category #2.)
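In Cocoa you'd simply alloc/init a private instance where you need one. One way to apply the same idea in C is to give each thread its own lazily created instance via C11's _Thread_local. The Helper type and functions below are hypothetical; the point is the contrast between one global default instance, which is shared data, and per-thread instances, which aren't.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

// Hypothetical category-#2 object: any thread may use it, as long as
// no two threads use the same instance at the same time.
typedef struct {
    int operations;
} Helper;

// One process-wide instance is global shared data, just like a
// framework singleton. Treat it as main thread only.
static Helper shared_default;

static Helper *helper_default(void) {
    return &shared_default;
}

// Each thread gets its own copy of this pointer (C11 _Thread_local), so
// each thread lazily creates a private instance. For brevity the
// instance is never freed when its thread exits.
static _Thread_local Helper *thread_helper = NULL;

static Helper *helper_for_current_thread(void) {
    if (thread_helper == NULL) {
        thread_helper = calloc(1, sizeof *thread_helper);
        assert(thread_helper != NULL);
    }
    return thread_helper;
}

// Thread body: uses its private instance without any locking and
// reports which instance it got through arg.
static void *use_private_helper(void *arg) {
    Helper **out = arg;
    Helper *helper = helper_for_current_thread();
    helper->operations++;
    *out = helper;
    return NULL;
}
```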

Terminology and the Apple Way
This is all fine and dandy except that Apple, in their infinite wisdom, does not always distinguish between main thread only and not thread safe. To make things worse, they even sometimes use the term thread safe to mean what we have defined here as not thread safe.

Let's take that second one first, because it's pretty weird. As a concrete example, look at the CFNetDiagnostics API. The documentation for this API is full of quotes like this:

This function is thread safe as long as another thread does not alter the same CFNetDiagnosticRef at the same time.

Huh??

So why is it labeled "thread safe"? What they're trying to convey here, through the fog of inadequate terminology, is that this API falls into category #2 and not category #1. In other words, you can use it from any thread as long as only one thread at a time is using this API on any given piece of data. This is as opposed to an API which requires you to call it only from the main thread.
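In practice, honoring a category #2 contract like that means the caller does the locking whenever an object is shared between threads. Here's a minimal C sketch, assuming a made-up Diagnostic type rather than the real CFNetDiagnosticRef. It pairs each object with its own mutex, so threads working on different objects never contend, while threads sharing one object take turns.

```c
#include <assert.h>
#include <pthread.h>

// Hypothetical category-#2 object: diagnostic_record touches only the
// object passed in, so distinct objects need no locking.
typedef struct {
    long events;
} Diagnostic;

static void diagnostic_record(Diagnostic *d) {
    d->events++;
}

// When threads share one object, the caller supplies the
// synchronization. One lock per object rather than one global lock.
typedef struct {
    Diagnostic diag;
    pthread_mutex_t lock;
} LockedDiagnostic;

static void locked_diagnostic_init(LockedDiagnostic *ld) {
    ld->diag.events = 0;
    pthread_mutex_init(&ld->lock, NULL);
}

static void locked_diagnostic_record(LockedDiagnostic *ld) {
    pthread_mutex_lock(&ld->lock);
    diagnostic_record(&ld->diag);
    pthread_mutex_unlock(&ld->lock);
}

// Thread body: records 10000 events on the object it was handed.
static void *record_many(void *arg) {
    LockedDiagnostic *ld = arg;
    for (int i = 0; i < 10000; i++)
        locked_diagnostic_record(ld);
    return NULL;
}
```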

Other APIs are less explicit about it. The Search Kit reference simply states "Search Kit is thread-safe". And yet I'm pretty sure it's not. Again, it's trying to convey that Search Kit is in category #2 rather than category #1.

Why do they do this? Well, back in the day, on the classic Mac OS, nearly all code ran in what might be considered the "main thread" today. As a consequence, nearly every API required only calling it from there. Being able to run from multiple threads was novel and unusual and was worth documenting. Alas, not only does this no longer make sense on Mac OS X, but this sort of terminology abuse is actively destructive because it ends up making guarantees which aren't actually true.

As an example of the first, look at NSAppleScript. In the big master guide it's marked as being not thread safe. This is true! However what they don't tell you is that NSAppleScript can only be safely used from the main thread, due to AppleScript itself being a main thread only API. And yet it's right next to other classes such as NSMutableArray which are clearly category #2.

Figuring It Out
So we've established the three basic categories of thread safety, and we've established that Apple doesn't consistently distinguish between them in its documentation. So what do we do?

Fortunately it's usually possible to figure out the real story.

  1. Check the documentation. Not only the API documentation but also the big list. Is the API listed as being thread safe? Is it written in a relatively unambiguous way that makes it clear that this really is thread safe, and not the "thread safe" that means "not thread safe"? Fortunately for us, this abuse of the term "thread safe" is relatively rare and relatively obvious. It generally shows up in older APIs which have no reason to be thread safe in the first place. If after all this you have determined that your API is thread safe, then you're done! If not, go on to the next step.
  2. When in doubt, assume it's unsafe. In the absence of an explicit guarantee of thread safety, consider the API to be unsafe. But what kind of unsafe? That's the tricky thing.
  3. Does it access shared global data? Singletons fall into this category, as do things like user interface elements. If the answer is yes, then it's category #1: main thread only.
  4. Does it potentially invoke other code which may not be thread safe? NSAppleScript falls into this category (scripting additions) as do things like user interface elements which may broadcast notifications when they're manipulated. Again, if the answer is yes, it's main thread only.
  5. If you got this far, then it's probably category #2, not thread safe, and usable on any thread as long as you synchronize calls on shared data.
  6. To verify, think about how simple or self-contained an API is. If it's pretty self-contained then it's very likely category #2. If it calls out to a zillion other things then it's very likely category #1. Therefore we can be pretty sure that NSMutableArray (self-contained) is merely not thread safe, whereas NSAppleScript (calls out to all sorts of other stuff, including arbitrary third-party components) needs to run on the main thread.
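The checklist boils down to a small decision function. This is just a sketch to summarize the steps: the ApiFacts fields and classify_api are invented, and the answers still have to come from reading the documentation and thinking about what the API does. Step 6 is a judgment call, so it appears only as a comment.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    MAIN_THREAD_ONLY = 1,  // category #1
    NOT_THREAD_SAFE = 2,   // category #2: synchronize access to shared data
    THREAD_SAFE = 3        // category #3
} ThreadSafety;

// Answers you gather about an API (hypothetical field names).
typedef struct {
    bool documented_thread_safe;   // step 1: explicit, unambiguous guarantee
    bool touches_shared_global;    // step 3: singletons, UI elements, ...
    bool calls_out_to_unsafe_code; // step 4: plug-ins, notifications, ...
} ApiFacts;

static ThreadSafety classify_api(ApiFacts facts) {
    if (facts.documented_thread_safe)
        return THREAD_SAFE;
    // Step 2: no guarantee, so assume it's unsafe and work out which kind.
    if (facts.touches_shared_global || facts.calls_out_to_unsafe_code)
        return MAIN_THREAD_ONLY;
    // Step 5: probably category #2. Step 6: sanity-check how
    // self-contained the API is before trusting this answer.
    return NOT_THREAD_SAFE;
}
```

Feeding it what we know about NSMutableArray (no guarantee, no globals, no callouts) gives NOT_THREAD_SAFE, while NSAppleScript (calls out to scripting additions) and a singleton such as [NSFileManager defaultManager] both come out MAIN_THREAD_ONLY.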
It would be good if Apple would properly distinguish between the different kinds of thread safety. However, you can usually do a good job of figuring out any given API if you work it through.

Visit Again in 604,800 Seconds
That wraps up this week's Friday Q&A. Discuss thread safety in the comments, then come back in a few hundred thousand seconds for the next edition. As always, if there's a topic you would like to see discussed here, post your suggestion in the comments or e-mail it (names will be used unless you tell me not to).


Comments:

Note to graphics folks: OpenGL API calls (at least in all the incarnations that I've ever used) can be made on either the main thread or a background thread, but once you set up an OpenGL context on a thread, you must make all future OpenGL calls that apply to that context from that same thread. Under Mike's definition above, this is "not thread safe" (which the computer science guy in me certainly agrees with).

Google for "OpenGL threading" for more info.

I would call OpenGL API calls single-thread-safe, meaning that all OpenGL calls are thread-safe from the thread that the OpenGL context was created on. Plus there are (safe) ways to migrate OpenGL contexts between threads. But once they're there, they are again single-thread-safe.
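One way to catch violations of this kind of thread confinement early is to record the owning thread and check it on every call. Here's a minimal C sketch with a hypothetical ConfinedContext type; it isn't OpenGL's API, just the same rule enforced with pthread_self() and pthread_equal().

```c
#include <assert.h>
#include <pthread.h>

// Hypothetical thread-confined resource, in the spirit of an OpenGL
// context: it may be created on any thread, but afterwards every call
// must come from that same thread.
typedef struct {
    pthread_t owner;
    int draw_calls;
} ConfinedContext;

static void confined_init(ConfinedContext *ctx) {
    ctx->owner = pthread_self();  // bind to the creating thread
    ctx->draw_calls = 0;
}

// Returns nonzero if the caller is on the owning thread.
static int confined_on_owner_thread(const ConfinedContext *ctx) {
    return pthread_equal(ctx->owner, pthread_self());
}

// Debug guard: trips immediately on a call from the wrong thread
// instead of letting the bug surface later as corruption.
static void confined_draw(ConfinedContext *ctx) {
    assert(confined_on_owner_thread(ctx));
    ctx->draw_calls++;
}

// Used to ask "am I the owner?" from a different thread.
typedef struct {
    const ConfinedContext *ctx;
    int on_owner;
} OwnerCheck;

static void *check_owner(void *arg) {
    OwnerCheck *check = arg;
    check->on_owner = confined_on_owner_thread(check->ctx);
    return NULL;
}
```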

It's important to realize that what's meant by "thread-safe" is really talking about how the routine accesses data.

Thread-safe data includes:

    Immutable variables (any scope)
        (Read-only data should be declared const)
    Non-shared data
        local variables
        method parameters
        return values
        locally allocated memory

Thread-safe code...
    ...must have exclusive access to any data it modifies
    ...may modify non-shared data
    ...may simultaneously access distinct data
    ...may require synchronization
    ...has no race conditions
    ...does not deadlock
    ...has no priority failures
    ...has no starvation failures

Joshua Bloch's excellent book, Effective Java Programming Language Guide, defines Degrees of Thread Safety as:
    Thread-safe
    Immutable
    Thread-aware
    Conditionally thread-safe
    Thread-compatible (friendly)
    Thread-hostile (non-reentrant)
