Welcome back to Friday Q&A. Last week I discussed the basics of Grand Central Dispatch, an exciting new technology in Snow Leopard. This week I'm going to dive deeper into GCD and look at how you can use GCD to take advantage of multi-core processors to speed up computation. This post assumes that you've read last week's edition, so be sure to do that if you haven't already.
Concepts
In order to take advantage of multiple CPU cores within a single process, it's necessary to use multiple threads. (I'm ignoring multi-process concurrency, because it's unrelated to GCD.) This is just as true in the GCD world as it is in the purely threaded world. At the low level, GCD global dispatch queues are just abstractions around a pool of worker threads. Blocks on those queues get dispatched onto the worker threads as they become available. Blocks submitted to custom queues end up going through global queues and into that same pool of worker threads. (Unless your custom queue is targeted at the main thread, but you would never do that for speed purposes!)
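To make that concrete, here's a minimal sketch of fetching the global queues and submitting a block to one. All three priority levels feed the same shared pool of worker threads; the priority only affects which blocks get scheduled first.

dispatch_queue_t high = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_HIGH, 0);
dispatch_queue_t normal = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_queue_t low = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_LOW, 0);

dispatch_async(normal, ^{
    // This block runs on some worker thread from the shared pool,
    // not on the thread that submitted it.
});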
There are essentially two ways to extract multi-core performance out of GCD: by parallelizing a single task or a group of related tasks onto one of the global queues, and by parallelizing multiple unrelated or loosely related tasks onto multiple custom queues.
Global Queues
Imagine the following loop:
for(id obj in array)
    [self doSomethingIntensiveWith:obj];
Let's assume that -doSomethingIntensiveWith: is thread safe and can be run in parallel with other uses of that same method. If this is true, and array regularly contains more than one object, it's easy to use GCD to parallelize this code:
dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
for(id obj in array)
    dispatch_async(queue, ^{
        [self doSomethingIntensiveWith:obj];
    });
Of course code isn't always this nice. Sometimes you have code which manipulates an array like this, but then has to perform some work with the result:
for(id obj in array)
    [self doSomethingIntensiveWith:obj];
[self doSomethingWith:array];
The use of dispatch_async in the GCD example means that this doesn't work: -doSomethingWith: could run before any of the intensive work has completed. And you can't solve it just by switching to dispatch_sync, because that will cause each individual iteration to block, destroying all parallelism.
One way to solve this problem is by using dispatch groups. A dispatch group is a way to group together multiple blocks, and either wait for them to complete or be notified once they complete. They are created using dispatch_group_create, and the dispatch_group_async function allows submitting a block to a dispatch queue and also adding it to the group. We could then rewrite this code to use GCD like so:
dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_group_t group = dispatch_group_create();
for(id obj in array)
    dispatch_group_async(group, queue, ^{
        [self doSomethingIntensiveWith:obj];
    });
dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
dispatch_release(group);
[self doSomethingWith:array];
This works, but it forces the calling thread to block in dispatch_group_wait. It would be better to run -doSomethingWith: in the background instead of waiting. To do this, we'll use dispatch_group_notify to set a block to run when the group completes:
dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_group_t group = dispatch_group_create();
for(id obj in array)
    dispatch_group_async(group, queue, ^{
        [self doSomethingIntensiveWith:obj];
    });
dispatch_group_notify(group, queue, ^{
    [self doSomethingWith:array];
});
dispatch_release(group);
If -doSomethingWith: needed to run on the main thread, for example to manipulate the GUI, all you need to do is pass the main queue to dispatch_group_notify instead of a global queue.
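As a quick sketch, only the notify call in the code above needs to change:

dispatch_group_notify(group, dispatch_get_main_queue(), ^{
    // Runs on the main thread once every block in the group has finished,
    // so it's safe to touch the GUI here.
    [self doSomethingWith:array];
});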
For the synchronous case, GCD provides a nice shortcut with the dispatch_apply function. This function calls a single block multiple times in parallel and waits for all of the calls to complete, just like what we wanted:
dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_apply([array count], queue, ^(size_t index){
    [self doSomethingIntensiveWith:[array objectAtIndex:index]];
});
[self doSomethingWith:array];
This is great when you want to wait for the loop to complete, but what if you want it to be asynchronous? There is no asynchronous version of dispatch_apply that we can use. But we're using an API built around asynchronous invocation! We can just use dispatch_async to push the whole thing into the background:
dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_async(queue, ^{
    dispatch_apply([array count], queue, ^(size_t index){
        [self doSomethingIntensiveWith:[array objectAtIndex:index]];
    });
    [self doSomethingWith:array];
});
The key to this approach is identifying code which performs identical work on many different pieces of data at once. If you ensure that the work is done in a thread safe manner (a topic beyond the scope of this post), then you can replace your loops with calls to GCD in order to achieve parallelism.
In order to see a performance gain, you need to be performing a fairly substantial amount of work. GCD is lightweight and low-overhead compared to threads, but it's still somewhat costly to submit a block to a queue. The block has to be copied and enqueued, and the appropriate worker thread somehow notified. Submitting a block for every pixel in an image is probably not going to be a win. On the other hand, submitting a block for each image when converting a collection of images is probably going to be a win. The point where GCD ceases to be profitable falls somewhere in the middle. When in doubt, experiment. Parallelizing applications is an optimization, and as such you should always measure before and after to make sure that your changes helped. (And to make sure that you're making the changes in the right place!)
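One common way to coarsen the granularity when the per-item work is tiny is to stride. Here's a sketch along those lines; the dimensions and the ProcessPixel function are hypothetical stand-ins for real per-pixel work:

size_t width = 1024, height = 768; // hypothetical image dimensions
dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

// One block per row rather than one per pixel: each block now does enough
// work to outweigh the cost of copying and enqueueing it.
dispatch_apply(height, queue, ^(size_t row){
    for(size_t col = 0; col < width; col++)
        ProcessPixel(row, col); // hypothetical per-pixel function
});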
Subsystem Parallelism
The previous section talked about taking advantage of multiple cores in a single subsystem of your application. It can also be useful to do this across multiple subsystems.
For example, imagine an application which opens a document containing metadata. The document data itself must be parsed and converted into model objects for display, as must the metadata. However, the document data and the metadata don't interact. You could create a dispatch queue for each one, then run both in parallel. The parsing code for each would be entirely serial within itself, so thread safety is not a concern (as long as the two don't share data), yet the two parsers will still run in parallel with each other.
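A minimal sketch of this arrangement (the parsing methods and queue labels are hypothetical):

dispatch_queue_t docQueue = dispatch_queue_create("com.example.document-parsing", NULL);
dispatch_queue_t metaQueue = dispatch_queue_create("com.example.metadata-parsing", NULL);
dispatch_group_t group = dispatch_group_create();

// Each queue is serial within itself, so neither parser needs locks,
// but the two parsers run in parallel with each other.
dispatch_group_async(group, docQueue, ^{ [self parseDocumentData]; });
dispatch_group_async(group, metaQueue, ^{ [self parseMetadata]; });

dispatch_group_notify(group, dispatch_get_main_queue(), ^{
    [self documentDidFinishOpening]; // hypothetical: update the UI when both finish
});
dispatch_release(group);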
Once the document is open, the program needs to perform tasks in response to user actions. For example, it may need to perform spell checking, syntax highlighting, word counting, autosave, and other such things. If each one of these tasks is implemented using a separate dispatch queue, they will all run in parallel with respect to each other without many of the difficulties of multithreaded programming.
By using dispatch sources, something I'll cover next week, you can have GCD deliver events directly to a custom dispatch queue. A part of your program that monitors a network socket, for example, could be given its own dispatch queue which will then allow it to run in parallel with respect to the rest of the application. And again, by using a custom queue, this module will run serially with respect to itself, simplifying programming.
Conclusion
This week we saw how to use GCD to increase the performance of your applications and take advantage of modern multi-core systems. Although care must still be taken when writing parallel applications, GCD makes it easier than ever to take advantage of all available computing power.
That wraps up this week's Friday Q&A. Come back next week for the next part in the continuing series on GCD, when I will talk about dispatch sources, GCD's mechanism for monitoring internal and external events. As always, if you have a suggestion for a topic to cover, please post it in the comments or e-mail it directly to me.
Comments:
Going back to the first part, can you elaborate on use of GCD in lieu of locks? A setter which returns immediately and does its work asynchronously seems like a race condition waiting to happen.
I presume I'm missing something -- all threaded programming sort of seems like a race condition waiting to happen -- but I was surprised to see that option presented matter-of-factly.
A thread safe API needs to follow three rules:
1) Synchronize all access to shared data. In other words, always take a lock (or something equivalent, like using a serial dispatch queue) before touching any shared state.
2) Ensure that the visible state of the object is self-consistent at all times. E.g. if you have separate firstName and lastName variables, you should always update them both within a single critical region, so that a change from "Charlie Chang" to "Mohammed Ali" doesn't result in some poor caller getting a full name of "Mohammed Chang".
3) Ensure that the visible state of the object is temporally consistent from any given thread. In other words, if you set and then get (and no other thread jumps in the middle), the get should reflect the set.
(I may have forgotten some rules, since this is off the cuff, but consider these to be at least a minimum.)
Using locks around shared data gives you #1 and #3 automatically. #2 you have to work at, by making sure that all of your updates are done atomically. (Incidentally, this is why the atomic attribute on ObjC @property declarations is such a bad idea; it fools you into thinking that you're safe but does nothing to achieve #2.)
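For example, a sketch of what #2 demands in practice, assuming hypothetical _firstName/_lastName ivars guarded by an NSLock (manual retain/release, as elsewhere in this post):

- (void)setFirstName:(NSString *)first lastName:(NSString *)last
{
    [_lock lock];
    // Update both fields inside one critical region, so a concurrent
    // reader can never observe "Mohammed Chang".
    [_firstName release];
    _firstName = [first copy];
    [_lastName release];
    _lastName = [last copy];
    [_lock unlock];
}

- (NSString *)fullName
{
    [_lock lock];
    NSString *name = [NSString stringWithFormat:@"%@ %@", _firstName, _lastName];
    [_lock unlock];
    return name;
}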
So what do dispatch queues give us?
#1 is taken care of because the dispatch queue acts just like a lock if you ensure that all shared data access occurs on the queue.
#2 is up to you, just like it is with locks. Ensure that your object's internal state is consistent every time you exit a block on your dispatch queue.
#3 is probably what you're worried about. However, the use of a serial queue, and the fact that GCD guarantees that queues execute in FIFO order, ensures that this is always taken care of. It's true that your setter executes asynchronously, but it is guaranteed to execute before any subsequent operation. The danger would be in having a getter, enqueued afterwards, run before the setter has a chance. But this can't happen: if the setter hasn't run yet, the getter will be blocked until it has, at which point the getter will see a temporally consistent value.
One caveat: any part of the setter that must happen synchronously with respect to the caller has to be done outside the block. For example, if you have an NSString setter that makes a copy of the string (to avoid sharing an NSMutableString), then you should make the copy outside of the setter block, so that it happens synchronously with the caller. Once you have the copy, you can then enqueue a block to actually set your instance variable to point to that copy.
Put it another way: using locks means that setters happen synchronously with respect to both the caller and the callee. Using an asynchronous dispatch means that part of the setter happens synchronously only with respect to the caller, and part of it happens synchronously only with respect to the callee. For many things this makes no difference, but for some things you will have to split code between the two regions. This is similar to how the synchronous getter had to call autorelease outside the dispatch block to ensure that it happened on the correct thread.
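Putting the pieces together, here's a sketch of the pattern being described, assuming a serial _queue ivar created with dispatch_queue_create and a _name ivar (manual retain/release):

- (void)setName:(NSString *)name
{
    // The copy happens synchronously with respect to the caller,
    // before the async dispatch.
    NSString *copied = [name copy];
    dispatch_async(_queue, ^{
        [_name release];
        _name = copied;
    });
}

- (NSString *)name
{
    __block NSString *result;
    dispatch_sync(_queue, ^{
        // FIFO ordering guarantees any previously enqueued setter
        // block has already run by the time this executes.
        result = [_name retain];
    });
    // Autorelease outside the block, so it lands in the caller's pool.
    return [result autorelease];
}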
Think of your code block (in a selector implementation, or a function, or whatever) as a recipe, and the scope it runs in as a kitchen. If you were to give this recipe to two blind cooks at the same time, does the recipe give them enough information so that they don't accidentally set the oven to 900 degrees, or so that one doesn't add flour to a cake that's already iced because you store the finished cake and the mixing bowl on the same counter? A "lock" in this instance is like a step in the recipe that says "feel the counter; if it is occupied, wait until it isn't, otherwise proceed." This way, two cooks can work in the kitchen simultaneously and share the counter: the first will get it first, and the second only has to wait until the first cook is done with the counter, at which point the first cook will relinquish it and move on to the oven, and the second can get to work with the counter and then wait for the oven to be available.
Also, on a more concrete note, there's a bit of reading you should do about the reentrancy of the various Apple APIs, because not all of them are reentrant, and putting many Cocoa API calls inside an async block will fail horribly; mikeash has already gone into this a little. AppKit is basically not reentrant: bits of it are, but anything inheriting from NSResponder or NSView isn't, and that's most of AppKit. The Foundation API is, except for things like NSArchiver, the mutable collection classes, and a host of other things. So keep in mind that if you are trying to write thread-safe code, not only does your code have to be internally thread-safe, but anything it calls must be thread-safe as well.
http://developer.apple.com/mac/library/documentation/Cocoa/Conceptual/Multithreading/Introduction/Introduction.html
What you're describing is actually thread safety. Multiple independent threads of control, executing simultaneously (possibly time-sharing CPUs, but conceptually simultaneously) in the same block of code. If that block of code is thread safe, then operations complete as expected in this circumstance. If it is not thread safe, then the threads interfere with each other and Bad Things Happen. The common way to make a block of code thread safe is to wrap it in a lock, which allows only one thread into unsafe areas at a time.
Reentrancy is very different. Click your link, then in the table of contents click "Appendix A: Thread Safety Summary". Notice how there are separate sections listing thread-safe classes and reentrant classes.
Reentrancy is allowing the same thread of control to call a block of code while that block of code is already being executed farther up the stack.
As an example, take one of the reentrant classes listed on that page: NSNotificationCenter. Imagine that some code posts a notification, which is then received by your code. One stack frame up from your code, NSNotificationCenter is sitting in the middle of some of its code. From within your notification handler, you decide to add a new notification observer, calling back into NSNotificationCenter even though it's still in the middle of executing.
Reentrant means that NSNotificationCenter can handle this situation.
Imagine if NSNotificationCenter took an NSLock at the beginning of every method, and relinquished it at the end. It would be thread safe. But the scenario I outlined above would deadlock. It would not be reentrant.
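To sketch that deadlock (a hypothetical notification-center-like class; all names here are made up):

// Thread safe, but NOT reentrant: _lock is a plain NSLock.
- (void)postNotification:(NSNotification *)note
{
    [_lock lock];
    [_observer handleNotification:note]; // observer may call back in...
    [_lock unlock];
}

- (void)addObserver:(id)observer
{
    [_lock lock]; // ...and deadlock here: this thread already holds _lock
    [_observers addObject:observer];
    [_lock unlock];
}

Swapping the NSLock for an NSRecursiveLock, which the same thread can acquire again while already holding it, would make the class reentrant.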
Another, less common scenario where reentrancy is important is in signal handlers. A signal handler is executed by essentially hijacking a thread in your application and making it run the signal handler. Whatever that thread was doing at the time is just paused for the time being. Lots and lots of thread-safe APIs are completely unsafe to call from a signal handler, because those APIs achieve thread safety through locks, and the thread that was hijacked to run the signal handler may well be holding onto one of those locks, leading to a deadlock.
To put it simply, reentrancy is basically about being thread-safe when all but one of the threads are blocked for the duration, whereas normal thread safety is about being safe when all of your threads are actively executing.
Most of what you say is correct, as long as you think "thread safe" every time you read "reentrant".
dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_async(dispatch_get_main_queue(), ^{
    // Note: dispatch_apply will block the main thread here
    // until every download has completed.
    dispatch_apply([[self imageList] count], queue, ^(size_t index) {
        [self downloadImageFromRemotePath:[_imageList objectAtIndex:index]];
    });
    [[self tableView] reloadData];
});