mikeash.com: recent comments on "Friday Q&A 2009-08-28: Intro to Grand Central Dispatch, Part I: Basics and Dispatch Queues"
http://www.mikeash.com/?page=pyblog/friday-qa-2009-08-28-intro-to-grand-central-dispatch-part-i-basics-and-dispatch-queues.html#comments

valli - 2013-07-11 06:48:04
Great article.

mikeash - 2009-10-18 21:47:56
Well, the very first time I mention the concept of global queues, I say, "Global queues are concurrent queues shared through the entire process," and at the first mention of custom queues I say, "These are serial queues which only execute one job at a time."

dete - 2009-10-18 21:24:45
Seems worth pointing out that the named High/Normal/Low priority global queues are NOT serial, and can't be used for synchronization as in your last example. Only application-created queues and the main queue can be so used.

It's also possible that a future version of GCD may add the ability for applications to create their own named concurrent dispatch queues.
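dete's distinction is worth making concrete: only a serial queue promises to run one block at a time, which is what makes the dispatch_sync-as-lock trick work at all. As a rough sketch of that guarantee in portable C with pthreads (the names `serial_section`, `run_demo`, and the counter workload are invented for illustration; this is not Apple's GCD API):

```c
#include <pthread.h>

/* Sketch only: a pthreads analogue of what a *serial* dispatch queue
 * guarantees. Each lock/unlock pair plays the role of dispatch_sync()
 * onto a serial queue: the protected "block" runs alone. */
static pthread_mutex_t serial_section = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter = 0;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&serial_section);
        shared_counter++;                 /* the critical "block" */
        pthread_mutex_unlock(&serial_section);
    }
    return NULL;
}

/* Four workers, 100000 increments each; serialization makes the result
 * exact. A concurrent (global) queue gives no such guarantee, which is
 * dete's point. */
long run_demo(void) {
    pthread_t t[4];
    shared_counter = 0;
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return shared_counter;                /* 400000 */
}
```

Remove the mutex and the result becomes a race, which is exactly what treating a concurrent global queue "as a lock" would do.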
Your example confused the heck out of me at first, because I assumed that all queues were concurrent (or else, what's the point!), and I had to go to the docs to clear that up!

mikeash - 2009-09-04 01:04:08
Note that my ObjC blocks code was meant to illustrate how you could somewhat copy Erlang's style for coping with the deadlock problem, rather than to illustrate how Erlang itself works. Sorry if that was unclear. In any case, I appreciate the additional information on all of it.

jamiehardt - 2009-09-04 00:40:09
s/program arguments/function arguments/

jamiehardt - 2009-09-04 00:37:51
Sort of. The "callback block" would be one of the pattern conditions inside another process's "receive" block, which works like a big "socket received data" event handler with a case statement for exactly what came down the wire.

Your "careful" note wouldn't be necessary, because message sends can't change the environment they originate from. It's simply impossible for a block of Erlang code to "request" data from another process in a blocking manner. All it can do is send bytes down the pipeline, and do X when it receives bytes into its mailbox.
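The send-bytes-and-react-later interaction jamiehardt describes can be sketched outside Erlang. Below is a toy version in portable C with pthreads; every name here (`mailbox`, `mailbox_send`, `demo_roundtrip`, the `21 * x` "work") is invented for illustration, and the single-slot mailbox is a deliberate simplification of Erlang's real message queue:

```c
#include <pthread.h>

/* A toy Erlang-style "mailbox": the only way two "processes" interact
 * is by depositing a message and reacting when one arrives. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    int             has_msg;
    int             msg;
} mailbox;

static void mailbox_init(mailbox *mb) {
    pthread_mutex_init(&mb->lock, NULL);
    pthread_cond_init(&mb->nonempty, NULL);
    mb->has_msg = 0;
}

/* Analogue of "Pid ! Msg": deposit and return immediately. The sender
 * cannot block on, or mutate, the receiver's state. */
static void mailbox_send(mailbox *mb, int msg) {
    pthread_mutex_lock(&mb->lock);
    mb->msg = msg;
    mb->has_msg = 1;
    pthread_cond_signal(&mb->nonempty);
    pthread_mutex_unlock(&mb->lock);
}

/* Analogue of "receive": react when something shows up. */
static int mailbox_receive(mailbox *mb) {
    pthread_mutex_lock(&mb->lock);
    while (!mb->has_msg)
        pthread_cond_wait(&mb->nonempty, &mb->lock);
    mb->has_msg = 0;
    int msg = mb->msg;
    pthread_mutex_unlock(&mb->lock);
    return msg;
}

static mailbox reply_box;

/* A "process" that computes a reply and mails it back. It holds no
 * reference to the requester, so it can never call back into it and
 * recreate the synchronous-getter deadlock. */
static void *worker_process(void *arg) {
    mailbox_send(&reply_box, 21 * (int)(long)arg);
    return NULL;
}

int demo_roundtrip(void) {
    pthread_t t;
    mailbox_init(&reply_box);
    pthread_create(&t, NULL, worker_process, (void *)2L);
    int reply = mailbox_receive(&reply_box);   /* react on arrival */
    pthread_join(t, NULL);
    return reply;                              /* 42 */
}
```

The key property is directional flow: requests and replies are both one-way deposits, so neither side can hold a lock while waiting on the other.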
In Erlang programs (though I warn you, I'm pretty inexperienced in this as well), the pattern is more likely to start with an "input" process (or processes -- because they share nothing, any one process can be made into hundreds to improve multiprocessing), which hands execution off to various workers, and then off to various presenters. You don't see the same Russian-doll style of calling as in a procedural language, where a run loop frames a program event, the case for the event jumps into a function, which jumps into another, on and on, and then returns out and out and out back to the original scope before moving on. Interactions between processes in Erlang, besides the program arguments, are essentially stateless.

mikeash - 2009-09-03 21:10:29
Interesting stuff. Thanks for the explanation.

If I've understood it correctly, a rough (I said "rough"!) equivalent of the Erlang convention would be to write "getters" that take a block as a parameter and invoke it asynchronously when done, kind of like this:

    - (void)getProcessData: (dispatch_queue_t)callbackQueue : (void (^)(NSData *))callbackBlock {
        dispatch_async(myQueue, ^{
            NSData *localData = [self processData]; // synchronous, non-thread-aware getter
            dispatch_async(callbackQueue, ^{ callbackBlock(localData); });
        });
    }

Then call it like so:

    ...code...
    [otherObject getProcessData: myQueue : ^(NSData *processData){
        ...use processData...
        ...remainder of method here...
    }];
    ...careful, code here runs before the processData code...

I don't think this is practical for everyday accessors, but it could be an interesting pattern to follow for certain cases.

jamiehardt - 2009-09-03 20:33:48
I think in the Erlang case that Joachim brings up, you can't really phrase a synchronous getter in Erlang the way you would in C, because "message sends" are always asynchronous. Erlang has no way of saying a = [otherObject processData] in such a way that the runtime will wait on the assignment.

In Erlang, where we would write something like A = OtherObject ! {processData}, A will be immediately assigned to processData (which is just an interned string, like a Ruby Symbol). The calling object is supposed to obtain the result of the processData message by waiting for a response from OtherObject itself, which may come at any time, in any order with other calls -- the waiting itself is also concurrent with any other work the process may be undertaking. Erlang has primitives for setting up send/receive patterns like this, and for setting timeouts for when a process doesn't get a response.

Joachim Bengtsson - 2009-09-03 11:12:09
@mikeash: As far as I understand it, it's "don't do it", yes.
I still haven't managed to wrap my brain around Erlang enough to write good apps in it, but I think the recommendation is to just make sure you pass all the relevant state as arguments, and/or keep the state in two places (which I just can't agree with).

mikeash - 2009-08-30 22:14:53
The fact that GCD is so much more efficient for background-and-forget than for background-and-main-thread should not be surprising, and is exactly what I expected to see. The reason is that, in a Cocoa application (or anything using CFRunLoop), the main thread is not under the control of GCD. There is obviously some mechanism which allows CFRunLoop to cooperate with GCD in order to make the main loop function, but ultimately you're still going through CFRunLoop. This means that the backgrounding part is all done in efficient GCD-land, but coming back to the main thread ultimately mirrors the other, slower techniques, even though you're only using the GCD API.

As for latency versus throughput, I submit that throughput is still the more important variable. You would never background a single-threaded task that needed to update the GUI swiftly, because there's no advantage over simply performing that task directly on the main thread. The reason to use this pattern is to allow the main thread to do other work in the meantime, which means that it's throughput that counts. As an example, imagine that you have a task which takes 50us to execute directly, 75us to round-trip through GCD, and that GCD can process a group of these tasks at an average of 25us apiece. It makes sense to shove this stuff through GCD in that case, even though the round-trip time is worse.
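The arithmetic behind that 50/75/25us example is worth writing out. A small sketch (plain C; the figures are mikeash's hypothetical numbers from the comment above, not measurements, and the function names are invented here):

```c
/* Cost model, all in microseconds: 50us to run a task inline, 75us for
 * a single full GCD round trip, but a stream of tasks flows through
 * the queue at 25us apiece once it is kept busy. */
long direct_cost(long n) { return 50 * n; }

long queued_cost(long n) { return n > 0 ? 75 + 25 * (n - 1) : 0; }

/* Smallest batch size at which queueing finishes sooner than running
 * inline: 75 + 25(n - 1) < 50n solves to n > 2, so n = 3. */
long breakeven(void) {
    long n = 1;
    while (queued_cost(n) >= direct_cost(n))
        n++;
    return n;
}
```

So a one-off task loses on latency (75us vs 50us), but by the third task in a batch the queue is already ahead, which is the throughput argument in miniature.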
The one case where latency would really matter is if you have an extremely common task which normally takes an extremely short amount of time but occasionally takes a very long time, such that you want to background it so it doesn't lock up the GUI. In this case, pushing all the short ones onto GCD could lead to a significant slowdown. However, for something like this, you could play it highly conservative and only background a task if you anticipate it taking more than, say, 50ms. Of course, with your experimental results we know that 50ms is a highly conservative number, so that's certainly useful.

John McLaughlin - 2009-08-30 06:52:56
@mikeash

You're right about the GCD example; I had the order of execution reversed (first the main thread and then the background thread). That was a mistake, and not surprisingly, fixing it didn't change the result (150us).

I updated the post to show the costs for three cases (see the table in the middle):

1) The case I examined (push work off to the background and then have it update on the main thread)
2) The time to just launch case #1 above and return
3) The time to push work off to the background and then never call back the main thread with the result

What surprised me was that case #3, while taking half the time of case #1 for the performSelector/NSThread approach, was significantly more efficient for the blocks/GCD case (a factor of 7 times faster, not the expected 2).

Finally, on latency/throughput -- I was trying to answer a very specific question: how much work would you have to incur before considering pushing the work off to a background task?
My use case was designed around the idea of being called from the event loop and needing to eventually update the UI on the main thread.

Anyway, thanks again for the good post.

mikeash - 2009-08-30 03:03:38
Joachim: Do you know how Erlang solves that problem? Is it just a matter of "don't do it"?

Jim: Try this URL:

http://www.mikeash.com/pyblog/rss.py?mode=fulltext

I made it after a similar request, but the fellow never told me whether it worked well for him, so I never got around to making it public.

Jim - 2009-08-29 21:55:02
Mike, any chance you could turn off the truncation for RSS feeds? mikeash.com is blocked at my place of employment.

Joachim Bengtsson - 2009-08-29 17:53:31
I was thinking the same thing as Paul, as it's reminiscent of a similar problem in Erlang. Erlang uses independent asynchronous processes in about the same way as one might use objects in an OO language; this would be similar to using a single queue for handling all methods of an ObjC object. Say process/object/queue A asks proc/obj/q B to do something and return an answer (that is, synchronously), but B needs to query A for something (could be as simple as calling a getter, which must be a synchronous call): we deadlock!
A is waiting for B to finish, and B asks A for something, but A is busy waiting for B.

Your examples get me really excited! GCD seems so simple, yet it's so powerful! Thanks for another great Q&A.

mikeash - 2009-08-29 12:31:29
Your GCD test is kind of screwed up. Your other tests are "run some stuff in the background and then finish up on the main thread". Your GCD test is "run some stuff in the background, and then push back to the main thread a block which finishes up in the background". It's redundant and doing more work than the others.

You're also testing two separate mechanisms simultaneously: backgrounding a task, and then bringing it back to the main thread. This is a common pattern and thus a useful measurement, but it would also be interesting to see what it costs to only do the backgrounding.

Finally, you're measuring latency but not throughput, and throughput is commonly going to be more important for real-world tasks.

John McLaughlin - 2009-08-29 07:36:46
I spent a little time tonight and wrote up my results:

http://loghound.com/about/blog2/index.php?id=7996637473678888874

Basically, in my simple experiment it took thousands (~2500) of method calls to equal the overhead of any sort of backgrounding approach.

GCD appeared to be slightly faster (150us vs 200us), but they were all pretty long compared to just doing the work directly.
Paul - 2009-08-29 00:51:37
Indeed, using _sync() everywhere can get you into trouble; I was just commenting for anyone who sees dispatch_sync() as a tempting replacement for @synchronized(), which will inevitably lead to pain, even if you always use dispatch_sync() for getters and dispatch_async() for setters.

I definitely agree: anything in a dispatch block should be as small as possible, or else you'll probably wind up in trouble pretty quickly.

mikeash - 2009-08-28 23:15:15
John McLaughlin: I have a vague memory, from WWDC, of the presenter saying that starting a new pthread was only a win when the work to be done was equivalent to the overhead of at least 10,000 C function calls, and that the break-even point for a GCD block was only 100. However, I can't remember which WWDC this was, how current the numbers are, or even whether they're right.

Paul: This is a fine point. For this reason, you should use the _sync variant as little as possible, and do as little as possible within the block when you do use it.

Paul - 2009-08-28 23:04:05
However, it's worth noting that it's also very easy to hang yourself when using dispatch queues as locks.
As an example:

    dispatch_queue_t q1 = dispatch_queue_create("com.example.q1", NULL);
    dispatch_queue_t q2 = dispatch_queue_create("com.example.q2", NULL);

    dispatch_sync(q1, ^{
        dispatch_sync(q2, ^{
            dispatch_sync(q1, ^{
                printf("Help, I've deadlocked and I can't get up!\n");
            });
        });
    });

    dispatch_release(q2);
    dispatch_release(q1);

Obviously this structure is contrived, but it's extraordinarily easy to create if you start using queues around setters/accessors for interconnected objects. And just to make people aware: trying to check dispatch_get_current_queue() is not safe against this problem. If you take out q2 it is, but that's a dangerous assumption.

John McLaughlin - 2009-08-28 19:45:58
Hey Mike,

Thanks for your (as usual) well-written post -- I've come to really enjoy them.

This one comes in particularly handy, as I decided to convert something I'm working on to Snow Leopard last night and started digging through my notes and WWDC videos to refresh myself on how GCD worked. While the information is there to be had, this is a nice concise summary.

One thing that I've never seen data on is the actual overhead of GCD -- at WWDC it was listed as "small", but what is the break-even point? Is there a figure of merit for how much work you need to perform before throwing it on a background queue is a win?
-John

bbum - 2009-08-28 19:26:11
The issue with @synchronized() is not one of speed vs. locks, but one of having to set up exception handling such that the lock can be unlocked in the face of an exception.

With GCD and queues, the behavior on an exception is undefined, since the execution of the block could happen on any thread.

So you'd still need an exception handler if you wanted to generically replace @synchronized()'s lock with a queue, and that is the main source of the performance hit.

BTW: if anyone reading this has ideas for improving and extending Blocks, GCD, or anything else, please file a bug via http://bugreport.apple.com/

mikeash - 2009-08-28 14:59:15
I'm sure that @synthesized atomic accessors still use normal locks to control access. There would be no reason to use a queue instead. The benefit of using a queue instead of a lock is not speed, but rather the ability to write better code. Since you're not writing the code either way, there's little point.

Note that despite my example, accessors are still almost always the wrong place to enforce thread safety. In most cases they are far too granular for that. It really is beyond me why Apple made properties atomic by default, given this.
You would generally protect an entire subsystem's interface with a lock or queue, and then have free access to all of that subsystem's objects' properties, without locking or other synchronization within it.

Joel Bernstein - 2009-08-28 14:52:34
Do we get queued accessor methods automatically when we @synthesize atomic properties, or do we need to write them by hand?

If it's the former, that would be a pretty nice automatic perceptual speedup just for recompiling on Snow Leopard.

If it's the latter, well, I guess we'll see a new version of Accessorizer pretty soon. :)

mikeash - 2009-08-28 14:43:55
I don't know just how much overhead it is. I'm sure Chris knows what he's talking about. You still have the overhead of creating and dealing with ObjC objects, though, even if it's fairly directly implemented in terms of GCD. Also, things that NSOperationQueue offers that GCD doesn't, like dependency tracking, can't come for free. That said, I would assume that if you have existing code that already uses NSOperationQueue and works well, there is very little reason to change it.

Chris Parker - 2009-08-28 14:41:56
You're correct, Jeff. NSOperationQueue on Snow Leopard uses GCD as an implementation detail, and the overhead is pretty small.
Jeff LaMarche - 2009-08-28 14:26:33
Mike,

Great article, thank you. You say in your article that GCD is lower-level and higher-performance than NSOperationQueue. I was under the impression that NSOperationQueue on Snow Leopard actually uses GCD, and that the overhead it adds on top of GCD is relatively trivial.

Is my understanding incorrect? Is there a significant performance benefit to using GCD directly in Cocoa applications rather than using NSOperationQueue?

Thanks!
Jeff