
Posted at 2009-02-13 20:31
Tags: fridayqna nsoperationqueue performance
Friday Q&A 2009-02-13: Operations-Based Parallelization
by Mike Ash  

Welcome back to Friday Q&A, which this week is also Friday the Thirteenth! Be especially careful, as this is the first of two consecutive Friday the Thirteenths. For this first Friday the Thirteenth I'm going to talk about parallel software design using an "operations" approach (think NSOperation), as suggested by Nikita Zhuk way back when I first started this whole thing.

What It's For
Parallel processing is becoming more and more important. Right now, good parallelization can result in perhaps a factor of eight speedup to your application compared to a non-parallel implementation, on high-end hardware and with the right sort of computation. More realistically you'll probably see less than that, but even a 50% speedup is nothing to sneeze at, and it's not that hard to get significantly more on a Mac Pro. The future looks to be just more and more cores too, so today's 50% speedup on a dual-core machine might become much more on tomorrow's 32-core monster.

Splitting a program into discretely-executable chunks, or operations, is a good way to parallelize it. Apple introduced this idea on Mac OS X with NSOperationQueue, which provides a nice API for it. Unfortunately, they totally screwed up the implementation so you might not want to use it now, but the concept is still very useful, whether you're waiting for Apple to fix NSOperationQueue, using a similar API like RAOperationQueue or are just doing multithreading the old fashioned way.
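To make the discussion concrete, here's the basic shape of the API: wrap a unit of work in an operation and hand it to a queue. This is a minimal sketch using NSInvocationOperation; the same pattern applies to RAOperationQueue or a hand-rolled work queue, and the Worker class and -doWork method are just stand-ins for real work.

```objective-c
#import <Foundation/Foundation.h>

// A trivial worker object; -doWork stands in for any expensive task.
@interface Worker : NSObject
- (void)doWork;
@end

@implementation Worker
- (void)doWork
{
    NSLog(@"working on thread %@", [NSThread currentThread]);
}
@end

int main(int argc, char **argv)
{
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    Worker *worker = [[[Worker alloc] init] autorelease];
    
    // Create a queue and hand it an operation wrapping -doWork.
    NSOperationQueue *queue = [[[NSOperationQueue alloc] init] autorelease];
    [queue addOperation:[[[NSInvocationOperation alloc]
        initWithTarget:worker selector:@selector(doWork) object:nil] autorelease]];
    
    // Block until the queue drains (just for the sake of the demo; real
    // code would normally let the operations finish asynchronously).
    [queue waitUntilAllOperationsAreFinished];
    
    [pool release];
    return 0;
}
```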

Design Approaches
First, think about your program and what it does. A single-threaded program is a serialized list of actions, performed in order. Sometimes they're performed in order because the second action depends on the results of the first. But sometimes it's simply because the program has to do two things, and if you're single-threaded then you need to pick one to do first.

Try to find those unrelated actions where order doesn't matter. Loops are often a good candidate for this sort of thing. Imagine something like this:

    for(NSImage *image in images) {
        [self rotate:image];
        [self scale:image];
        [self save:image];
    }
There are no dependencies between those images, so each one can go into a separate operation. On the other hand, each individual method call within the loop depends on the previous one, so those can't be split out beneficially.
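Translated into operations, that loop might look like the following sketch. ImageProcessor and its -rotate:, -scale:, and -save: methods are hypothetical stand-ins for real image work (images are typed as plain id so the example stays Foundation-only):

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical processor class; rotate/scale/save are empty stand-ins.
@interface ImageProcessor : NSObject
- (void)processAllImages:(NSArray *)images;
@end

@implementation ImageProcessor
- (void)rotate:(id)image { /* rotate in place */ }
- (void)scale:(id)image { /* scale in place */ }
- (void)save:(id)image { /* write to disk */ }

// The three dependent steps stay together inside one operation; only
// the per-image units run in parallel.
- (void)processImage:(id)image
{
    [self rotate:image];
    [self scale:image];
    [self save:image];
}

- (void)processAllImages:(NSArray *)images
{
    NSOperationQueue *queue = [[[NSOperationQueue alloc] init] autorelease];
    for(id image in images)
        [queue addOperation:[[[NSInvocationOperation alloc]
            initWithTarget:self
                  selector:@selector(processImage:)
                    object:image] autorelease]];
    [queue waitUntilAllOperationsAreFinished];
}
@end
```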

It can go beyond loops, too. Imagine some code like this:

    [self loadImage];
    [self parseDictionary];
    [self setupNetworkListener];
    [self getDirectoryListing];
If these methods have no interdependencies, as their names imply, then they too can be broken out into separate operations for parallel execution.
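With an operation queue, breaking those calls out could look something like this. It's a sketch that assumes the four methods exist on self and take no arguments; the snippet would live inside the class that implements them:

```objective-c
// One operation per independent task; the queue is free to run them
// concurrently, in whatever order it likes.
NSOperationQueue *queue = [[[NSOperationQueue alloc] init] autorelease];
SEL tasks[] = {
    @selector(loadImage),
    @selector(parseDictionary),
    @selector(setupNetworkListener),
    @selector(getDirectoryListing)
};
for(unsigned i = 0; i < sizeof(tasks) / sizeof(*tasks); i++)
    [queue addOperation:[[[NSInvocationOperation alloc]
        initWithTarget:self selector:tasks[i] object:nil] autorelease]];
```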

One question to ask here is whether you should break them out. Is it a win?

The answer to that question will depend greatly on how long it takes to complete the operation. There's always going to be extra overhead for a parallel operation, and you don't want that overhead to overwhelm the gains you get. As a general guideline (very general, overhead will vary a lot depending on what you're using to do parallel processing), figure that if your operation finishes in less than a millisecond it's probably not worth it. As always when it comes to performance optimization, measure before and after to see if you achieved a speedup, and profile to make sure you're hitting the parts that will do the most good.
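The measurement itself doesn't need to be fancy; wall-clock time before and after is enough to tell whether the parallel version wins. A sketch, where -processAllImages: stands for whatever work you're benchmarking:

```objective-c
// Crude but effective: time the whole run, then do the same for the
// serial implementation and compare the two numbers.
NSDate *start = [NSDate date];
[self processAllImages:images];
NSTimeInterval elapsed = -[start timeIntervalSinceNow];
NSLog(@"took %.3f seconds", elapsed);
```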

It will also depend on exactly what your operations are doing. For example, if you have one I/O-bound operation (reading a bunch of stuff from disk) and a bunch of CPU-bound operations, you'll benefit from having that I/O-bound operation sit in the background and let the CPU-bound operations get on with their tasks. You'll also benefit from running the CPU-bound operations simultaneously to take advantage of multi-core machines. On the other hand, if all you have is a bunch of I/O-bound operations, you're likely to make your performance worse if you try to run many of them in parallel, due to disk contention.

The key to this style of parallel programming is functional programming. By that I mean programming in the style of mathematical functions. The key attribute of mathematical functions is that they always give the same result for the same argument. For example, if f(3) = 5, then f(3) will always give you 5. This means, essentially, that there are no side effects. Side effects are death for parallel programming. Side effects mean dealing with external data, which means shared data, which means synchronization, locks, and all the pain that brings. It's much better if you can write your operations in terms of computing a function based on some arguments and nothing else.
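One way to enforce this in code is to make each operation copy its inputs at creation time and expose its result only after it finishes, so it touches no shared state while running. A hypothetical side-effect-free operation (WordCountOperation is invented for this sketch) might look like:

```objective-c
#import <Foundation/Foundation.h>

// A functional-style operation: everything it needs is copied in at
// init time, the result is read out after it finishes, and nothing
// shared is touched in between. No shared data means no locks.
@interface WordCountOperation : NSOperation
{
    NSString *_text;    // private copy of the input
    NSUInteger _count;  // the result
}
- (id)initWithText:(NSString *)text;
- (NSUInteger)count;
@end

@implementation WordCountOperation
- (id)initWithText:(NSString *)text
{
    if((self = [super init]))
        _text = [text copy];
    return self;
}

- (void)dealloc
{
    [_text release];
    [super dealloc];
}

// Pure function of _text: same input always yields the same count.
- (void)main
{
    _count = [[_text componentsSeparatedByCharactersInSet:
        [NSCharacterSet whitespaceAndNewlineCharacterSet]] count];
}

- (NSUInteger)count { return _count; }
@end
```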

Wrapping Up
Parallel programming is challenging and rewarding, and has many complicating factors, but I hope this gives you some ideas to start with. While I don't recommend using NSOperationQueue in any shipping product, it can be a good learning tool, and if you're starting on something new that can require Snow Leopard it's probably a good bet. Even if you can't use it, doing your own custom multithreading in such a way as to deal with discrete operations like this can be a big help in avoiding the locking and concurrency hell that plagues so many multithreaded programs.

That wraps it up. Come back next week for the exciting conclusion, including what happened to the horse, where they hid the loot, and why are there so many pigeons in there anyway?

Leave your comments on this post below. Hate multithreading? Have your own battle story to share? Post it below. Also post any suggestions you might have for future editions of Friday Q&A, or e-mail them directly (tell me if you don't want your name to be used). Remember, Friday Q&A is fueled by your ideas, so send them in!

Did you enjoy this article? I'm selling a whole book full of them. It's available for iBooks and Kindle, plus a direct download in PDF and ePub format. It's also available in paper for the old-fashioned. Click here for more information.


mvo at 2009-02-15 01:15:36:

    for(NSImage *image in images) {
        [self rotate:image];
        [self scale:image];
        [self save:image];
    }
There are no dependencies between those images, so each one can go into a separate operation. On the other hand, each individual method call within the loop depends on the previous one, so those can't be split out beneficially.

Just raising some thoughts on this article, then actually answering them...

Scaling one image can be split up over several threads. I don't have any experimental data or code that could prove whether it would be more beneficial; it depends on your threading strategy, but let's say for example you have a 4000Wx4000H image and you split it up across 4 threads on a 4-core system, in 4000Wx1000H image sections. You would see some benefit. Also, saving is an I/O-bound operation, like you mention, so you could benefit from splitting that up into a separate thread/operation, because while that thread is "sleeping" waiting for I/O to finish, another thread can do some other scaling...

Also I'm wondering if it is a good design to have multiple threads that save to disk. Let's say you have 4 images in your array and they all start saving at the same time. What if you designed a threaded system with only one thread dedicated to saving? That might mean less disk movement.

Also, having 4 threads each load an image means you would use more memory than having one image loaded at a time but 4 threads working on the scaling. More memory usage could invoke the VM, etc...

Rotation is a different story, because if you rotate an image by, say, 90° you lose the benefit of working on the same rows. Also, the rotation needs to finish completely before the scaling can start.

just some thoughts that come up..


mikeash at 2009-02-15 03:41:06:
Good points, let me address them one by one.

Image scaling is certainly something that could be multithreaded. But this is a kind of advanced technique that I think goes beyond the level I was looking at in this article. "Scale an image" is conceptually a single unit, usually performed by the frameworks, and breaking it up and manually running it in parallel is a lot of work. If you do it a lot then it could pay off a bunch, of course, but it's not just a matter of designing your code in terms of discrete operations that can run in parallel.

And yes, saving could benefit from being another operation here exactly as you say, and having multiple threads saving simultaneously is probably a bad idea due to disk contention.

Putting the entire contents of that loop in a single operation and then running multiple such operations in parallel is still going to be helpful. Assuming you have a lot of images, you'll gain scaling performance just from running more than one in parallel. Saving is not going to cause a lot of problems because the threads will spend most of their time doing things other than saving, so that should be fine. But for optimal performance, you would probably want to pass saving off to a separate module which saves them serially. With NSOperationQueue, you could do this by having a separate queue with setMaxConcurrentOperationCount:1, then enqueue save operations onto it when the other stuff is finished.
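With NSOperationQueue that would look something like the following sketch (-saveImage: is a hypothetical method on the processing class):

```objective-c
// A queue dedicated to saving, limited to one operation at a time, so
// saves are serialized even while scaling runs in parallel elsewhere.
NSOperationQueue *saveQueue = [[[NSOperationQueue alloc] init] autorelease];
[saveQueue setMaxConcurrentOperationCount:1];

// When a processing operation finishes its CPU-bound work, it enqueues
// the save here rather than writing to disk inline:
[saveQueue addOperation:[[[NSInvocationOperation alloc]
    initWithTarget:self selector:@selector(saveImage:) object:image] autorelease]];
```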

Your point about memory usage is also extremely smart. If each operation is very memory-intensive then it's easy to blow out your memory and start swapping, utterly destroying any speed gains you might have. You can see this with Xcode on a memory-starved Mac Pro. Tell Xcode to run multiple instances of gcc (each of which takes several hundred MB of memory) and you can easily get it to run a build much more slowly than if you tell it to just run one. Of course if you have memory to spare, this reverses, and it becomes hugely beneficial to run multiple instances of gcc.
