mikeash.com: just this guy, you know?

Posted at 2008-12-20 01:28 | RSS feed (Full text feed) | Blog Index
Next article: Friday Q&A 2008-12-26
Previous article: Friday Q&A
Tags: fridayqna parallelism performance threading
Friday Q&A 2008-12-19
by Mike Ash  

Great response last week. This week I'm going to merge Sam McDonald's question about how I got into doing multithreaded programming and Phil Holland's idea of talking about the different sorts of parallelism available.

Like a lot of computer programmers, I was always interested in making code run fast. This led to better languages (I started in BASIC!), micro-optimization, and algorithms, but ultimate performance means multiprocessing. The distributed.net and SETI@Home projects showed the power of distributed computation.

Multithreading was also interesting in coming to OS X from the old Mac OS, where multithreading was a lot more limited and difficult. At the time it wasn't about performance, since most machines had only one CPU. But multithreading has lots of other benefits for organization, design, and interactive GUIs so it was still highly useful.

Then the ongoing multicore revolution kicked off and made it clear that multithreading was the way to go.

That's the why. The how is pretty boring. Just lots of work, reading, and experimentation on all sorts of multiprocessing, not just threading. They're very different, but many concepts are the same, and ideas from one can often help with the others. As with most things, practice and experience makes a big difference.

So then we have the different forms of parallel processing available. There are actually a lot of these, and I'm probably doomed to miss some, but:

  1. Distributed computing. Probably the most visible example of this one for Mac developers is distributed builds in Xcode. This is generally the most difficult to build, the most expensive to take advantage of for the user, and therefore the least useful. Bandwidth and latency between computers are horrendous compared to what you get within a single machine, so it's hard to write something that goes fast. Xcode can get away with it because it (usually) does a lot of processing for each bit of data that it processes. Beyond the difficulty of achieving speed, you also have to deal with a much more error-prone environment. You want to recover gracefully if the user wanders out of wifi range, not lose a bunch of data. Most of the time this is not worth it, especially if you're going to be shipping consumer-level software.
  2. GPGPU. Basically using the video card for computation. This is what GPULife does. It's capable of immense power. A top-of-the-line video card can easily outperform a top-of-the-line CPU by a factor of 50 with the right program. It's also really hard. GPUs are extremely parallelized and have a considerably different architecture from CPUs, so coding for them is hard and making them go fast is harder. (Although even slow GPU code can run really fast due to the amount of power available.) Technologies like CUDA and OpenCL promise to make this sort of thing a lot better, although you're always going to be dealing with the fact that it's a massively parallel system with really different performance characteristics. My recommendation here is to wait for Snow Leopard and then hope OpenCL delivers on its promise.
  3. Multiple processes. Again Xcode is a prominent example of this approach, where you can see it spawn multiple instances of gcc when compiling. This is often talked about as being an easier, safer way to go than multithreading because the OS protects processes from each other and forces a better separation of concerns. I don't buy it, personally. For just about any non-trivial program, a dead subprocess is going to mean that the whole thing comes crashing down, and all you've done by splitting it into multiple processes is make it harder to debug. What's worse, OS limits on the number of processes tend to be frighteningly low, so your program would need to gracefully handle being unable to spawn as many subprocesses as it likes. (And all the other apps on the system would need to as well!)
  4. Multithreading. The standard technique. Often very difficult to get right, and very difficult to debug, but potentially very rewarding in terms of performance. Threads can also help to better organize a program. It's often much cleaner to put long-running processing or blocking operations into a separate thread than to try to multiplex them together.

Multithreading is the one we're most familiar with and the one that's the most generally useful. It's useful because it's very generalized, so you have various ways to use multithreading to actually get things done:

  1. Locks. "Standard" multithreading. You have shared data protected by locks. Acquire the locks before you fiddle with the data. Often used to build more sophisticated machinery. This level can be tricky to get right so I recommend avoiding it where you can, and using it sparingly to build better abstractions where you must.
  2. Message passing. With message passing, you avoid shared data, and have threads communicate using message queues instead. (The message queues generally have shared data inside them, but that's an implementation detail.) Cocoa has some nice facilities for this with the -performSelectorOnMainThread:... and performSelector:onThread:... calls. The threading-heavy language Erlang uses this model extensively and is the main force behind its multithreading power.
  3. Operation units. This is kind of like message passing, except the operations just fly off and get executed on a queue which uses threads outside your view. When set to only execute one operation at a time, a queue can act like a synchronization point, replacing locks in a way that's often easier to work with. NSOperationQueue provides this (but it's broken, don't use it, use RAOperationQueue instead) and Grand Central Dispatch in Snow Leopard is rumored to provide similar facilities.
  4. Atomic/transactional objects. Rather than using mutual exclusion (locks, queues) to avoid destroying shared data, you can build objects that operate using transactions. Grab a snapshot, make changes, commit them. (Often this is implemented as a loop: snapshot, change, try to commit and start over with a new snapshot if something changed in the middle.) TransactionKit is a great example of this sort of thing in a Cocoa context.

As for what to use, here are my thoughts. Avoid distributed computing unless your code is going to be run by a single client with a lot of available hardware. Being able to snarf up CPU cycles from idle hardware sitting around in the user's house sounds cool but just doesn't pay off most of the time. Avoid GPGPU on the Mac until Snow Leopard ships unless you have a really good application for it. OpenCL will make GPGPU a lot more practical and flexible, so trying to shoehorn your computationally expensive code into GLSL or CoreImage today just doesn't seem worth it.

Using multiple processes is a good idea if the subprograms are already written. If you're invoking gcc as a subprocess, invoking it simultaneously on four files instead of one by one is pretty easy. If you're writing your code from scratch, I don't recommend it unless you have another good reason to write subprocesses, as it's difficult and the reward just isn't there.

For multithreading, concentrate on message passing and operations. Multithreading is never easy, but these help greatly to make it simpler and less error prone. Good OO design will also help a lot here. It's vastly easier to multithread an app which has already been decomposed into simple objects with well-defined interfaces and loose coupling between them.

That wraps up this Friday Q&A! Please feel free to discuss everything below, and send in your suggestions for next week. As I mentioned before, just try to keep things on-topic, and tell me if you don't want me to use your name. Post suggestions in the comments or e-mail them directly.

Did you enjoy this article? I'm selling whole books full of them! Volumes II and III are now out! They're available as ePub, PDF, print, and on iBooks and Kindle. Click here for more information.


I still haven't had a project that has required multithreading, but I am currently working on some code that could benefit from it a lot.
Thank you very much for your discussion on the different types of multithreading. Anytime I just jumped into Apple's docs I have seen the words lock and operation queue, without really understanding what I was going for. Having a quick introduction, and information on where to start is very helpful.

I will be interested to see what you post on next week.
Nice overview, thanks!

In the distributed computing approach, it might be worth mentioning Apple's Xgrid. It is well-integrated into OS X, the meat of it being in the server side. But any OS X client can contribute CPU. An example of this is the OpenMacGrid that provides access to an Xgrid cluster for scientists: http://macresearch.org/OpenMacGrid (this qualifies as a shameless plug, since I am part of MacResearch, but just to make it clear: this is a non-profit thing).

I'm glad you guys enjoyed it. Xgrid is definitely worthy of mention. For the sort of scientific projects that you're involved in, distributed computing makes a whole lot of sense. Where I think it doesn't make much sense is in a typical consumer application, but if you're simulating protein folding or using other extremely intensive large-scale applications, obviously it's great.

As for next week, keep those suggestions coming! I have a bit of a reserve now but I'd rather keep it big than run out.
You obviously know your stuff, and a lot of this is still a bit beyond me. Have you considered (1) putting your knowledge into a book, (2) ??? and (3) profiting? Just a thought. (I'd buy it.)
Hi Mike. Good stuff. I'd be curious to know what you think of the new ActorKit as a way to simplify message passing:

ActorKit looks very interesting and in my opinion the actor model is a very promising way to make multithreading practical. (Promising to me. I realize that it's used extensively in e.g. Erlang and does well there.) However I haven't had a chance to actually use it so I can't say much more. One concern is that without lightweight threads, the utility of the actor model is somewhat limited by the fact that the OS will only let you spawn a few thousand threads in any given process.

Comments RSS feed for this page

Add your thoughts, post a comment:

Spam and off-topic posts will be deleted without notice. Culprits may be publicly humiliated at my sole discretion.

The Answer to the Ultimate Question of Life, the Universe, and Everything?
Formatting: <i> <b> <blockquote> <code>.
NOTE: Due to an increase in spam, URLs are forbidden! Please provide search terms or fragment your URLs so they don't look like URLs.
Hosted at DigitalOcean.