mikeash.com: just this guy, you know?

Posted at 2007-11-21 04:44
Tags: cocoa garbagecollection performance
Perform Better With Garbage Collection
by Mike Ash  

The move to garbage collection in Cocoa has been interesting. People have said that it's impossible, or impractical, or a bad idea, or doomed to failure, and one of the most common things trotted out is that GC is inevitably slow. However, I think that enabling garbage collection in your Cocoa app could actually be a good way to increase performance under the right conditions.

First, the slowness: speed has been one of the big criticisms against garbage collection since the very beginning. A collector must obviously do more work than manual memory management, the logic goes, therefore it must be slower than doing the work yourself. This ignores the extra overhead that manual management imposes, but it was often true with early collectors. Modern collectors are usually very fast, and by eliminating the overhead of manual management (such as the reference counting and autorelease pools used so pervasively in Cocoa), can outperform it. Of course the manual code can be optimized, so it often comes down to getting decent performance for free or putting a lot of effort into great performance, just like so many things in programming.

The Cocoa garbage collector is claimed to be very fast. It certainly should be; although the fact that it has to live within a C world puts certain constraints on what it can do, such as not being able to compact the heap, it is shiny new and should be able to take advantage of many recent techniques. Apple has also wisely limited the scope of its collector, such that it only collects Objective-C objects and memory you explicitly ask it to collect; it does not try to be a general replacement for malloc/free. The impression I get from reading over Apple's literature about the collector is that it performs very well, although I haven't performed any tests.

But beyond basic speed, I think that the Cocoa collector has one big performance advantage that hasn't been discussed much: it is multithreaded.

The fact that it's multithreaded is discussed in Apple's literature, but mainly just to assure the programmer that the collector won't cause unsightly pauses while it works. Another common problem with older collectors was that they would halt the entire program while collecting, so that human-facing programs would be seen to lock up for visible periods when memory limits were reached, animation would stutter, etc. Apple's collector pauses individual threads in the program, but only for a very short period of time, and the major work of the collector is done with all of the app's threads running freely.

So what does this mean? Quite simple! The bulk of the garbage collector's work can run on a separate CPU core from the rest of your app.

In other words, by enabling garbage collection in your single-threaded Cocoa app, you've suddenly turned it into a multithreaded Cocoa app. Your code can run happily on one core while the collector churns away happily on another core. Since all shipping Macs have been multicore for quite a while now, this is very useful. In situations where your program is the only CPU eater on the system, the garbage collector is essentially free, and you get a performance boost by moving all the memory management work to a separate CPU core.
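
To make the argument concrete, here is a back-of-the-envelope model of it. It is only a sketch with made-up numbers, not a measurement: the function names, the work units, and the `pause` parameter are all hypothetical, and the model assumes an otherwise idle second core and no contention between the allocator and the collector.

```python
def wall_clock_manual(app_work, memory_work):
    """Single thread: memory management runs inline with the app's own work."""
    return app_work + memory_work


def wall_clock_concurrent_gc(app_work, gc_work, pause=0.0):
    """Idealized concurrent collector running on a second, otherwise idle core.

    The app thread only pays for its own work plus short pauses, while the
    collector runs in parallel. Assumes no allocator/collector contention.
    """
    return max(app_work + pause, gc_work)


# Hypothetical numbers, in arbitrary time units.
app_work = 10.0      # the app's real work
manual_memory = 3.0  # retain/release/autorelease overhead on the main thread
gc_work = 6.0        # the collector does twice as much work as manual management
pause = 0.5          # total time the app thread is briefly paused

manual_time = wall_clock_manual(app_work, manual_memory)
gc_time = wall_clock_concurrent_gc(app_work, gc_work, pause)

print("manual wall-clock:", manual_time, "CPU used:", app_work + manual_memory)
print("GC wall-clock:    ", gc_time, "CPU used:", app_work + pause + gc_work)
```

In this toy scenario the collected version burns more total CPU time but finishes sooner, because the extra work lands on a core that would otherwise sit idle. If that second core is busy, or the two threads fight over shared allocator state, the advantage shrinks or disappears.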

How much faster is it? I haven't worked up any benchmarks to measure, so right now this is entirely theoretical. But I was prompted to write this post after noticing that one of my Cocoa projects was using 200% CPU under garbage collection despite being completely single-threaded. I looked closer and, sure enough, the app's main thread was running full blast on one core while the garbage collector was running full blast picking up the pieces on the other. Most apps will probably not get a pure 50/50 split like this, but it seems clear that a lot of apps will gain at least some benefit.

Multithreaded programming is becoming more and more important as multicore machines have now become commonplace, and the average number of cores per machine seems set to increase without limit for the foreseeable future. OS X is doing a good job of providing both explicit multithreaded services such as NSOperation as well as implicit services like CoreAnimation, which does all of its work in a separate thread but doesn't make the API user think about that fact. It seems that Cocoa garbage collection is another easy-to-overlook example of the implicit multiprocessing support that's being added to OS X, and it's something to think about when considering whether to GC-enable your new Cocoa project.


Comments:

You're making an awful lot of assumptions about the actual performance of the GC here. Can we expect a followup post on the actual performance?

Generally speaking, GC can be faster than manual garbage collection. But it often isn't.
Generally speaking, GC can be faster than manual garbage collection. But it often isn't.


Isn't this true of almost everything :)
The other rule of thumb is to measure, not assume. Disable GC on that 200% blaster of yours, and give some real statistics.
Some good comments here.

Chris, you're correct that I am making some assumptions, but perhaps not as many as it looks. In particular, I'm not claiming that GC uses fewer CPU cycles than manual management. I'm only saying that, in some situations, GC will lead to a lower wall-clock time due to spinning some work onto a separate CPU core. The work that happens on the separate core may very well be significantly greater than the work that happens on the main thread under manual management, but it can still be faster overall.

Steve, unfortunately if I disabled GC, the thing would leak like crazy. I'm not writing any dual-mode code right now. However I did profile it and it appears that it's spending a lot of time in contention between the allocator on the main thread and the collector on the GC thread, so much of that CPU usage is probably wasted. This particular case is extremely heavy on object allocation, pretty much just sitting in a tight loop and pumping out temporary objects, so it would appear in that case that GC is not so helpful. I haven't yet tried to come up with a case that more realistically exercises the collector in a way I can properly measure both with and without GC. I did simplify the tight loop down to a test case and it performed several times slower under GC due to all the contention, so that is obviously not a case where GC makes you go faster.
Disabling GC boosted the performance of my app several times over (a complex network app with a sophisticated GUI and a huge number of short-lived objects). I had big hopes for GC, but all of them failed: memory leaks did not disappear (it seems the Cocoa frameworks have very poor GC support), and CPU load doubled (profiling showed that GC was a serious CPU eater and there were lots of inter-thread syncs).

I had to disable GC. This tech is not mature and not very good for complex real-life apps in 10.5 Leopard. I have not yet tried to enable it in Snow Leopard; hopefully the implementation is better.

GC is a great tech in theory but poor in current implementation.
I finally "ported" my code to manual mode and got a virtually leak-free app (the GC application was leaking 200MB/hour, now it is 100-200 kB/hour). This is on Leopard.

Finding memory leaks in manual mode proved to be much easier than in GC mode. In GC mode Instruments outputs complete nonsense: most of the leaks were in Cocoa internals, and sometimes the call stack contained not a single application call (only Cocoa and lower-level calls). In manual mode it took me two days to get a reliable leak-free application.

My app issues some 20,000 NSURLConnection requests per hour and creates some 40,000 NSXMLDocument objects per hour. I have a strong feeling that the Cocoa frameworks were not tested thoroughly in GC mode. Many leaks were in the NSURLConnection thread; it was a really disastrous experience.

The GC app periodically showed the spinning rainbow for 1-2 seconds; now it is gone, and CPU usage is much lower. Thank God I fixed the leak problem and got a responsive interface. I will try to repeat my GC experience only in Snow Leopard.
Would also love to see a followup on this with some benchmark results.
I also have my 2 cents: I recently came upon a memory issue inside Apple's Security framework under GC, so I second the feeling that the frameworks are not thoroughly tested for GC. However, this particular issue is fixed in the 10.7 beta, so it could be that Apple did work in this area for 10.7.
