
Posted at 2006-10-19 00:00
Tags: audio coreaudio rant
Why CoreAudio is Hard
by Mike Ash  

I wrote this post over in comp.sys.mac.programmer.help, then realized that it would make a pretty decent blog post as well. I have edited it slightly to work as a blog post rather than as a newsgroup post. If you've ever wondered why CoreAudio is so difficult to use and why it can't be simple and easy like CoreImage or CoreVideo, read on.

Playing fullscreen video used to be a really big deal but it just isn't very impressive anymore. The actual video data for 1080p video at 30fps is only about 240MB/sec. With current memory bandwidths being measured in GB/sec, this is still close enough to the ceiling to be interesting, but not enough to be a really hard problem. The main challenge in playing 1080p video is not getting the pixels onto the screen, but decoding the pixels from the incredibly sophisticated and complex compression format.
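For reference, that figure is just arithmetic, assuming uncompressed 4-byte pixels; a trivial check:

    #include <stdio.h>

    int main(void)
    {
        /* 1080p, 32-bit pixels, 30 frames per second */
        double bytesPerSecond = 1920.0 * 1080.0 * 4.0 * 30.0;
        printf("%.0f MB/sec\n", bytesPerSecond / (1024.0 * 1024.0));  /* ~237 MB/sec */
        return 0;
    }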

If you look at video, you generally have a very long time in which to generate a frame. 24fps is cinema quality, which gives you over 40ms per frame to draw it. It's generally accepted that higher framerates are better up to a point, but even 60fps (the normal limit for LCDs) still gives you about 17ms per frame.

CD-quality audio, in effect, has frames which are only four bytes long (16-bit samples, two channels) but which play back at 44.1kHz. This gives you only about 22 microseconds per frame! Of course, the frames are minuscule, but if you miss even one, odds are the user will hear it. If you do something terrible like take a disk interrupt that needs five milliseconds to process, you'll hear an ear-rending glitch in the output audio. By contrast, you can drop an entire 17ms frame in 60fps video and it's usually pretty hard to notice.

So, modern OSes don't like very small tasks that have to happen extremely often. The obvious fix is buffering. Instead of generating one frame every 22 microseconds, generate a thousand of them every 22 milliseconds. Now we're back in the realm of video, except that we've added 22 milliseconds of latency to our audio output. This is a lot better, but the consequences of a missed deadline are still much worse than they are for video.
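The timings above, worked out explicitly (nothing CoreAudio-specific here, just the arithmetic behind the numbers in the last few paragraphs):

    #include <stdio.h>

    int main(void)
    {
        printf("24fps video frame budget:  %.1f ms\n", 1000.0 / 24.0);            /* ~41.7 ms */
        printf("60fps video frame budget:  %.1f ms\n", 1000.0 / 60.0);            /* ~16.7 ms */
        printf("44.1kHz audio frame:       %.1f us\n", 1000000.0 / 44100.0);      /* ~22.7 us */
        printf("1000-frame buffer latency: %.1f ms\n", 1000.0 * 1000.0 / 44100.0); /* ~22.7 ms */
        return 0;
    }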

It comes down to a tradeoff between reliability and latency. You can avoid all glitches by using a 10-second buffer, but this will add a great deal of hilarity to iChat voice conferences and games. You can avoid basically all of the latency by using a 22-microsecond buffer, but then you get constant glitches as the OS services interrupts. The right balance is obviously somewhere in the middle.

Avoiding glitches is generally more important than avoiding latency, so most audio systems have fairly high latency. CoreAudio is architected around having as little latency as possible. This influences a lot of other decisions, and results in CA's often-confusing pull model for audio, as well as the fact that CA render callbacks run in realtime threads and are therefore subject to a bunch of annoying restrictions on what they're allowed to do.
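For concreteness, here's a minimal sketch of what a render callback looks like. The signature is the real AURenderCallback shape from CoreAudio's AUComponent.h; MyRenderCallback is a placeholder name, and the rules in the comment summarize the restrictions discussed here rather than an official list.

    #include <AudioUnit/AudioUnit.h>
    #include <string.h>

    /* The pull model in one function: when the hardware needs more audio,
     * CoreAudio calls something with this shape on a realtime thread and
     * expects ioData filled with inNumberFrames frames before it returns.
     * Inside it: no memory allocation, no blocking on locks shared with
     * non-realtime threads, no file or network I/O. */
    static OSStatus MyRenderCallback(void *inRefCon,
                                     AudioUnitRenderActionFlags *ioActionFlags,
                                     const AudioTimeStamp *inTimeStamp,
                                     UInt32 inBusNumber,
                                     UInt32 inNumberFrames,
                                     AudioBufferList *ioData)
    {
        /* Fill every output buffer; here, just silence. */
        for (UInt32 i = 0; i < ioData->mNumberBuffers; i++)
            memset(ioData->mBuffers[i].mData, 0, ioData->mBuffers[i].mDataByteSize);
        return noErr;
    }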

These restrictions, by the way, make it very hard to have a good ObjC wrapper around CoreAudio. ObjC message dispatch is generally unsuited for realtime tasks. It's usually fast, but there are certain slow paths that can be taken if caches have been invalidated or you hit a class/selector combo that hasn't been seen before. Hit one of those in your render callback, or worse hit one of those and then smash into a spinlock that's held by a non-realtime thread, and life gets unpleasant very fast.

Overall this architecture is a good thing, as it gives us a high-performance and powerful audio layer. The problem is that there isn't a decent abstraction layer above it for applications that just want to play back some music and don't care if it takes half a second to get to the speakers, and there's no abstraction layer for the non-realtime components like effects and decoding. You can use QuickTime and NSSound and so forth for a lot of it, but they don't cover it all.

So in conclusion, yes, CoreAudio is hard, but this is at least partially justified. If you're really serious about doing audio on OS X, the time you spend tearing your hair out over it will not be misspent. If your needs are basic, see if you can use QuickTime or even OpenAL. If you want to do something like change the default system output device and you don't already know CoreAudio, curse the gods and resign yourself to slogging through it. Oh, and look for sample code; Apple has a fair amount for CA.


Comments:

"ObjC message dispatch is generally unsuited for realtime tasks."
This is not fundamental to Objective-C message dispatch, it’s fundamental to Apple’s current Objective-C runtime implementation. NeXTstep had device drivers written in Objective-C. (See http://www.channelu.com/NeXT/NeXTStep/3...).
From the title, I thought you were going to talk about the documentation and C++ sample code…
Very enlightening.
Chris, I'm not sure exactly what that means, but it's an interesting data point in any case. Device drivers don't necessarily have to be realtime, although obviously in the case of audio you'd run into similar problems.

One very large difference between the CoreAudio case and the driver case is that the drivers you cite ran in kernel mode, and ran in a completely separate task. This means that there was no chance of any sort of priority inversion such as what could happen if a CA realtime thread hit a spinlock held by a regular thread. It also means that it may have been able to replace the spinlocks with evil techniques such as simply disabling context switches and interrupts during the critical section. In any case, the changes to the runtime that allowed Driver Kit to use ObjC may not be appropriate or sufficient to allow safe ObjC use in realtime userland threads.
There is a guy on the QuickTime team who has written a bunch of books on developing audio APIs and systems. I have the first book in his series, and I've had a chance to look at the second one, and it's also very good. I'd recommend checking these books out if you want to learn more about audio systems; I found the first one to be very interesting reading (ringbuffers, totally cool!):

Audio Anecdotes: http://www.amazon.com/Audio-Anecdotes-To..
Audio Anecdotes II: http://www.amazon.com/Audio-Anecdotes-II..
Sure, audio glitches are easily noticeable. And I'm sure you need some tight engineering for the audio engine to work well, and for the plugins and other things that work at its core.

But wanting to do something really simple (like playing a sine wave, say) still seems unnecessarily complicated through CoreAudio… I always thought it should be sufficient to provide a single callback function for that, but you tend to need plenty of code around it, which you end up grabbing from examples yourself.
For a much simpler solution you should look at the output AudioUnits. They're very easy to code against, and if you need to get into the details you can still get deep down into CoreAudio from there.
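For illustration, here is a minimal sketch of that approach: playing a sine wave through the default output unit. Error handling is omitted, it uses the AudioComponent API (10.6 and later) rather than the Component Manager calls that were current when this post was written, and the frequency, format, and five-second run time are arbitrary choices for the example.

    #include <AudioUnit/AudioUnit.h>
    #include <math.h>
    #include <unistd.h>

    static double gPhase = 0.0;

    /* Render callback: the output unit pulls from this whenever it needs
     * samples. Fills every (non-interleaved) channel buffer with a 440Hz tone. */
    static OSStatus SineRenderProc(void *inRefCon,
                                   AudioUnitRenderActionFlags *ioActionFlags,
                                   const AudioTimeStamp *inTimeStamp,
                                   UInt32 inBusNumber,
                                   UInt32 inNumberFrames,
                                   AudioBufferList *ioData)
    {
        const double phaseStep = 2.0 * M_PI * 440.0 / 44100.0;
        for (UInt32 frame = 0; frame < inNumberFrames; frame++) {
            float sample = (float)(0.25 * sin(gPhase));
            gPhase += phaseStep;
            if (gPhase > 2.0 * M_PI)
                gPhase -= 2.0 * M_PI;
            for (UInt32 buf = 0; buf < ioData->mNumberBuffers; buf++)
                ((float *)ioData->mBuffers[buf].mData)[frame] = sample;
        }
        return noErr;
    }

    int main(void)
    {
        /* Find and instantiate the default output unit. */
        AudioComponentDescription desc = {
            .componentType         = kAudioUnitType_Output,
            .componentSubType      = kAudioUnitSubType_DefaultOutput,
            .componentManufacturer = kAudioUnitManufacturer_Apple,
        };
        AudioComponent comp = AudioComponentFindNext(NULL, &desc);
        AudioUnit unit;
        AudioComponentInstanceNew(comp, &unit);

        /* Ask for 44.1kHz non-interleaved 32-bit float stereo on its input bus. */
        AudioStreamBasicDescription fmt = {
            .mSampleRate       = 44100.0,
            .mFormatID         = kAudioFormatLinearPCM,
            .mFormatFlags      = kAudioFormatFlagsNativeFloatPacked |
                                 kAudioFormatFlagIsNonInterleaved,
            .mBytesPerPacket   = sizeof(float),
            .mFramesPerPacket  = 1,
            .mBytesPerFrame    = sizeof(float),
            .mChannelsPerFrame = 2,
            .mBitsPerChannel   = 32,
        };
        AudioUnitSetProperty(unit, kAudioUnitProperty_StreamFormat,
                             kAudioUnitScope_Input, 0, &fmt, sizeof(fmt));

        /* Hand the unit our callback; it will pull samples from us. */
        AURenderCallbackStruct cb = { SineRenderProc, NULL };
        AudioUnitSetProperty(unit, kAudioUnitProperty_SetRenderCallback,
                             kAudioUnitScope_Input, 0, &cb, sizeof(cb));

        AudioUnitInitialize(unit);
        AudioOutputUnitStart(unit);
        sleep(5);                       /* let the tone play for a bit */
        AudioOutputUnitStop(unit);
        AudioUnitUninitialize(unit);
        AudioComponentInstanceDispose(unit);
        return 0;
    }

Link against the AudioUnit framework. It's still a page of code for a sine wave, which is more or less the complaint above, but all of it is setup; the ongoing work lives in the one callback.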
I think the biggest problem is that, although CA is gaining layers (e.g. the File Player, and I'm hoping that NSSound uses CA under the hood), there's almost no mobility between layers. Why is it not possible to create an NSSound and then get at its AudioUnit to tell it to only play on the left channel? There's no way to bridge the various APIs so that you could use the simple ones and only drop lower when you actually need more control.

RANT=ON
In addition, I think the engineers are too much in love with the technical aspects… I was in a presentation by one where “pause/resume” were listed among the “advanced features” (!). Now, if I had the sources to CA, it wouldn’t be so advanced, but from the outside I had to jump through hoops to get that working…

Elementary stuff that half the users of a sound-playing framework would use, and that took two sndCommands with the Sound Manager, now takes pages upon pages of code and requires polling and other inefficient approaches…

And don’t get me started on the sample code … most sample code only replicates what Sound Manager would do. I’m not switching to CA to get my sound reduced to two channels… I want 5.1 audio. And the header comments are almost useless and don’t even document the type of the value you pass into a SetParameter or SetProperty call… that is why CoreAudio is hard.
RANT=OFF
A quick note: PortAudio – http://www.portaudio.com/

It's cross-platform and works in a similar way to Core Audio (it's a pull model), but it adds buffering (and thus a little more latency). It makes up for that in ease of use. It's especially useful if all you need is to stream some audio to the speakers (i.e. background music for your game/app) and you don't need sub-40ms latency!
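For a sense of what that looks like, here is a rough sketch against PortAudio v19, error handling again omitted; the callback just writes silence where your audio would go.

    #include <portaudio.h>

    /* Pull-model callback: PortAudio asks for frameCount frames of
     * interleaved stereo float output. */
    static int MyPaCallback(const void *input, void *output,
                            unsigned long frameCount,
                            const PaStreamCallbackTimeInfo *timeInfo,
                            PaStreamCallbackFlags statusFlags,
                            void *userData)
    {
        float *out = (float *)output;
        for (unsigned long i = 0; i < frameCount * 2; i++)
            out[i] = 0.0f;              /* silence; fill with your audio here */
        return paContinue;
    }

    int main(void)
    {
        PaStream *stream;
        Pa_Initialize();
        /* Stereo float output at 44.1kHz; let PortAudio pick a buffer size. */
        Pa_OpenDefaultStream(&stream, 0, 2, paFloat32, 44100,
                             paFramesPerBufferUnspecified, MyPaCallback, NULL);
        Pa_StartStream(stream);
        Pa_Sleep(5000);
        Pa_StopStream(stream);
        Pa_CloseStream(stream);
        Pa_Terminate();
        return 0;
    }

Pa_OpenDefaultStream picks the default output device for you; you trade CoreAudio's fine-grained control for a much smaller API surface.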
Core Audio isn't hard. You just need to understand the underlying digital audio concepts. If you'd like, I can send you my C++ wrapper for CA: four lines, asynchronous playback, skip the technicalities. Audio Fajita. You can contact me through the contact info link at www.bitjelly.com.

Salut,

Ruben
NeXTStep did indeed support Obj-C, BUT it did not allow Obj-C on the interrupt path. The problem is that it is possible for the method cache to perform an allocation, a big interrupt-time no-no. For Mac OS X we developed an interrupt-safe message mechanism by tweaking the objc runtime. However, in 1997 Apple was de-emphasising ObjC and as a result we decided to switch to Embedded C++.

The audio priority band is the user-land equivalent of interrupt threads and has the same issues with ObjC. One possible low-latency solution is to ask ObjC's introspection to give you a C function pointer using -[NSObject methodForSelector:]; the function must then be standalone, i.e. make no further method calls.
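A sketch of that idea, expressed through the plain-C runtime API (class_getMethodImplementation) rather than -methodForSelector: itself so it stays in C; the generator object, selector name, and function types are made up for the example. The point is that the realtime thread ends up making an ordinary C function call.

    #include <objc/runtime.h>

    /* Hypothetical: an ObjC "generator" object whose method fills an audio
     * buffer. The real names would be whatever your ObjC audio code defines. */
    typedef void (*RenderFunc)(id self, SEL _cmd, float *buffer, unsigned frames);

    static id         gGenerator;
    static SEL        gRenderSel;
    static RenderFunc gRenderFunc;

    /* Run once, outside the realtime path: resolve the method to a plain
     * C function pointer so the render thread never touches the dispatcher. */
    void SetUpGenerator(id generator)
    {
        gGenerator  = generator;
        gRenderSel  = sel_registerName("renderIntoBuffer:frameCount:");
        gRenderFunc = (RenderFunc)class_getMethodImplementation(
                          object_getClass(generator), gRenderSel);
    }

    /* Called from the realtime render callback: an ordinary C call, no
     * objc_msgSend, no method cache lookup, no chance of hitting a runtime lock. */
    void RenderAudio(float *buffer, unsigned frames)
    {
        gRenderFunc(gGenerator, gRenderSel, buffer, frames);
    }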
I don't find the latency argument particularly compelling when we're talking about things like offline or non-synchronous file operations, or merely constructing AUGraphs of AudioUnits, something that only happens occasionally and can't happen while the graph is rendering anyway. GStreamer, for example, has Ruby and Python bindings for constructing graphs and they work great. Same thing with Max or PureData.

Also, I second the comment that CoreAudio is atrociously documented. I was looking up AUMixer documentation and found a developer on the listserv mentioning that they were working on some. That message was from 2006.
With regard to efficiency in the audio callback, do C++ method calls have low enough overhead to be acceptable? I'm thinking about a C++ class with a method which gets used as the callback, and with properties which the callback can access...
FWIW with the current run-time optimisations, I happily use (very small) Objective-C method calls inside realtime audio stages on multi-track audio on an iPad (inside an MTAudioProcessingTap).

As @gvdl points out, you really do need to avoid memory allocation, but if your method calls nothing but your own code you can easily check this. Also, as @gvdl points out, there are various ways to turn ObjC calls into simpler function calls if need be.

Premature optimisation begone!
