Tags: audio coreaudio rant
I wrote this post over in comp.sys.mac.programmer.help, then realized that it would make a pretty decent blog post as well. I have edited it slightly to work as a blog post rather than as a newsgroup post. If you've ever wondered why CoreAudio is so difficult to use and why it can't be simple and easy like CoreImage or CoreVideo, read on.
Playing fullscreen video used to be a really big deal, but it just isn't very impressive anymore. The actual video data for 1080p video at 30fps is only about 240MB/sec. With current memory bandwidths being measured in GB/sec, this is still close enough to the ceiling to be interesting, but not enough to be a really hard problem. The main challenge in playing 1080p video is not getting the pixels onto the screen, but decoding them from an incredibly sophisticated and complex compression format.
If you look at video, you generally have a very long time in which to generate a frame. 24fps is cinema quality, which gives you over 40ms per frame to draw it. It's generally accepted that higher framerates are better up to a point, but even 60fps (the normal limit for LCDs) still gives you about 17ms per frame.
CD-quality audio, in effect, has frames which are only four bytes long (16-bit samples, two channels) but which play back at 44.1kHz. This gives you only about 22 microseconds per frame! Of course, the frames are minuscule, but if you miss even one, odds are that the user will hear it. If you do something terrible like take a disk interrupt that needs five milliseconds to process, you will hear an ear-rending glitch in the output audio. By contrast, you can drop an entire 17ms frame in 60fps video and it's usually pretty hard to notice.
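To put actual numbers on this, here's the arithmetic behind those figures. (The bandwidth number assumes 32-bit pixels, which is my assumption, not something inherent to 1080p.)

#include <stdio.h>

int main(void)
{
    // 1080p at 30fps, assuming 4 bytes per pixel: roughly the 240MB/sec quoted above.
    double videoBytesPerSec = 1920.0 * 1080 * 4 * 30;
    // Time available to produce one video frame at cinema and LCD refresh rates.
    double msPerFrame24 = 1000.0 / 24;   // just over 40ms
    double msPerFrame60 = 1000.0 / 60;   // about 17ms
    // A CD audio frame is 4 bytes (two 16-bit channels), and 44100 of them go by per second.
    double usPerAudioFrame = 1000000.0 / 44100;   // about 22.7 microseconds

    printf("video: %.0f MB/sec, %.1f ms/frame @24fps, %.1f ms/frame @60fps\n",
           videoBytesPerSec / (1024 * 1024), msPerFrame24, msPerFrame60);
    printf("audio: %.1f microseconds per frame\n", usPerAudioFrame);
    return 0;
}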
So, modern OSes don't like very small tasks that have to happen extremely often. The obvious fix is buffering. Instead of generating one frame every 22 microseconds, generate a thousand of them every 22 milliseconds. Now we're back in the realm of video, except that we've added about 22 milliseconds of latency to our audio output. This is a lot better, but the consequences of missing a deadline are still much worse than they are for video.
It comes down to a tradeoff between reliability and latency. You can avoid all glitches by using a 10-second buffer, but this will add a great deal of hilarity to iChat voice conferences and games. You can avoid basically all of the latency by using a 22-microsecond buffer, but then you get constant glitches as the OS services interrupts. The right balance is obviously somewhere in the middle.
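The tradeoff is easy to quantify: the latency a buffer adds is just its length in frames divided by the sample rate. A quick sketch at 44.1kHz (the buffer sizes are arbitrary examples of my own):

#include <stdio.h>

int main(void)
{
    // Latency added by a given buffer size at 44.1kHz: frames / sampleRate.
    double sampleRate = 44100.0;
    unsigned sizes[] = { 1, 128, 1000, 441000 };   // from a single frame up to ten seconds
    for(unsigned i = 0; i < sizeof(sizes) / sizeof(*sizes); i++)
        printf("%7u frames -> %10.3f ms of added latency\n",
               sizes[i], sizes[i] / sampleRate * 1000.0);
    return 0;
}

// 1 frame is the ~22-microsecond case, 1000 frames is the ~22ms buffer from above,
// and 441000 frames is the glitch-proof-but-useless 10-second case.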
Avoiding glitches is generally more important than avoiding latency, so most audio systems have fairly high latency. CoreAudio is architected around having as little latency as possible. This influences a lot of other decisions, and results in CA's often-confusing pull model for audio, as well as the fact that CA render callbacks run in realtime threads and are therefore subject to a bunch of annoying restrictions on what they're allowed to do.
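Concretely, the pull model means you don't push samples at the system; you register a plain C callback, and CoreAudio calls it from a realtime thread whenever the hardware needs more data. This is just the shape of that callback (the name and the silence-filling body are mine), with the usual realtime ground rules spelled out in the comments:

#include <AudioUnit/AudioUnit.h>
#include <string.h>

// The shape of a render callback. CoreAudio calls this on a realtime thread
// each time the hardware needs another buffer's worth of samples.
//
// Ground rules inside this function: fill ioData and return quickly.
// No Objective-C message sends, no malloc/free, no file or network I/O,
// and no locks that a non-realtime thread might be holding.
static OSStatus MyRenderCallback(void *inRefCon,
                                 AudioUnitRenderActionFlags *ioActionFlags,
                                 const AudioTimeStamp *inTimeStamp,
                                 UInt32 inBusNumber,
                                 UInt32 inNumberFrames,
                                 AudioBufferList *ioData)
{
    // inRefCon is whatever context pointer you supplied when registering the
    // callback; inNumberFrames says how many sample frames to produce.
    // A real callback would write inNumberFrames frames of audio here;
    // this one just writes silence.
    for(UInt32 buf = 0; buf < ioData->mNumberBuffers; buf++)
        memset(ioData->mBuffers[buf].mData, 0, ioData->mBuffers[buf].mDataByteSize);
    return noErr;
}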
These restrictions, by the way, make it very hard to have a good ObjC wrapper around CoreAudio. ObjC message dispatch is generally unsuited for realtime tasks. It's usually fast, but there are certain slow paths that can be taken if caches have been invalidated or you hit a class/selector combo that hasn't been seen before. Hit one of those in your render callback, or worse hit one of those and then smash into a spinlock that's held by a non-realtime thread, and life gets unpleasant very fast.
Overall this architecture is a good thing, as it gives us a high-performance and powerful audio layer. The problem is that there isn't a decent abstract layer above it for applications which just want to play back some music and don't care if it takes half a second to get to the speakers, and there's no abstract layer for the non-realtime components like effects and decoding. You can use QuickTime and NSSound and so forth for a lot of it, but they don't cover it all.
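For contrast, the "just play some music, don't care about latency" case really is only a couple of lines at the Cocoa level. A sketch using NSSound (the file path is a placeholder, and error handling is omitted):

#import <AppKit/AppKit.h>

int main(int argc, char **argv)
{
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    // The whole "play this file" case, no CoreAudio in sight.
    NSSound *sound = [[NSSound alloc] initWithContentsOfFile:@"/path/to/music.aiff"
                                                 byReference:YES];
    [sound play];
    // NSSound plays asynchronously; keep the process alive long enough to hear it.
    [[NSRunLoop currentRunLoop] runUntilDate:[NSDate dateWithTimeIntervalSinceNow:5]];
    [pool release];
    return 0;
}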
So in conclusion, yes, CoreAudio is hard, but this is at least partially justified. If you're really serious about doing audio on OS X, the time you spend tearing your hair out over it will not be misspent. If your needs are basic, see if you can use QuickTime or even OpenAL. If you want to do something like change the default system output device and you don't already know CoreAudio, curse the gods and resign yourself to slogging through it. Oh, and look for sample code; Apple has a fair amount for CA.
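For the record, once you've dug the right property out of the HAL headers, changing the default output device comes down to something like the following sketch. Finding the AudioDeviceID you want is the part I'm leaving out; newDevice is assumed to be a device ID you've already looked up.

#include <CoreAudio/CoreAudio.h>

// Sketch: make 'newDevice' the system's default output device.
static OSStatus SetDefaultOutputDevice(AudioDeviceID newDevice)
{
    AudioObjectPropertyAddress address = {
        kAudioHardwarePropertyDefaultOutputDevice,
        kAudioObjectPropertyScopeGlobal,
        kAudioObjectPropertyElementMaster
    };
    return AudioObjectSetPropertyData(kAudioObjectSystemObject, &address,
                                      0, NULL, sizeof(newDevice), &newDevice);
}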
Comments:
Audio Anecdotes: http://www.amazon.com/Audio-Anecdotes-To..
Audio Anecdotes II: http://www.amazon.com/Audio-Anecdotes-II..
But wanting to do something really simple (like playing a sine wave, say) still seems unnecessarily complicated through CoreAudio… I always thought it should be sufficient to provide a single callback function for that, but you tend to need plenty of code around it, which you end up grabbing from examples yourself (see the sketch below).
RANT=ON
In addition, I think the engineers are too much in love with the technical aspects… I was in a presentation by one where “pause/resume” were listed among the “advanced features” (!). Now, if I had the sources to CA, it wouldn’t be so advanced, but from the outside I had to jump through hoops to get that working…
Elementary stuff that half the users of a sound-playing framework would use, and that took two sndCommands with the Sound Manager, now takes pages upon pages of code and requires polling and other inefficient approaches…
And don’t get me started on the sample code … most sample code only replicates what Sound Manager would do. I’m not switching to CA to get my sound reduced to two channels… I want 5.1 audio. And the header comments are almost useless and don’t even document the type of the value you pass into a SetParameter or SetProperty call… that is why CoreAudio is hard.
RANT=OFF
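To the sine wave point above: the callback itself really is the easy part; what you end up grabbing from sample code is the output-unit boilerplate around it. Here's a rough sketch of the whole thing using the default output unit. Error checking is omitted, the names are mine, and it assumes 44.1kHz plus the unit's default non-interleaved Float32 format (a real program would query or set kAudioUnitProperty_StreamFormat). On older systems, FindNextComponent/OpenAComponent play the role of the AudioComponent calls shown here.

#include <AudioUnit/AudioUnit.h>
#include <unistd.h>
#include <math.h>

typedef struct { double phase; double frequency; double sampleRate; } SineState;

// The render callback: CoreAudio pulls sample frames from us on a realtime thread.
static OSStatus RenderSine(void *inRefCon, AudioUnitRenderActionFlags *ioActionFlags,
                           const AudioTimeStamp *inTimeStamp, UInt32 inBusNumber,
                           UInt32 inNumberFrames, AudioBufferList *ioData)
{
    SineState *state = inRefCon;
    double phaseStep = 2.0 * M_PI * state->frequency / state->sampleRate;
    for(UInt32 frame = 0; frame < inNumberFrames; frame++)
    {
        float sample = (float)sin(state->phase);
        state->phase += phaseStep;
        // One buffer per channel in the non-interleaved Float32 format assumed above.
        for(UInt32 buf = 0; buf < ioData->mNumberBuffers; buf++)
            ((float *)ioData->mBuffers[buf].mData)[frame] = sample;
    }
    return noErr;
}

int main(void)
{
    SineState state = { 0.0, 440.0, 44100.0 };

    // Find and instantiate the default output unit.
    AudioComponentDescription desc = {
        kAudioUnitType_Output, kAudioUnitSubType_DefaultOutput,
        kAudioUnitManufacturer_Apple, 0, 0
    };
    AudioComponent component = AudioComponentFindNext(NULL, &desc);
    AudioUnit unit;
    AudioComponentInstanceNew(component, &unit);

    // Hand it our render callback; from here on, it pulls samples from us.
    AURenderCallbackStruct callback = { RenderSine, &state };
    AudioUnitSetProperty(unit, kAudioUnitProperty_SetRenderCallback,
                         kAudioUnitScope_Input, 0, &callback, sizeof(callback));

    AudioUnitInitialize(unit);
    AudioOutputUnitStart(unit);

    sleep(3);   // let the sine play for a few seconds

    AudioOutputUnitStop(unit);
    AudioUnitUninitialize(unit);
    AudioComponentInstanceDispose(unit);
    return 0;
}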
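And on the SetParameter point: for what it's worth, the value you pass to AudioUnitSetParameter is an AudioUnitParameterValue, which is typedef'd to Float32. A tiny illustration (the mixer unit and bus number are assumed to already exist; the helper name is made up):

#include <AudioUnit/AudioUnit.h>

// Hypothetical helper: set an input bus's volume on a multichannel mixer unit.
// The value is an AudioUnitParameterValue (a Float32); for the mixer's volume
// parameter it's a linear gain.
static OSStatus SetMixerInputVolume(AudioUnit mixer, UInt32 bus, Float32 gain)
{
    return AudioUnitSetParameter(mixer, kMultiChannelMixerParam_Volume,
                                 kAudioUnitScope_Input, bus, gain, 0);
}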
It's cross-platform and works in a similar way to Core Audio (it's a pull model); however, it adds buffering (and thus a little more latency). But it makes up for it in ease of use. It's especially useful if all you need is to stream some audio to the speakers (i.e. background music for your game/app) and don't care about getting sub-40ms latency!
Salut,
Ruben
The audio priority band is the user-land equivalent of interrupt threads and has the same issues with ObjC. One possible low-latency solution is to use ObjC's introspection to get a plain C function pointer via -[NSObject methodForSelector:]; the function it points to must be standalone, i.e. make no further method calls.
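Concretely, that trick looks something like this. The AudioGenerator class and its -renderInto:frameCount: method are hypothetical, just to show the shape of it:

#import <Foundation/Foundation.h>
#include <string.h>

// Hypothetical object whose rendering method we want to call from a realtime thread.
@interface AudioGenerator : NSObject
- (void)renderInto:(float *)buffer frameCount:(unsigned)frames;
@end

@implementation AudioGenerator
- (void)renderInto:(float *)buffer frameCount:(unsigned)frames
{
    // Standalone by design: no message sends, no allocation, no shared locks.
    memset(buffer, 0, frames * sizeof(float));   // silence, for the sake of the example
}
@end

// Cache the implementation as a plain C function pointer *before* audio starts...
typedef void (*RenderIMP)(id receiver, SEL selector, float *buffer, unsigned frames);

static AudioGenerator *gGenerator;
static SEL gRenderSel;
static RenderIMP gRenderFunc;

static void PrepareRealtimePath(void)
{
    gGenerator = [[AudioGenerator alloc] init];
    gRenderSel = @selector(renderInto:frameCount:);
    gRenderFunc = (RenderIMP)[gGenerator methodForSelector:gRenderSel];
}

// ...then call through the cached pointer from the render callback, bypassing
// objc_msgSend and its occasional slow paths.
static void RealtimeRender(float *buffer, unsigned frames)
{
    gRenderFunc(gGenerator, gRenderSel, buffer, frames);
}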
Also, I second the comment that CoreAudio is atrociously documented. I was looking up AUMixer documentation and found a developer on the listserv mentioning that they were working on some. That message was from 2006.
As @gvdl points out, you really do need to avoid memory allocation, but if your method calls nothing but your own code, you can easily check this. Also, as @gvdl points out, there are various ways to turn ObjC calls into simpler function calls if need be.
Premature optimisation begone!
This is not fundamental to Objective-C message dispatch; it's fundamental to Apple's current Objective-C runtime implementation. NeXTstep had device drivers written in Objective-C. (See http://www.channelu.com/NeXT/NeXTStep/3...).