Tags: fridayqna memory objectivec
Merry holidays, happy winter, and a joyous Friday Q&A to you all. Camille Troillard suggested that I discuss how to create custom object memory allocators in Objective-C, and today I'm going to walk through how to accomplish this and why you might want to.
What It Means
As anyone who uses Objective-C knows, you allocate an instance of a class by writing [MyClass alloc]. Creating a custom allocator simply means that you replace the standard allocator so that [MyClass alloc] calls into your own code instead.
An Objective-C object is just a chunk of memory with the right size, and with the first pointer-sized chunk set to point at the object's class. A custom allocator thus needs to return a pointer to a properly-sized chunk of memory, with the class filled out appropriately.
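The layout described above can be modeled in plain C, which also compiles as Objective-C. This is only a sketch: FakeClass, FakeObject, and fake_alloc are hypothetical names standing in for the runtime's real types, and a real class pointer would come from the Objective-C runtime rather than from a struct like this one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a class: the sketch only needs its instance size. */
typedef struct FakeClass {
    const char *name;
    size_t instanceSize;
} FakeClass;

/* Hypothetical object layout: the class pointer comes first, ivars follow. */
typedef struct FakeObject {
    const FakeClass *isa;
    int value;
} FakeObject;

/* Return zeroed memory of the right size, with the first pointer-sized
   slot set to point at the class. Returns NULL if calloc fails. */
static void *fake_alloc(const FakeClass *cls)
{
    assert(cls->instanceSize >= sizeof(const FakeClass *));
    void *obj = calloc(1, cls->instanceSize);
    if (obj != NULL)
        *(const FakeClass **)obj = cls;
    return obj;
}
```

Anything that satisfies those two properties (right size, class pointer in the first slot) can serve as the backing store, which is what leaves room for a custom allocator.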
Why It's Useful
By far the largest reason to write a custom allocator is for performance. The standard allocator makes tradeoffs which may not be appropriate for your particular case. It also has to work with every class in every situation, whereas your custom allocator only needs to work with your class and the situations it's used in.
Another reason is overhead. The standard allocator requires a certain amount of extra storage for each allocation for various reasons. This can be particularly expensive for very small objects allocated in very large numbers. A custom allocator can cut down on this overhead substantially by tailoring it to the needs of the class it's written for.
A Note on Garbage Collection
This post assumes manual retain/release memory management. Custom allocators are mostly impossible to use under garbage collection, because there is no way to add a custom free callback. It is possible to use some of these techniques (like an object cache) but for the most part, custom allocators are reserved for the realm of manual memory management.
A Basic Custom Allocator
The +alloc method actually just calls through to +allocWithZone:. Although memory zones are pretty much just a historical curiosity at this point, they remain in the API. Thus the method to override is +allocWithZone::
+ (id)allocWithZone: (NSZone *)zone
{
This first example simply allocates the object with calloc. This will have roughly zero advantages over the standard allocator, but shows how it can be done. (I'm using calloc instead of malloc because Objective-C code assumes that instance variables are zeroed out.)
In order to call calloc, you need to know how much memory to allocate. Fortunately, the Objective-C runtime makes it easy. The class_getInstanceSize function, declared in <objc/runtime.h>, will tell you exactly this:
    id obj = calloc(class_getInstanceSize(self), 1);
    *(Class *)obj = self;
    return obj;
}
To destroy the object, override -dealloc to call free:
- (void)dealloc
{
    free(self);
The compiler warns about -dealloc methods that don't call through to super. In order to shut up this warning, I insert a dummy call after a return statement which prevents it from executing:
    return;
    [super dealloc]; // shut up compiler
}
Gotchas
As with most things at this level, there are a few things to watch out for.
First, don't do this unless you subclass NSObject directly. The -dealloc method covers both destroying the object itself, and freeing resources it holds. -[NSObject dealloc] just destroys the object (mostly) so it's safe not to call it. It's not safe to do this for any other class, though. For example, if you tried this with an NSView subclass, you'd end up leaking a whole bunch of internal state.
Second, the "(mostly)" from above means there are some things that NSObject does that you need to think about. One is removing associated objects. If your objects may have associated objects, or you think there's even a chance that they might, then you need to make sure they're removed. This can be done by calling objc_removeAssociatedObjects(self). The other is calling destructors for C++ objects in instance variables. Your best bet here is to just avoid having C++ objects as instance variables. If you must have them, look into the possibility of calling or imitating the private runtime function objc_destructInstance, which takes care of both C++ destructors and associated objects.
Third, memory debugging tools like ObjectAlloc and zombies won't work on objects with a custom allocator. For this reason, I recommend that you have a memory debugging preprocessor define which makes your objects use the standard allocator instead of your custom allocator, so that you can flip the switch and use these tools if need be.
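One way to arrange such a switch is sketched below in plain C. The MYCLASS_DEBUG_MEMORY define and the custom_alloc, custom_free, myclass_alloc, and myclass_free names are all hypothetical, and custom_alloc is a placeholder that just forwards to calloc so the sketch stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical switch: build with -DMYCLASS_DEBUG_MEMORY=1 to bypass the
   custom allocator so that memory debugging tools see ordinary allocations. */
#ifndef MYCLASS_DEBUG_MEMORY
#define MYCLASS_DEBUG_MEMORY 0
#endif

/* Counts trips through the custom path, only so the sketch can be checked. */
static int gCustomAllocCount = 0;

/* Placeholder custom allocator; a real one would use a cache or block pool. */
static void *custom_alloc(size_t size)
{
    gCustomAllocCount++;
    return calloc(1, size);
}

static void custom_free(void *ptr)
{
    free(ptr);
}

/* Single entry points, so callers never care which allocator is active. */
static void *myclass_alloc(size_t size)
{
    assert(size > 0);
#if MYCLASS_DEBUG_MEMORY
    return calloc(1, size);
#else
    return custom_alloc(size);
#endif
}

static void myclass_free(void *ptr)
{
#if MYCLASS_DEBUG_MEMORY
    free(ptr);
#else
    custom_free(ptr);
#endif
}
```

In an Objective-C class, the same #if would more likely wrap the +allocWithZone: and -dealloc overrides themselves, so that with the define set the class simply inherits the standard behavior.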
Caching Objects
For a realistic example, I'll write an allocator that places destroyed objects in a cache so that they can be quickly reused. This sort of thing is useful for classes which are allocated and destroyed so frequently that the standard allocator is too slow.
In order to reach maximum speed, I'll make a few assumptions about how this class works and is used:
- It is never subclassed, or if it is, subclasses never add instance variables. (This allows it to put all instances in the same cache.)
- Its initializer methods can deal with a "dirty" object; i.e. the instance variables don't need to be zeroed out. (This saves time zeroing out each instance when pulling it out of the cache.)
- It is only ever allocated and destroyed from the same thread. (This makes it unnecessary to create a thread-safe cache.)
The cache is accessed through two functions, AddObjectToCache and GetObjectFromCache. The +allocWithZone: override then looks like this:
+ (id)allocWithZone: (NSZone *)zone
{
    id obj = GetObjectFromCache();
    if(obj)
        *(Class *)obj = self;
    else
        obj = [super allocWithZone: zone];
    return obj;
}
The -dealloc override simply returns the object to the cache:
- (void)dealloc
{
    // release any ivars here
    AddObjectToCache(self);
    // shut up the compiler
    return;
    [super dealloc];
}
The cache itself is a linked list that reuses the isa slot of each object to point to the next entry in the list. The list head is a global variable:
static id gCacheListHead;
Two helper functions read and write the next pointer of each list item:
static id GetNext(id cachedObj)
{
    return *(id *)cachedObj;
}
static void SetNext(id cachedObj, id next)
{
    *(id *)cachedObj = next;
}
The cache functions then pop from and push onto the head of the list:
static id GetObjectFromCache(void)
{
    id obj = gCacheListHead;
    if(obj)
        gCacheListHead = GetNext(obj);
    return obj;
}
static void AddObjectToCache(id obj)
{
    SetNext(obj, gCacheListHead);
    gCacheListHead = obj;
}
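The same free-list idea can be tried outside of Objective-C. The following is a minimal C sketch, assuming every chunk is the same size and at least one pointer wide; cache_push, cache_pop, and chunk_alloc are hypothetical names, not part of the article's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Head of the free list; NULL means the cache is empty. */
static void *gFreeListHead = NULL;

/* Push a dead chunk: its first word is overwritten with the old head. */
static void cache_push(void *chunk)
{
    assert(chunk != NULL);
    *(void **)chunk = gFreeListHead;
    gFreeListHead = chunk;
}

/* Pop the most recently pushed chunk, or return NULL if the cache is empty. */
static void *cache_pop(void)
{
    void *chunk = gFreeListHead;
    if (chunk != NULL)
        gFreeListHead = *(void **)chunk;
    return chunk;
}

/* Allocate from the cache first, falling back to malloc.
   Assumes all callers pass the same size. */
static void *chunk_alloc(size_t size)
{
    assert(size >= sizeof(void *));
    void *chunk = cache_pop();
    return chunk != NULL ? chunk : malloc(size);
}
```

Because the list is last-in, first-out, the chunk handed back is the one most recently released, which is also the one most likely to still be in the CPU cache. Like the article's version, this sketch is not thread-safe.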
Custom Block Allocator
Caching objects can be a big speed boost, but the initial allocations are not accelerated, and you still have the space overhead of all of those small allocations. By allocating a large block of memory and chopping it up into chunks, it's possible to speed up the initial allocations and vastly decrease the per-object overhead. To do this, I'll use the same object cache scheme as above, but with a modification to the +allocWithZone: implementation:
+ (id)allocWithZone: (NSZone *)zone
{
    id obj = GetObjectFromCache();
    if(!obj)
    {
        AllocateNewBlockAndCache(self);
        obj = GetObjectFromCache();
    }
    *(Class *)obj = self;
    return obj;
}
The only new piece is AllocateNewBlockAndCache. The first thing this function will do is allocate a large block of memory. I chose 4096 for the block size as it matches the page size used by OS X and is a convenient number to work with:
static void AllocateNewBlockAndCache(Class class)
{
    static size_t kBlockSize = 4096;
    char *newBlock = malloc(kBlockSize);
Next, use class_getInstanceSize to mark off each instance-sized section, and then use AddObjectToCache to get each section into the cache:
    size_t instanceSize = class_getInstanceSize(class);
    size_t instanceCount = kBlockSize / instanceSize;
    while(instanceCount-- > 0)
    {
        AddObjectToCache((id)newBlock);
        newBlock += instanceSize;
    }
}
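The carving step can be sketched in plain C as follows. This version differs from the code above in two ways that are my own assumptions rather than the article's: it rounds each slot up to a multiple of sizeof(void *) as an alignment precaution, and it checks the result of malloc. The names round_up, carve_block, and gSlotListHead are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Head of the list of free slots; NULL means no slots are available. */
static void *gSlotListHead = NULL;

/* Round size up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}

/* Carve one malloc'd block into equal slots and push each onto the free list.
   Returns the number of slots added, or 0 if malloc failed. */
static size_t carve_block(size_t blockSize, size_t instanceSize)
{
    size_t slotSize = round_up(instanceSize, sizeof(void *));
    assert(slotSize >= sizeof(void *) && slotSize <= blockSize);

    char *block = malloc(blockSize);
    if (block == NULL)
        return 0;

    size_t count = blockSize / slotSize;
    for (size_t i = 0; i < count; i++) {
        void *slot = block + i * slotSize;
        *(void **)slot = gSlotListHead;  /* first word links to the old head */
        gSlotListHead = slot;
    }
    return count;
}
```

As in the article's version, the block is never handed back to the system: individual slots cannot be passed to free, so the memory stays with the class for the life of the process.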
Conclusion
Writing a custom object allocator in Objective-C is relatively simple. The hard part is the allocator itself, which is largely up to you. Once you have the allocator, you can plug it into your Objective-C class by:
- Overriding +allocWithZone: to call your custom allocator, set the isa of the block to self, and optionally zero out the rest of the memory.
- Overriding -dealloc to call your custom allocator, and do not call through to super.
- Calling objc_removeAssociatedObjects in -dealloc if there's a chance of your object containing associated objects.
- Only subclassing NSObject directly, and not subclassing any subclass of NSObject.
That's it for this edition of Friday Q&A. Come back in two weeks for the next exciting edition. As always, your ideas for topics to cover are welcome and requested, so if you have something that you would like to see covered here, please send it in!
Comments:
*(Class *)obj = [MyObject class];
return obj;
The isa is set in +allocWithZone:. It could go either place, but putting it there makes it easier to deal with (same-sized) subclasses.
The cache can be made thread-safe with OSAtomicEnqueue() and OSAtomicDequeue(). Basically, replace
static id gCacheListHead;
with
static OSQueueHead gCacheListHead = OS_ATOMIC_QUEUE_INIT;
then replace
static id GetObjectFromCache(void) { }
with
static id GetObjectFromCache(void) { return(OSAtomicDequeue(&gCacheListHead, offsetof(id, isa))); }
and finally replace
static void AddObjectToCache(id obj) { }
with
static void AddObjectToCache(id obj) { OSAtomicEnqueue(&gCacheListHead, obj, offsetof(id, isa)); }
Completely untested, but that's the general idea. This will provide a thread-safe atomic LIFO queue, even when multiple CPUs are concurrently adding and removing items.
Also, while it doesn't make a difference, calloc's API is:
void * calloc(size_t count, size_t size);
You've flipped the arguments:
id obj = calloc(class_getInstanceSize(self), 1);
It should read:
id obj = calloc(1, class_getInstanceSize(self));
Cheers,
Nathan de Vries
As for calloc, I never pay attention to the arguments, and just leave one as 1. The fact that the zeroing and non-zeroing allocation calls take different size arguments has never made any sense to me....
I'd always assumed that using malloc + memset was functionally equivalent to using calloc. Not true, it seems!
[1] http://developer.apple.com/library/mac/#documentation/Performance/Conceptual/ManagingMemory/Articles/MemoryAlloc.html
There is a possible problem in AllocateNewBlockAndCache, namely that you use the size returned by class_getInstanceSize(class) to "slice" up the allocation. To the best of my knowledge, the size returned by class_getInstanceSize() is not rounded up to the ABI required minimum boundary for correct alignment.
malloc() is required to return an allocation that is guaranteed to be correctly aligned for any type. Correct alignment of the individual slices is (probably) not guaranteed as the code is currently written. Whether or not this is a problem depends on the architecture. It will probably work on x86, but not on RISCy CPUs.
sizeof(void *) to be safe.
*(id *)cachedObj
Thanks !
This seems to not be possible with recent versions of obj-c. In particular, the run-time keeps track of whether or not [NSObject release] has been called on an object. See, e.g.:
http://www.opensource.apple.com/source/objc4/objc4-551.1/runtime/NSObject.mm
and in particular
bool
_objc_rootReleaseWasZero(id obj)
where the SpinTable keeps track of whether or not release has been called (it sets SIDE_TABLE_DEALLOCATING). I have not tried to work around this, but certainly it's possible to call another function than release to send the object back to the cache.
objc_destructInstance. Fortunately, this call has been made public since I wrote this article, and is now the officially supported way to destroy an object while managing the memory yourself. objc_constructInstance is probably also a good idea to use. If using the two of those still runs into trouble, I'd encourage filing a bug. This is something the runtime guys do care about (I think my prodding helped get these calls made public, and it didn't strike me as their intent to break custom allocations in the first place when I talked to them) and I think they'll get it fixed if it's broken.