mikeash.com: just this guy, you know?

Next article: Friday Q&A 2012-03-16: Let's Build NSMutableDictionary
Previous article: Friday Q&A 2012-03-02: Key-Value Observing Done Right: Take 2
Tags: fridayqna letsbuild objectivec
Friday Q&A 2012-03-09: Let's Build NSMutableArray
by Mike Ash

Collection classes are ubiquitous in Cocoa apps, but also fairly opaque. Today, at the suggestion of Matthew Elton, I'm going to take a look at how NSMutableArray works behind the scenes by building a replacement for it from scratch.

Concepts
Since NSMutableArray is a class cluster, reimplementing it is fairly easy. By subclassing it and then implementing the primitive methods, you end up with a fully functional NSMutableArray that supports all of the functionality of any NSMutableArray. Here are the primitive methods:

    - (NSUInteger)count;
- (id)objectAtIndex:(NSUInteger)index;
- (void)insertObject:(id)anObject atIndex:(NSUInteger)index;
- (void)removeLastObject;
- (void)removeObjectAtIndex:(NSUInteger)index;
- (void)replaceObjectAtIndex:(NSUInteger)index withObject:(id)anObject;


It should be pretty clear how these primitive methods provide enough functionality for all the other NSMutableArray methods to be built on. In fact, these primitive methods are already a bit excessive. For example, addObject: can be implemented in terms of insertObject:atIndex:, and removeLastObject can be implemented in terms of removeObjectAtIndex:. I'm not entirely clear just why these extra primitive methods exist, but in any case, implementing this full set will give us a fully functional NSMutableArray subclass.

The question is then, how do we implement these methods? The idea is to use a C array as the backing storage. Fundamentally, C arrays only support getting and setting values, whereas NSMutableArray allows for adding and removing values, which automatically shifts later values around. We'll handle this by copying array entries as necessary. NSMutableArray also allows for dynamic expansion of the array, which we'll handle by allocating a new C array and copying the old contents into it when necessary.

Code
As usual, the code for today's escapades is available on GitHub if you want to see it all in one place:

https://github.com/mikeash/MACollections

Implementation
The interface for this class is extremely simple:

    @interface MAMutableArray : NSMutableArray
@end


This subclass doesn't add anything new to the interface, so it's just a direct NSMutableArray subclass.

Here are the instance variables that this class will use:

    @implementation MAMutableArray {
NSUInteger _count;
NSUInteger _capacity;
id *_objs;
}


The C array is _objs, and _capacity holds the number of elements allocated for the array. The number of elements actually used is stored in _count. The capacity and count are stored separately so that the array doesn't need to be reallocated every time an object is added or removed, which would be extremely inefficient.

Next up is the initializer. NSMutableArray requires an implementation of initWithCapacity: in order for the class to work with built-in convenience methods like +array. As far as I know, this is not documented anywhere. For this class, all instance variables can be left at zero, so this initializer doesn't really need to do anything but call super:

    - (id)initWithCapacity: (NSUInteger)capacity
{
return [super init];
}


Note that the capacity argument is purely advisory, so it's safe to ignore like this.

Next comes the implementation of dealloc. When the array is destroyed, it needs to release all of the objects it contains. While there are faster ways to implement this, I'll implement this by simply removing all of the objects. Besides that, the C array itself needs to be freed, and that's it:

    - (void)dealloc
{
[self removeAllObjects];
free(_objs);
[super dealloc];
}


Now we can move on to the implementation of the various primitive methods. The first one is count, which is really easy since it can just return the _count instance variable:

    - (NSUInteger)count
{
return _count;
}


Next is objectAtIndex:, which is nearly as simple. It simply fetches the element out of the C array and returns it:

    - (id)objectAtIndex: (NSUInteger)index
{
return _objs[index];
}


This code omits error checking in the interest of brevity and simplicity. A proper implementation of this method would first check index against _count and throw an exception if it's out of bounds. As it is here, a bad value for index will cause this method to return junk or crash.

Insertion
As mentioned previously, the addObject: method can be implemented in terms of insertObject:atIndex:, so that is what I do:

    - (void)addObject:(id)anObject
{
[self insertObject: anObject atIndex: [self count]];
}


With insertObject:atIndex:, we finally get to some interesting code. The first thing this method needs to do is resize the backing C array if the current one is full, in order to make room for the new object:

    - (void)insertObject: (id)anObject atIndex: (NSUInteger)index
{
if(_count >= _capacity)
{


The code must calculate a new capacity to allocate. I use a minimum of 16 elements, which is used for the very first allocation, and then increase the capacity by a factor of 2 for each additional reallocation. Multiplying the capacity by a constant factor for each reallocation has some useful performance consequences which I'll discuss a little later. Here's the code that calculates the new capacity and then allocates the new array with this new capacity:

            NSUInteger newCapacity = MAX(_capacity * 2, 16);
id *newObjs = malloc(newCapacity * sizeof(*newObjs));


Next, we need to copy the contents of the old array across. This is a simple memcpy:

            memcpy(newObjs, _objs, _count * sizeof(*_objs));


Finally, the old storage is freed, and the instance variables reassigned to the new storage:

            free(_objs);
_objs = newObjs;
_capacity = newCapacity;
}


At this point, the array is guaranteed to have enough room for the new object. If the array was previously full, it has now been expanded. If it wasn't full, then it necessarily has room. With that condition satisfied, the next requirement is to move objects around to open up the slot at index for the new object. All of the objects in the array after index need to shift down by one. This is done with a call to memmove:

        memmove(_objs + index + 1, _objs + index, ([self count] - index) * sizeof(*_objs));


For the unfamiliar, memmove is a potentially slower variant of memcpy which allows its arguments to overlap. memcpy is allowed to assume that its arguments are distinct, and can misbehave if that's not true. This code will have overlapping arguments nearly always, and memmove allows that to work.

Let's unpack these arguments briefly. _objs + index is the source address for the memmove, and is the beginning of the objects that need to be moved. The number of objects that need to be moved is [self count] - index, and multiplying that by sizeof(*_objs) gives the total size of these pointers in bytes. Finally, they need to be moved down by one slot, so the destination pointer is _objs + index + 1.

With room made for the new object, all that remains is to put it into the C array and increase the _count:

        _objs[index] = [anObject retain];

_count++;
}


Removal and Replacement
The removeLastObject primitive method is implemented by just calling removeObjectAtIndex: and passing the last index in the array:

    - (void)removeLastObject
{
[self removeObjectAtIndex: [self count] - 1];
}


The implementation of removeObjectAtIndex: is also relatively simple. The first thing is to release the object being removed:

    - (void)removeObjectAtIndex: (NSUInteger)index
{
[_objs[index] release];


Next, shift all of the objects down with memmove. This is essentially the same as the object shifting from insertObject:atIndex:, but in the opposite direction:

        memmove(_objs + index, _objs + index + 1, ([self count] - index - 1) * sizeof(*_objs));


All that remains is to decrement the object count:

        _count--;
}


This method should also have some error checking in a real implementation to ensure that index is valid, but once again we skip it for this example.

A real implementation of this method would probably also shrink the C array if the count falls under some threshold. If you add a million objects to an array and then remove all of them, you don't want the array still holding memory for a million objects. However, since this isn't strictly necessary and is really just a memory-usage optimization, I have omitted it. The code would look much like the reallocation code in insertObject:atIndex:.

The implementation of replaceObjectAtIndex:withObject: is also really simple. All it does is retain the new object, release the old object, and then place the new object into that index in the array:

    - (void)replaceObjectAtIndex: (NSUInteger)index withObject: (id)anObject
{
[anObject retain];
[_objs[index] release];
_objs[index] = anObject;
}


And that's it! We now have a fully functional NSMutableArray built from scratch.

Reallocation Cost
When reallocating the array, the code increases the capacity by a factor of two each time. This may be an unintuitive approach compared to, say, simply increasing the capacity by a fixed amount each time. Let's examine the performance consequences of these approaches to see why multiplying the capacity by a constant factor is generally better.

First, let's look at the cost incurred in performing a single array reallocation. We can assume that the allocation itself (and freeing the old array) take about the same amount of time no matter how big the array is. The only variable cost is copying the objects from the old array to the new array. Since copying doesn't do anything special, and just moves bytes around one by one, we can assume that if there are $$\mathrm{n}$$ objects in the array, the copy will take an amount of time that's approximately proportionate to $$\mathrm{n}$$, or in algorithmic complexity terms, this operation is $$\mathrm{O(n)}$$. Knowing that each allocation is $$\mathrm{O(n)}$$ in the size of the array so far, we can then figure out the complexity of different reallocation strategies.

Let's consider the case where we increase the array by a fixed amount each time. To make the analysis easy, let's say that the array's capacity is increased by $$\mathrm{1}$$ each time. This is a pathological case, but it's easy to analyze, and the results carry over to more sane cases. Given this, let's look at the total cost of adding $$\mathrm{1000}$$ elements to the array.

Every new element that's added requires a reallocation and incurs a cost proportional to the number of elements in the array so far. The first element costs approximately $$\mathrm{1}$$, the second one approximately $$\mathrm{2}$$, and so on. The total cost is then $$\mathrm{1 + 2 + 3 + \cdots + 999 + 1000}$$, or $$\mathrm{500500}$$.

More generally, the cost to add $$\mathrm{m}$$ elements will be $$\mathrm{1 + 2 + 3 + \cdots + m}$$, or: $$\frac{m * (m + 1)}{2}$$ In algorithmic complexity terms, this can be expressed simply as $$\mathrm{O(m^2)}$$. In other words, the total cost to add m elements is proportional to the square of m. Double the number of elements, and the time to add them will roughly quadruple. Adding a million elements will take roughly a trillion times longer than adding one element, which is not a very happy number.

Let's take a more realistic case, like increasing the capacity by $$\mathrm{1024}$$ for each reallocation. In this case, adding $$\mathrm{m}$$ elements will cost $$\mathrm{1024 + 2048 + 3076 + 4096 + \cdots + m}$$. If you work it all out, this still turns out to be O(m2), although it's still much faster than before. The cost is still proportional to the square of the number of elements, but the cost is smaller by a large constant factor. This is a much more practical way to implement growing the array, but it still has poor asymptotic behavior.

Finally, let's look at the strategy implemented here of doubling the capacity for each reallocation. In this case, the total cost for m elements will be approximately $$\mathrm{1 + 2 + 4 + 8 + 16 + \cdots + m}$$, which is just $$\mathrm{2m - 1}$$, or $$\mathrm{O(m)}$$. In short, the cost to add elements is proportional to the total number of elements added. Add twice as many elements, and it takes about twice as much time.

Interestingly, this holds for other factors as well. For example, let's say that we increase the capacity by 10% each time. This is a little bit harder to work with, since the capacity can only be whole numbers. But if we put that aside for a moment and assume we can deal with fractions, then adding $$\mathrm{m}$$ elements costs $$\mathrm{1 + 1.1 + 1.21 + \cdots + m}$$, which works out to about $$\mathrm{11m}$$, which is still $$\mathrm{O(m)}$$, just with a larger constant factor. In general, any factor $$\mathrm{>1}$$ will result in a net $$\mathrm{O(m)}$$ cost to insert $$\mathrm{m}$$ elements. However, the larger the factor, the lower the constant multiplier in front, so the faster the allocation will perform.

We end up with an interesting tradeoff. Using larger factors makes the array perform better, but wastes more space. When doubling the allocation, if the array holds $$\mathrm{m}$$ elements, there may be space allocated for up to $$\mathrm{m - 2}$$ additional elements that isn't being used. If the array continues to grow then it will eventually be used, but if it stops at that point, that space is perpetually wasted. With a 10% allocation increase, at most about $$\mathrm{\frac{m}{10}}$$ space can be wasted, but the time spent in copying and reallocating becomes considerably greater.

Conclusion
The real implementation of NSMutableArray is considerably more complicated that what's presented here, but the basic principles remain. Now that you see one simple implementation, I hope that this makes the whole concept more clear.

Next time, I'll take this a step further and show an implementation of NSMutableDictionary based on a hash table. Until then, your ideas are always welcome for future articles, so send them in!

I'm not entirely clear just why these extra primitive methods exist

According to that old documentation, NSMutableArray's methods were originally based on addObject:, replaceObjectAtIndex:withObject: and removeLastObject. Then, insertObject:atIndex: and removeObjectAtIndex: were added as primitive methods for performance reasons.

http://www.gnustep.org/resources/OpenStepSpec/FoundationKit/Classes/NSMutableArray.html

How interesting. That seems like an astoundingly bad choice of primitive methods initially. I wonder how that came about.

replaceObjectsInRange:withObjects:length: would have been a nice single primitive method.

I've always wondered what (if any) black magic performance tricks NSMutableArray and friends do -- it would be interesting to do some performance comparisons.

Thanks for the excellent post and the link to ridiculous fish piece, which I would never have otherwise seen. Looking forward to NSMutableDictionary.

Perf tests have shown that insertion cost at arbitrary indices is not linear with the number of elements. The Apple implementation appears to use a tree structure when the array size is very large.

Is there a particular reason you're using malloc and copying the old array's memory to the new array rather than using realloc?

I avoid realloc purely because malloc/memcpy/free exposes the inner workings and makes the implementation more educational. In a real implementation you would surely want to realloc instead.

Is there a reason that you used malloc followed by a memcpy when increasing the capacity of the c array instead of using realloc? As I understand it, realloc could be a lot faster here.

Sorry, didn't see that this was asked & answered already as I had the page open from a few days ago and forgot to refresh. :)

In insertObjectAtIndex, rather than doing memcpy followed by memmove, would it be faster to memcpy the objects before index, set the new object at index, and memcopy the group of objects after index?

Yes, that would certainly be faster. However, it makes for two fairly different code paths, rather than a single code path with one extra step in the case of resizing, and so I think it makes the code a bit less clear. In real code you would probably want to do that, but for the purposes of education, I prefer the double copy.

Well, it works but this sort of "container" is dreadfully inefficient compared to something like std::vector<>

This array class was written for educational purposes, not for speed. There are a number of things I would do differently if speed were the primary goal, starting with using realloc instead of manually reallocating memory.

The speed difference rather depends on just what you're storing in it. If you're storing ints, of course std::vector will beat storing a bunch of NSNumber instances, by far. If you're storing an array of Objective-C objects, I doubt you can come up with any non-contrived case where std::vector is both correct (e.g. retains/releases its contents) and measurably faster than this.

Well, I am not talking about boxing cause that by definition sucks regardless of what container are you using.

Retaining and releasing should not be really property of a container.

Why always pay a price for this if you can have the caller determine if that's necessary by wrapping a pointer in some sort of smart pointer ( intrusive if going for speed) and still retain full performance for pointers/native types where you are using vector as a temporary storage unit, while having full retain/release functionality when needed.

Of course we are talking here about different worlds where in C++ you can templatize this stuff and have accessors (objectAtIndex) perform with no speed penalty at all.
Accessors and itteration is where it matters most since that's what will be called (more often than not) in a loop.
Reallocation and mid-insertion will always suck but that's less important because people who care about it, can always preallocate ahead.

std::vector is frankly not something I would use in a really performance critical code ... more likely would roll out my own or use something like this :

( with nice tricks like erase_unordered etcx )

I guess the bottom line is that there is just no chance in hell you can get anything truly performant with Objective C style boxing and non-inlined accessors.
It doesn't mean it doesn't have its place but people tend to just use NSMutableArray blindly for storing/iterating over things like particles and sprites and then wonder on forums while their code slows down once they push more than a hundred items ( and do it 60 times per second)

"Retaining and releasing should not be really property of a container."

That's just badly wrong. Objective-C with Cocoa is built around the idea of reference counting and that requires memory management to be done locally in order to retain sanity. Objective-C is not C++. You may be correct in C++, but you're definitely wrong in Objective-C. A container that doesn't manage its contents' memory would be incredibly error-prone and difficult to use.

Iteration over containers in Cocoa is done using NSFastEnumeration which, when implemented well, has a small constant overhead but is otherwise just as fast per element as direct C array access. It would be just a handful of lines of code to implement NSFastEnumeration on MAMutableArray, at which point a for/in loop will run about as fast as a loop over a C array.

I'd be surprised if iterating over a 100-element NSArray at 60Hz causes a noticeable performance hit compared to a straight C array or other similar container. Have you actually measured such code and seen the array as a bottleneck, or are you just speculating?

Well,that's fine ... if you are looking for consistency then having everything retain everything will work.

On the other hand, if people need to implement performance critical containers they will have to come up with their own implementations because sure as hell MAMutableArray (and related classes) won't do it for them.

And I am not talking here about a simple array of widgets but things more complicated than that.

std:: like containers can be used for both because they leave the ultimate decision in terms of how much performance you need to the programmer but with Objective-C it is one size fits all – if it is not fast enough , you are out of luck.

As far as enumerations , the actual method call , while important, is not really the biggest problem here.

The real issue here is the fact that this type of MAMutableArray works only with objects ( pointers really.)
With std:: containers I am free to store copies of structs/objects which means I am guaranteed that these things will be aligned in memory sequentially.
Given relatively small structs , this makes enumeration incredibly fast, because the real cost is not in the method call itself but in the fact that with pointers you pretty much end up busting CPU cache with every access ( since you have no control which part of memory your objects come from) and on a device like iPad this matters a lot ( frankly, it does on any modern CPU) because, while the actual method call to access the value at specific index may take a dozen or so CPU cycles, a missed cache hit can take hundreds of cycles.
Again this makes no difference with an array containing a few dozen Widgets or something like that but what if you have an array of few thousands particles or sprites or something like that.

I guess the larger point is that with std:: you are free to used them in a “safe” Objective-C like (retail/release) if you want, but there is nothing preventing you from writing code that will make a std:: vector run just about as fast as an ordinary C like array while still retaining all benefits of a well designed and tested collection.

That's all true but I can't say I see what your point is. NSArray is designed as an array of Objective-C objects. Yes, it doesn't do a good job when you try to use it as an array of structs, because that's not what it's designed to do. This is not a surprise.

Use NSArray for arrays of Objective-C objects. Use other containers for arrays of non-objects. That's standard procedure in this environment.

There is a really nice way to implement this using CFArrayCreateMutable and supplying callbacks with NULL retain and release handlers.

http://stackoverflow.com/questions/4692161/non-retaining-array-for-delegates

I've been writing a lot of Go lately. Its "slices" are much like mutable arrays. They also grow by a constant factor - but with a twist. When your array is under 1 kB it doubles the capacity each time, just like yours. But over 1 kB, the constant factor drops from 2 to 1.25. This way small arrays are very fast, while large arrays are more memory-efficient.

To see this in action run my demonstration program:
http://play.golang.org/p/HPyua__wYO

"I'd be surprised if iterating over a 100-element NSArray at 60Hz causes a noticeable performance hit compared to a straight C array or other similar container. Have you actually measured such code and seen the array as a bottleneck, or are you just speculating?"

actually that is not far off - at about 1000 elements or so the array expenses prohibit running at 60...

someone wrote an article about it somewhere in the context of making a game...

http://www.vellios.com/2010/08/22/why-game-devs-dont-use-objective-c/

it makes sense - the considerable power of objective-c's super dynamic and late binding object orientation comes with a significant cost, and it is without the c philosophy of only paying for what you use. performing messaging (member function call) is typically more expensive than a clean C/C++ array lookup before it actually knows where to jump in the code, let alone do anything productive. this prevents NS(Mutable)Array from ever approaching the performance of C arrays.

going off on a tangent...

and @warmi STL is pretty horrendous in its own ways - its speed is inherited from C/C++ itself rather than clever, or good design - and in most cases the performance can be beaten by writing special case code... usually heap allocation is the dominant cost, the benefits of templating, elision and inlining mean next to nothing compared to that.

c++ 11 fixed array helps some, but almost every data structure should be able to live on the stack or in static data areas in a highly optimised way. what would be nice would be good pool allocation classes and an allocator implementation that isn't horrible to use... and save every game dev from writing their own. :)

Semi Essessi: I wrote a speed test to see just how costly it is to iterate over an NSArray:

https://gist.github.com/4260612

On my computer, iterating a 1000-element array and sending one message to each object within takes about 55 microseconds. At 60fps, that iteration will cost you about 1/300th of your available time per frame, which is not going to really affect whether or not you can achieve 16ms/frame.

Yes, other containers can be faster, but a 1000-element NSArray isn't going to hurt you in most cases, not even a 60fps game.