mikeash.com: Friday Q&A 2010-08-27: Defensive Programming in Cocoa

Posted at 2010-08-27 15:19 | RSS feed (Full text feed) | Blog Index
Next article: Cocoa Unbound
Previous article: Friday Q&A 2010-08-12: Implementing NSCoding
Tags: cocoa defensive fridayqna objectivec

Friday Q&A 2010-08-27: Defensive Programming in Cocoa

by Mike Ash

Welcome back to another word-laden edition of Friday Q&A. About a year ago, I wrote a post on defensive programming. That post covered defensive programming in a general sense, and Scott Gould has requested that I write one specific to various standard Cocoa practices, which is what I will be talking about today.

Recap
Defensive coding essentially boils down to constantly asking yourself, "what if it fails?" and coding appropriately. Your code's response to failure can be ranked:

Corrupt/delete user data
Crash/freeze
Fail silently
Display an error
Work around the failure

The goal is to get as far down the list as is possible and reasonable.

When it comes to defensive Cocoa programming, a lot of failures are really just unanticipated changes. What happens if something makes a subclass of your class? What happens if your superclass's dealloc implementation changes? What happens if the behavior of an object changes but remains within the API contract?

Initializers
What better place to start than at the beginning of an object's lifecycle?

I wrote a full post on implementing Cocoa initializers, so I won't go into too much detail here. The main things you need to do to write a defensive initializer are to always ensure that your superclass's initializer gets called (even if it doesn't do anything, it could start doing something later), always assign self to the result, and always check the result for nil. Thus:

    - (id)init
    {
        self = [super init];
        if(self)
        {
            // ... initialize instance variables here
        }
        return self;
    }

Deallocation
When your object is destroyed, it must clean up after itself. Naturally, you need to be careful when doing this.

First, always write your dealloc implementation to tolerate an uninitialized or partially initialized object. It's possible that your initializer, or a superclass initializer, will encounter an error and decide to destroy your object before it's fully formed, and your code should tolerate this. You can take advantage of the fact that Objective-C objects start out zero-filled. For example:

    - (void)dealloc
    {
        // if statement guards against uninitialized object
        if(someStructPointer)
            [someStructPointer->object release];
        
        [super dealloc];
    }

Note that most of the time, you can simply take advantage of the fact that messages to nil do nothing, and not have to write any extra code to handle the case of an uninitialized object.

Setters
The same thing goes for setters, especially those which are overridden from a superclass. The superclass may call the setter with nil to destroy an object rather than directly using release. Always write your code to tolerate this. For example:

    - (void)setFoo: (id)newFoo
    {
        // make sure we don't do something bad if newFoo is nil!
        if(newFoo)
            [[NSNotificationCenter defaultCenter]
             addObserver: self
             selector: @selector(_fooChanged)
             name: FooDidChangeNotification
             object: newFoo];
        [super setFoo: newFoo];
    }

You should always do this even if you're sure it will never get called with nil. It's just good practice, and doesn't hurt.

+initialize
Just as you have to code defensively when initializing your objects, so do you have to for initializing your classes.

The +initialize method is extremely useful for doing basic setup of class-wide data. It's invoked automatically by the runtime the first time any message (such as alloc) is sent to your class.

For example, say you need a global dictionary to hold certain items:

    static NSMutableDictionary *gDictionary;
    
    + (void)initialize
    {
        gDictionary = [[NSMutableDictionary alloc] init];
    }

However, this is dangerous and wrong! The trouble occurs if there's a subclass which doesn't implement +initialize. The runtime still sends the message, and so it ends up invoking your version instead. Suddenly you've leaked the old dictionary and lost all of the data in it. Whoops.

The fix is simple: just check the identify of self before you do your initialization.

    static NSMutableDictionary *gDictionary;
    
    + (void)initialize
    {
        if(self == [MyClass class])
            gDictionary = [[NSMutableDictionary alloc] init];
    }

As a general rule, the first line of any +initialize implementation should always be a check of self to ensure that it's the correct class.

You might be thinking that you didn't write any subclasses and don't plan to, so you don't need to do this. There are two problems with that approach.

First is the obvious one that you might simply be wrong about your future plans. If in a year you change your mind and decide that you do need a subclass, you don't want this code to suddenly break in mysterious and hard-to-debug ways.

Less obvious is the fact that subclasses of your class can be created dynamically at runtime. This is done by Cocoa's key-value observing system, a key component of Cocoa bindings. These dynamic subclasses still cause +initialize to execute, so be prepared for it.

Memory Management
If you have an object pointer instance variable, never assign to that variable except in init, dealloc, and a setter method. In other words, don't write code that does this directly:

    [myObjIvar release];
    myObjIvar = [[self makeNewObj] retain];

If you write that code directly, it's easy to forget a retain or release and cause havoc. It makes your code less dangerous and easier to understand to call through to a setter:

    [self setMyObjIvar: [self makeNewObj]];

That way all of the memory management code is contained in just one place, instead of being scattered all about.

On the subject of setters, if you're writing your own setter, remember that you must either do an if check to make sure that the object you have is genuinely new, or you must do a retain (or copy) before you do a release, to make your code robust against calling the setter with the same object as is already held in the variable. In other words, don't do this:

    - (void)setMyObjIvar: (id)obj
    {
        [myObjIvar release]; // could destroy obj!
        myObjIvar = [obj retain];
    }

Instead, either do this:

    - (void)setMyObjIvar: (id)obj
    {
        if(obj != myObjIvar)
        {
            [myObjIvar release];
            myObjIvar = [obj retain];
        }
    }

Or this:

    - (void)setMyObjIvar: (id)obj
    {
        [obj retain]; // or obj = [obj copy];
        [myObjIvar release];
        myObjIvar = obj;
    }

Which one is better is essentially a matter of taste.

Copying
Object copying is a source of horrors when it comes to subclassing Cocoa objects, due to the two different ways that exist to create a copy of an object in the top-level implementation of copyWithZone:.

The sane way to implement copyWithZone: is to either return [self retain] (only if the object is immutable) or to create a new instance of the object's class and populate it to hold the same values using accessors or direct instance variable access:

    - (id)copyWithZone: (NSZone *)zone
    {
        // note use of [self class] rather than MyClass
        // this is defensive programming against subclassing!
        MyClass *newObj = [[[self class] alloc] init];
        [newObj setFoo: _foo];
        [newObj setBar: _bar];
        return newObj;
    }

The completely insane way to implement copyWithZone: is to use the NSCopyObject function. This function allocates a new object of the same class and returns it. The problem is that it also performs a bitwise copy of all instance variables. These instance variables include things like pointers to objects. It does not retain them, merely copies their value.

Never, ever, ever use NSCopyObject. Easy, right? The problem is that Cocoa uses it, and if you ever subclass Cocoa objects, you have to be aware.

Even worse: you often can't count on the Cocoa implementation using one technique or another. It could use either one, and which one it uses could switch at any time. So if you ever subclass a Cocoa object that implements NSCopying, and you add instance variables, you need to write a copyWithZone: override that handles both cases correctly. Fortunately this is easier than it sounds.

First, let's examine the problem in more detail. Here's an example NSCell subclass:

    @interface MyCell : NSCell
    {
        NSString *_someString;
    }
    
    - (void)setSomeString: (NSString *)newString;
    
    @end

    @implementation MyCell
    
    - (void)setSomeString: (NSString *)newString
    {
        [newString retain];
        [_someString release];
        _someString = newString;
    }
    
    - (void)dealloc
    {
        [_someString release];
        [super dealloc];
    }
    
    @end

Now you write a bit of code that uses it:

    MyCell *cell = [[MyCell alloc] init];
    [cell setSomeString: string];
    MyCels *cell2 = [cell copy];
    [cell release];
    [cell2 release];

You run this and it crashes! Why?

The implementation of -[NSCell copyWithZone:] uses NSCopyObject. This means that it copies the _someString pointer, but does no memory management on it. You end up with two pointers to the string but only one of which is retained. They both get released in dealloc. Crash.

The solution is simple: override copyWithZone: and retain or copy the instance variable:

    - (id)copyWithZone: (NSZone *)zone
    {
        MyCell *newObj = [super copyWithZone: zone];
        [newObj->_someString retain];
        return newObj;
    }

This works, but is brittle. If the NSCell implementation changed to no longer call NSCopyObject, this implementation will fail, because newObj->_someString will be nil. It won't crash, but the copy won't be a proper copy either. And you can't just switch to using [newObj setSomeString: _someString] because that crashes the NSCopyObject case. We need code that works equally well for both.

The answer is to directly assign to the instance variable of the other object and do memory management at the same time, like so:

    - (id)copyWithZone: (NSZone *)zone
    {
        MyCell *newObj = [super copyWithZone: zone];
        newObj->_someString = [_someString retain];
        return newObj;
    }

You'll note that it works for both cases. For the sane case, it retains the string and puts the value into the new object's instance variable. For the NSCopyObject case, it overwrites the existing pointer with a new one, but without releasing the old one.

Note that NSCell is by far the most commonly subclassed class that conforms to NSCopying. If you ever subclass NSCell or one of its subclasses, you must implement copyWithZone:, and you must do it correctly using the above technique. Otherwise you leave yourself open to extremely mysterious crashes and corruption problems. I've been there, it's no fun.

Error Returns
A lot of Cocoa methods take NSError ** parameters to tell you why something went wrong. Always check the retun value from these methods! And always at least log the error if there is one.

Don't do this:

    NSString *string = [NSString stringWithContentsOfFile: path encoding: NSUTF8StringEncoding error: NULL];
    [self corruptImportantDataIfNil: string];

Instead, check string and use the error if it's nil. At the very least, use an assert to nicely stop the current operation rather than continuing with bad data:

    NSError *error;
    NSString *string = [NSString stringWithContentsOfFile: path encoding: NSUTF8StringEncoding error: &error];
    NSAssert(string, @"Could not load string, error: %@", error);
    [self corruptImportantDataIfNil: string];

If possible and reasonable, try to either fail gracefully (maybe try a different, backup source of data) or at least report the error to the user in the GUI and allow the program to continue executing normally.

Always at least check for failure and log the error. Even if subsequent code won't corrupt data in the error case, it's still much easier to figure out why your code isn't working if you can catch the failure as early as possible.

A Note on Checking Errors
This has been repeated in many other places, but bears another mention. Always check the return value of such a method, and not the NSError variable directly. For example, this is wrong:

    NSError *error = nil;
    NSString *string = [NSString stringWithContentsOfFile: path encoding: NSUTF8StringEncoding error: &error];
    NSAssert(!error, @"Could not load string, error: %@", error);

Apple reserves the right to fill your error variable with junk upon success. This is a stupid policy, but it's the policy which is there, so you must take it into account.

Weak References
The standard way to set up an NSTableView using a data source is to create an object to be the data source, implement the required methods, and then hook up the dataSource outlet of the table view. However, if you don't do anything else, this is actually a dangerous setup.

The problem is that the order of object destruction isn't defined. If your data source gets destroyed before the table view, the table view could potentially message the data source after it's been destroyed, causing your program to crash.

To ensure that this never happens, you must zero out the data source in your dealloc. However, simply adding that isn't safe either! It's possible for the table view to be destroyed first, and then your message to it will crash. What to do?

The answer to this conundrum is to retain the table view outlet. Don't rely on direct instance variable twiddling, but rather write a setter for it that retains it. Then you can write your dealloc like this:

    - (void)dealloc
    {
        [tableView setDataSource: nil];
        [tableView release];
        [super dealloc];
    }

This is safe because it ensures that the table view is never destroyed before you zero out the weak reference.

Most weak references are subject to this problem, including most delegate references from Cocoa objects. Most of the time you can get away with ignoring it, but it's much safer to do it right.

For a more comprehensive solution to the problem of weak references, check out MAZeroingWeakRef. By using the MAZeroingWeakProxy class and the Objective-C associated object API, it becomes trivial to create delegate and data source connections which are completely safe.

Categories
Categories are a great feature of Objective-C. It's really handy to be able to add new methods to Cocoa classes that you can then use anywhere in your program. However, it can also be dangerous.

The danger comes when your method name clashes with one that Apple has put in the class. Your method will override the existing one. If they do different things, trouble will ensue. Even worse, your category method could be just fine today, but conflict with a method added by Apple in the next release of the OS.

For example, say you add a map: method to NSArray:

    @interface NSArray (MyAdditions)
    
    - (NSArray *)map: (id (^)(id obj))block;
    
    @end

This is fine now, but if Apple adds their own map: method in 10.7, and it doesn't have identical semantics, you'll be in big trouble.

The only way to avoid this is to ensure that your category methods on Apple classes will never have a name conflict. The most obvious way to ensure this is to add a prefix to the method name:

    - (NSArray *)ma_map: (id (^)(id obj))block;

This is ugly, but effective. As a bonus, it's often easier to find your category methods with Xcode's code completion, since you can just type in your prefix and let Xcode show all of them.

Another way to ensure that you never have a conflict is to give your method an extremely specific name that you can be confident will never be used by Apple. For example:

    - (NSArray *)arrayByMappingMyFooElements: (id (^)(MyFoo *foo))block;

But this can be a dangerous game to play. When in doubt, prefix.

Conclusion
Cocoa is a complex system that can have some hidden gotchas. However, for the most part, all you need to do is be aware of the fact that your application is really just one component in a larger system. Write your application to be a good citizen and cooperate nicely with the other components that get loaded into your process. Be particularly careful when subclassing and when making modifications to framework classes. The cases presented above are just a sampling of useful defensive programming techniques in Cocoa. Keep defensive programming on your mind no matter what you write, and your code will be better for it.

That's it for this round of Friday Q&A. I will probably be taking a few weeks off for (good) personal reasons, so don't panic if two weeks go by and I haven't posted another one. I will resume before too long, but won't make any promises as to when just yet.

As always, if you have ideas for topics that you would like to see covered here, send them to me. I may not get to it right away, but I will certainly put your suggestion on my list for the future.

Did you enjoy this article? I'm selling whole books full of them! Volumes II and III are now out! They're available as ePub, PDF, print, and on iBooks and Kindle. Click here for more information.

Comments:

Scott Gould at 2010-08-27 16:20:15:

Thanks so much for the post!

There is a lot of practical wisdom here. I know the future me really appreciates all of the debugging time and anguish you have likely saved him. Enjoy your time off.

Dave DeLong at 2010-08-27 16:20:19:

Regarding category name collision:

I personally prefer to use a method suffix instead of a method prefix. That way I can still name the method something logical and find it in the autocompletion list, without having to remember that it's a category method. So instead of doing:

- (void) dd_map:(id (^)(id obj))block;

I do:

- (void) map_dd:(id(^)(id obj))block;

Patrick Stein at 2010-08-27 16:40:48:

Two things:
1st:

I use the

- (id)init
{
    if(! (self=[super init]) )
    {
        return nil;
    }

    // put your code herebut remember

    if( somethingfails )
    {
        [self release];
        return nil;
    }

    return self;
}

this makes it easier to understand in my opinion as you don't have a mile long open/close bracket if your init routine is long. You don't have to keep in mind that open bracket the whole time.

Furthermore remind people to use [self release] in the init method once they have self filled with something but can't initialize the object anyways.

2nd: the link to http://www.mikeash.com/pyblog/the-how-and-why-of-cocoa-initializers.html is broken as it contains a URI which is too long.

Rob Wills at 2010-08-27 17:02:40:

Another defensive tip for Cocoa is avoiding underscore prefixes.. Apple's documentation claims that it has reserved the use of the leading underscore:

Avoid the use of the underscore character as a prefix meaning private, especially in methods. Apple reserves the use of this convention. Use by third parties could result in name-space collisions; they might unwittingly override an existing private method with one of their own, with disastrous consequences.

(http://developer.apple.com/iphone/library/documentation/Cocoa/Conceptual/CodingGuidelines/CodingGuidelines.pdf page 11)

mikeash at 2010-08-27 18:39:31:

Patrick Stein: Thanks for the note about the broken link, that was just a silly typo. Fixed now.

Rob Wills: I've always found that warning to be a bit strange. Using an underscore prefix is at the very least no more dangerous than not using a prefix at all. Apple can and does add un-prefixed methods to their classes, so unprefixed names can clash too. Adding an underscore at the very least does not increase the probability of breaking something in the future.

Scott Gould at 2010-08-27 19:14:36:

mikeash: Using an underscore prefix is at the very least no more dangerous than not using a prefix at all.

I would think colliding with a public Apple method would be preferable to colliding with a private one. In the case of colliding with an underscore prefix method, you're guaranteed to hit a private method which will be trickier to catch (I realize I might be missing something here).

Suffixes would seem to solve this problem as Dave DeLong mentioned above. Although using suffixes when a method takes an argument (or multiple arguments in particular) looks very ugly in Objective-C to my eyes.

Karan at 2010-08-27 19:57:52:

I would also recommend adding a nil check for the result from [super copyWithZone:...] since you're doing a C-style pointer dereference on the result and there's no guarantee the result won't be nil, thus causing a crash.

Mike Shields at 2010-08-28 04:07:46:

One other thing I'd add to this list is being dang sure that you think about implementing -(void)hash whenever you override -(BOOL)isEqual:. I've been bitten enough times when I implement the second without the first and then stuff an object into a container like NSSet, only to have things not work correctly.

In fact, I'd say that you should ALWAYS implement -hash, if only to make sure that you know you've thought of the implications.

Steven Degutis at 2010-08-28 12:24:23:

Scott Gould: I think what Mike means is that Apple can (and does) add private methods without using the underscore prefix. Thus they are still just as tricky to catch as ones with underscores.

Anyone: Related: I've always wondered if it would be fine to use two or maybe even three underscores as a method prefix rather than initials or something. Granted this would only work if by convention it was only used in application code and never framework code, so that nobody had the monopoly on multi-underscores thus disabling anyone else from using them when adding a framework to their app.

Dave DeLong: Oh that's even uglier (IMHO). It comes right before the argument! And what happens when your method has multiple arguments?

Karan: Do you mean checking if an ivar of the newly copied object is nil, or checking if the newly copied object itself is nil? If the latter, I agree that it would be safer to do that, but I can't think of a single time where -copyWithZone: has failed or would ever fail or was documented to fail, nor have I ever seen an implementation where it would fail and return nil. But yes, technically it should be safer.

Patrick Stein: As a style, I personally find that one too repetitive to use, because you return several times when you really only need to return twice (self at the end of the method, or nil after an error).

But on the note of different coding styles for -init, if only you could use "return self" or "break" as the last clause of a for() loop, you could do something like this:

- (id) init {

    for (self = [super init]; self; break) {

        /* ... */

        if (something_goes_wrong) {

            [self release], self = nil;

            break;

        }

        /* ... */

    }

    return self;

}

Steven Degutis at 2010-08-28 12:32:03:

Mike Shields: Even better would be if Apple implemented Mike Ash's suggestion that Apple implements -hash on NSObject returning 0 at the very least...

Granted it would be more likely if Mike Ash submitted this as an actual bug report to Apple (I would but it's just too early in the morning).

Jim Rodovich at 2010-08-28 14:48:14:

Regarding memory management in setters, obj can be freed even if it's unequal to the released ivar, if the ivar holds the only reference to obj and releases it during dealloc. (For example, suppose you're implementing a linked list.) The second pattern you propose is safer: always retain before releasing.

With both patterns, the ivar is temporarily assigned to a value which could have been freed, so I'm not sure that either is thread-safe. (I may be wrong -- I'll admit I was releasing first before I saw this post.) Wouldn't the following be safer?

NSObject * temp = myObjIvar;

myObjIvar = [obj retain];

[temp release];

Or, more elegantly,

[myObjIvar autorelease];

myObjIvar = [obj retain];

Steven Degutis at 2010-08-28 15:57:53:

Jim Rodovich: I've always done the latter, using autorelease. However, using autorelease technically can make programs harder to debug since if there is a crash during the actual release, your stack trace with the autorelease will be nowhere to be found.

mikeash at 2010-08-28 17:45:16:

Jim Rodovich: Thread safety is a binary property: either something is thread safe, or it's not. None of the setters I showed, nor either of the ones you show, are thread safe. Yours narrow the window for the race condition, but there is still the possibility for a second thread to grab a reference to the object, and then for the first thread to destroy it before the second thread can retain it. When it comes to threaded programming, narrowing the window for a race condition is usually a bad idea, because it makes problems much harder to track down and debug. Better to crash early and often if the code is busted.

It is possible to do a lockless thread-safe setter but it is an enormous amount of work, something like 100-200 lines of code last time I tried it. Basically, if you need thread-safe setters (and that's usually the wrong place for thread safety, but not always) then use a lock.

Jim Rodovich at 2010-08-28 19:41:44:

Steven Degutis: Good point that I hadn't considered. The lack of a relevant stack is probably on its own a good enough reason not to use autorelease as I proposed, unless you're specifically concerned about waiting too long for release to return or about the stack exploding due to an extremely long release-dealloc-set:nil chain reaction. (I still have linked lists on my mind.)

Sean McB at 2010-09-04 02:16:06:

Patrick Stein -> when something fails in init, you should in fact call [super dealloc].

"[self dealloc] or [self release] are bad because they might call some subclass's -dealloc method even though the subclass's -init hasn't done anything yet."[1] Remember method dispatch is always dynamic in Obj-C.

[1] http://lists.apple.com/archives/objc-language/2008/Sep/msg00133.html

mikeash at 2010-09-04 02:52:16:

Sean McB: While the idea that you should call [super dealloc] directly has been in vogue lately, it's contradicted by the documentation:

Subclass implementations of this method should initialize and return the new object. If it can’t be initialized, they should release the object and return nil.

(From the docs for -[NSObject init].)

I mention that this is why you should always write your dealloc implementation to tolerate an uninitialized object. I would go so far as to say that not doing so would be a bug in the subclass.

Sean McB at 2010-09-12 01:10:06:

Mike: well, it wouldn't be the first time Apple's docs were wrong. :)

Personally, I trust Greg Parker's advice (in the link I posted above) more than the docs. Of course, it doesn't contradict your advice to 'always write your dealloc implementation to tolerate an uninitialized object', one should do that _also_.

Anyway, this is another nice thing about GC... no release/dealloc anyway. :)

Steve Wart at 2010-09-12 13:36:18:

There is an alternative solution to the NSCopyObject() problem that doesn't involve tricks with C pointers. The Smalltalk idiom for Object copy is to implement it as follows:



copy

   ^self shallowCopy postCopy

shallowCopy is a primitive that behaves essentially the same way as NSCopyObject(). Apple's documentation states that it will be deprecated at some point, but I think the alternatives are worse.

in Objective-C syntax the most abstract class might contain something like:



-(id)copyWithZone:(NSZone *)zone {

    // shallow copy

    id copy = NSCopyObject(self, 0, nil);

    [copy postCopy];

    return copy;

}

As a rule, subclasses should not overrride copy, but they could implement postCopy to maintain reference counting semantics:



-(void)postCopy {

    [super postCopy];

    [someString retain];

}

DavidPhillipOster at 2010-10-24 04:26:08:

Setters:

typo:
s/The superclass my call the setter/The superclass may call the setter/

Copy:

copyWithZone: shouldn't call alloc, as you've written, it should call allocWithZone: That's what allocWithZone: is for.

mikeash at 2010-10-24 20:21:35:

Thanks for the typo notice, I've fixed it now.

Regarding copyWithZone:, while you need to support the zone APIs when overriding in case someone decides to call them directly, I don't think it's important to actually respect the zone parameter. Zones are pretty much unused and useless these days. Certainly it would do no harm to use allocWithZone: in your implementation, but I don't think it's worth any bother.

Comments RSS feed for this page

Add your thoughts, post a comment:

Spam and off-topic posts will be deleted without notice. Culprits may be publicly humiliated at my sole discretion.

Code syntax highlighting thanks to Pygments.

Name:
The Answer to the Ultimate Question of Life, the Universe, and Everything?
Comment:
	Formatting: `<i> <b> <blockquote> <code>`.
	NOTE: Due to an increase in spam, URLs are forbidden! Please provide search terms or fragment your URLs so they don't look like URLs.