mikeash.com: just this guy, you know?

Posted at 2014-05-09 13:59
Tags: arc assembly fridayqna memory
Friday Q&A 2014-05-09: When an Autorelease Isn't
by Mike Ash  

Welcome back to another Friday Q&A. I apologize for the unannounced hiatus in posts. It's not due to anything interesting, just a shortage of time. Friday Q&A will continue, and I will continue to aim for my regular biweekly postings. For today's article, I have a little story about an autorelease call that didn't do what it was supposed to do.

The Setup
ARC is a lovely technology but it doesn't cover everything. Sometimes you need to use CoreFoundation objects and you're back in the world of manual memory management.

Normally, that's no problem. I did manual memory management for many years, and while I enjoy not doing it with ARC, I still remember how. However, ARC makes some things a bit more difficult than they used to be. In particular, sometimes you want to autorelease a CoreFoundation object. Without ARC, you might write something like this:

    CFDictionaryRef MakeDictionary(void) {
        CFMutableDictionaryRef dict = CFDictionaryCreateMutable(NULL, 0, NULL, NULL);
        // Put some stuff in the dictionary here perhaps

        [(id)dict autorelease];
        return dict;
    }

This gives you nice memory management semantics, where the caller is not responsible for releasing the return value, just like we're used to with most Cocoa methods. It takes advantage of the fact that all CoreFoundation objects are also Objective-C objects, and an autorelease is a way to balance a CoreFoundation Create call.

This code no longer works with ARC, because the call to autorelease is not permitted. To solve this, Apple helpfully provided us with a CFAutorelease function which does the same thing and can be used with ARC. Unfortunately, it's only available as of iOS 7 and Mac OS X 10.9. For those of us who need to support older OS releases, we have to improvise.

My solution was to get the selector for autorelease using the sel_getUid runtime call, which sneaks past ARC's rules. Then I'd send that selector to the CoreFoundation object, thus accomplishing the same thing as [(id)dict autorelease]. My code looked like this:

    CFDictionaryRef MakeDictionary(void) {
        CFMutableDictionaryRef dict = CFDictionaryCreateMutable(NULL, 0, NULL, NULL);
        // Put some stuff in the dictionary here perhaps

        SEL autorelease = sel_getUid("autorelease");
        IMP imp = class_getMethodImplementation(object_getClass((__bridge id)dict), autorelease);
        ((id (*)(CFTypeRef, SEL))imp)(dict, autorelease);

        return dict;
    }

Note: if you ended up here because you need code to accomplish this, do not use this code. As we will see shortly, it's broken. If you want working code for this, see the section titled "The Fix" later in the article.

I tested this code and everything worked fine. A little later, another programmer on the project reported that it was consistently crashing for him. Fortunately, I was able to replicate his crash without too much difficulty. However, it took a while to figure out just what was going on.

The Crash
This code does not crash itself. However, it can cause a crash in subsequent code. For example:

    CFDictionaryRef dict = MakeDictionary();
    NSLog(@"Testing.");
    NSLog(@"%@", dict);

This crashes on the second NSLog line. The stack trace looks like a typical memory management crash:

    frame #0: 0x00007fff917980a3 libobjc.A.dylib`objc_msgSend + 35
    frame #1: 0x00007fff97175184 Foundation`_NSDescriptionWithLocaleFunc + 41
    frame #2: 0x00007fff9077bd94 CoreFoundation`__CFStringAppendFormatCore + 7332
    frame #3: 0x00007fff907aa313 CoreFoundation`_CFStringCreateWithFormatAndArgumentsAux + 115
    frame #4: 0x00007fff907e1b9b CoreFoundation`_CFLogvEx + 123
    frame #5: 0x00007fff9719ed0c Foundation`NSLogv + 79
    frame #6: 0x00007fff9719ec98 Foundation`NSLog + 148

It seems that the dictionary is being destroyed before the NSLog call. But how can that be? We called autorelease in the function, and the autorelease pool has not yet been drained. The release that will balance the CoreFoundation Create call hasn't happened yet, so the object should still exist.
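
That expectation can be sketched with a toy model in plain C. This is only an illustration of the bookkeeping, not how Apple's runtime is implemented: the ToyObject and ToyPool types and every toy_* function are hypothetical names invented for this sketch. The object starts with a retain count of 1, as if returned from a Create call, and the pool simply remembers that it owes one release.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object: a retain count plus a flag that records destruction. */
typedef struct {
    int retain_count;
    int destroyed;
} ToyObject;

/* Toy autorelease pool: a small list of objects that are owed one release. */
typedef struct {
    ToyObject *pending[16];
    size_t count;
} ToyPool;

static void toy_release(ToyObject *obj) {
    assert(obj->retain_count > 0);
    if (--obj->retain_count == 0) {
        obj->destroyed = 1; /* a real runtime would free the memory here */
    }
}

/* Defer one release until the pool is drained. */
static ToyObject *toy_autorelease(ToyPool *pool, ToyObject *obj) {
    assert(pool->count < 16);
    pool->pending[pool->count++] = obj;
    return obj;
}

static void toy_pool_drain(ToyPool *pool) {
    for (size_t i = 0; i < pool->count; i++) {
        toy_release(pool->pending[i]);
    }
    pool->count = 0;
}

/* Returns 1 if the object stays alive until the drain and is destroyed by it. */
int toy_autorelease_demo(void) {
    ToyPool pool = { {0}, 0 };
    ToyObject dict = { 1, 0 };        /* "Create" hands back a +1 reference */
    toy_autorelease(&pool, &dict);    /* balance the +1, but later */
    int alive_before_drain = !dict.destroyed;
    toy_pool_drain(&pool);
    return alive_before_drain && dict.destroyed;
}
```

In this model the object cannot be destroyed before toy_pool_drain runs, which is exactly the behavior the crash seems to contradict.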

The Assembly
After poking at the code in various ways, I decided to read the assembly code generated by the compiler. There wasn't much to the code I wrote, so whatever problem there was must have been deeper.

Here's the x86-64 assembly output for the broken MakeDictionary function:

    _MakeDictionary:                        ## @MakeDictionary
        .cfi_startproc
    Lfunc_begin0:
        .loc    1 11 0                  ## test.m:11:0
    ## BB#0:
        pushq   %rbp
    Ltmp2:
        .cfi_def_cfa_offset 16
    Ltmp3:
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
    Ltmp4:
        .cfi_def_cfa_register %rbp
        subq    $32, %rsp
        movabsq $0, %rax
        .loc    1 12 0 prologue_end     ## test.m:12:0
    Ltmp5:
        movq    %rax, %rdi
        movq    %rax, %rsi
        movq    %rax, %rdx
        movq    %rax, %rcx
        callq   _CFDictionaryCreateMutable
        leaq    L_.str(%rip), %rdi
        movq    %rax, -8(%rbp)
        .loc    1 15 0                  ## test.m:15:0
        callq   _sel_getUid
        movq    %rax, -16(%rbp)
        .loc    1 16 0                  ## test.m:16:0
        movq    -8(%rbp), %rax
        movq    %rax, %rdi
        callq   _object_getClass
        movq    -16(%rbp), %rsi
        movq    %rax, %rdi
        callq   _class_getMethodImplementation
        movq    %rax, -24(%rbp)
        .loc    1 17 0                  ## test.m:17:0
        movq    -24(%rbp), %rax
        movq    -8(%rbp), %rcx
        movq    -16(%rbp), %rsi
        movq    %rcx, %rdi
        callq   *%rax
        movq    %rax, %rdi
        callq   _objc_retainAutoreleasedReturnValue
        movq    %rax, %rdi
        callq   _objc_release
        .loc    1 19 0                  ## test.m:19:0
        movq    -8(%rbp), %rax
        addq    $32, %rsp
        popq    %rbp
        ret

Pretty straightforward here. Since no real calculations are done, we can just look at the sequence of callq instructions to see what functions are called. It calls CFDictionaryCreateMutable, sel_getUid, object_getClass, class_getMethodImplementation, and then there's an indirect call through the function pointer which is where it actually makes the autorelease call. ARC then hops in and does some pointless but harmless work on the return value from the call by retaining it and then immediately releasing it. The function then returns the dictionary to the caller.

Mostly Harmless
It took me a little while to realize what was going on, but then it was obvious. I said that the ARC calls inserted are "pointless but harmless." In fact, they are anything but!

One of the interesting features that came with ARC is fast handling of autoreleased return values. This sort of pattern is extremely common with ARC:

    // callee
    obj = [[SomeClass alloc] init];
    [obj setup];
    return [obj autorelease];

    // caller
    obj = [[self method] retain];
    [obj doStuff];
    [obj release];

A human programmer would typically omit the retain and release calls in the caller, but ARC is more paranoid. This would make things a bit slower when using ARC, which is where the fast autorelease handling comes in.

There is some extremely fancy and mind-bending code in the Objective-C runtime's implementation of autorelease. Before actually sending an autorelease message, it first inspects the caller's code. If it sees that the caller is going to immediately call objc_retainAutoreleasedReturnValue, it completely skips the message send. It doesn't actually do an autorelease at all. Instead, it just stashes the object in a known location, which signals that it hasn't sent autorelease at all.

objc_retainAutoreleasedReturnValue cooperates in this scheme. Before calling retain, it first checks that known location. If it contains the right object, it skips the retain. The net result is that the above code is effectively transformed into this:

    // callee
    obj = [[SomeClass alloc] init];
    [obj setup];
    return obj;

    // caller
    obj = [self method];
    [obj doStuff];
    [obj release];

This is faster because it skips the autorelease pool entirely, saving three message sends and the accompanying work: autorelease, the caller's retain, and the eventual release sent by the autorelease pool. It also allows the object to be destroyed earlier, reducing memory and cache pressure.

The beautiful thing about this technique is that because the runtime checks the caller's code before making this optimization, everything is perfectly compatible with code that doesn't participate in the scheme. If the caller does something else with the return value, then the runtime simply calls autorelease and everything works normally.

I said that this code is not pointless. What, then, is the point of the retain immediately followed by release in the assembly above? It allows the caller to participate in this scheme even though it's not using the return value. It would be correct to simply omit them, but in that case, the fast autorelease path is lost. It ends up being faster to make these two extra calls, at least in the common case.

I also said that this code is not harmless. The harm here is exactly that fast autorelease path. To ARC, an autorelease in a function or method followed by a retain in the caller is just a way to pass ownership around. However, that's not what's going on in this code. This code is attempting to actually put the object into the autorelease pool no matter what. ARC's clever optimization ends up bypassing that attempt and as a result, the dictionary is immediately destroyed instead of being placed in the autorelease pool for later destruction.
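
The failure can be replayed in a toy model written in plain C. This is a sketch under simplifying assumptions, not the runtime's real code: all names (ToyObject, handoff_slot, the toy_* functions) are hypothetical, and the caller_accepts flag stands in for the runtime's inspection of the caller's machine code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int retain_count; int destroyed; } ToyObject;

static ToyObject *pool[16];      /* toy autorelease pool */
static size_t pool_count;
static ToyObject *handoff_slot;  /* the "known location" used for the handoff */

static void toy_release(ToyObject *o) {
    assert(o->retain_count > 0);
    if (--o->retain_count == 0) o->destroyed = 1;
}

static void toy_pool_drain(void) {
    for (size_t i = 0; i < pool_count; i++) toy_release(pool[i]);
    pool_count = 0;
}

/* Callee side: either stash the object for the caller or really autorelease it. */
static ToyObject *toy_autorelease_return(ToyObject *o, int caller_accepts) {
    if (caller_accepts) {
        handoff_slot = o;            /* fast path: the pool is skipped entirely */
    } else {
        assert(pool_count < 16);
        pool[pool_count++] = o;      /* slow path: a real autorelease */
    }
    return o;
}

/* Caller side: accept the handoff if there is one, otherwise retain normally. */
static ToyObject *toy_retain_autoreleased(ToyObject *o) {
    if (handoff_slot == o) {
        handoff_slot = NULL;         /* ownership transferred, no retain needed */
    } else {
        o->retain_count++;
    }
    return o;
}

/* Replays create, autorelease, retain-autoreleased, release.
   Returns 1 if the object was destroyed before the pool was drained. */
int toy_broken_sequence(int caller_accepts) {
    ToyObject dict = { 1, 0 };       /* "Create" hands back a +1 reference */
    pool_count = 0;
    handoff_slot = NULL;

    ToyObject *r = toy_autorelease_return(&dict, caller_accepts);
    r = toy_retain_autoreleased(r);  /* inserted because the call returns an object */
    toy_release(r);                  /* inserted because the result is unused */

    int destroyed_before_drain = dict.destroyed;
    toy_pool_drain();
    return destroyed_before_drain;
}
```

Calling toy_broken_sequence(1) models the fast path: the only reference is consumed by the release and the object dies immediately. Calling toy_broken_sequence(0) models a caller that does not take part in the scheme: the object survives until the pool is drained.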

Root Cause
It all comes down to the function pointer cast used when making the autorelease call:

    ((id (*)(CFTypeRef, SEL))imp)(dict, autorelease);

I wrote it like this because that's what the type is. The autorelease method returns id and takes two (normally implicit) parameters: self and the selector being sent. I changed the self parameter to CFTypeRef instead of id for convenience, but left the return type as id since that's what it really is in the underlying autorelease method. It shouldn't matter, since the return value is ignored anyway.

That return type is this code's downfall. I was careful to avoid ARC's meddling for the most part, but that id makes ARC come in and start inserting calls, and that causes the dictionary to be immediately destroyed.

The Fix
Once all of this is known, the fix is easy. Get ARC out of the picture by having the call return CFTypeRef instead of id. Here's the complete function with the fix:

    CFDictionaryRef MakeDictionary(void) {
        CFMutableDictionaryRef dict = CFDictionaryCreateMutable(NULL, 0, NULL, NULL);
        // Put some stuff in the dictionary here perhaps

        SEL autorelease = sel_getUid("autorelease");
        IMP imp = class_getMethodImplementation(object_getClass((__bridge id)dict), autorelease);
        ((CFTypeRef (*)(CFTypeRef, SEL))imp)(dict, autorelease);

        return dict;
    }

Dumping the assembly shows that ARC is now out of the picture:

    _MakeDictionary:                        ## @MakeDictionary
        .cfi_startproc
    Lfunc_begin0:
        .loc    1 11 0                  ## test.m:11:0
    ## BB#0:
        pushq   %rbp
    Ltmp2:
        .cfi_def_cfa_offset 16
    Ltmp3:
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
    Ltmp4:
        .cfi_def_cfa_register %rbp
        subq    $32, %rsp
        movabsq $0, %rax
        .loc    1 12 0 prologue_end     ## test.m:12:0
    Ltmp5:
        movq    %rax, %rdi
        movq    %rax, %rsi
        movq    %rax, %rdx
        movq    %rax, %rcx
        callq   _CFDictionaryCreateMutable
        leaq    L_.str(%rip), %rdi
        movq    %rax, -8(%rbp)
        .loc    1 15 0                  ## test.m:15:0
        callq   _sel_getUid
        movq    %rax, -16(%rbp)
        .loc    1 16 0                  ## test.m:16:0
        movq    -8(%rbp), %rax
        movq    %rax, %rdi
        callq   _object_getClass
        movq    -16(%rbp), %rsi
        movq    %rax, %rdi
        callq   _class_getMethodImplementation
        movq    %rax, -24(%rbp)
        .loc    1 17 0                  ## test.m:17:0
        movq    -24(%rbp), %rax
        movq    -8(%rbp), %rcx
        movq    -16(%rbp), %rsi
        movq    %rcx, %rdi
        callq   *%rax
        .loc    1 19 0                  ## test.m:19:0
        movq    -8(%rbp), %rcx
        movq    %rax, -32(%rbp)         ## 8-byte Spill
        movq    %rcx, %rax
        addq    $32, %rsp
        popq    %rbp
        ret

Architectures
One question remains: why did this code work for me initially, while my colleague only uncovered the crash later?

The answer is actually pretty simple, once everything else is known. This is an iOS project. I tested the code in the simulator, while he tried it on a real iPhone. The runtime function that performs the fast autorelease check is called callerAcceptsFastAutorelease. It's architecture-specific since it's inspecting machine code. If you look at the version used in the 32-bit iOS simulator, the problem becomes apparent:

    # elif __i386__  &&  TARGET_IPHONE_SIMULATOR

    static bool callerAcceptsFastAutorelease(const void *ra)
    {
        return false;
    }

In short, the fast autorelease handling is not implemented for the 32-bit iOS simulator. It makes sense that it wouldn't be. It's going to be some non-trivial amount of effort to implement and fix. Meanwhile, ARC is not supported on i386 for Mac programs, so the only way to hit this path on i386 is to run in the simulator. There's no real point in putting effort into extreme optimizations that will only apply to simulator apps.

Aside
Before writing this article, I first wrote a small test case so I could easily experiment and examine the problem in isolation. However, there was a big problem: the test case didn't work! Or rather, it did work just fine, and refused to crash. The code was really simple, roughly:

    int main(int argc, char **argv)
    {
        @autoreleasepool {
            CFDictionaryRef dict = MakeDictionary();
            NSLog(@"Testing.");
            NSLog(@"%@", dict);
        } 
        return 0;
    }

There isn't much room for error there, so it was baffling why it wouldn't crash.

After many single-steps through assembly in the debugger, I realized that it had to do with dyld lazy binding. References to external functions aren't fully bound when a program is initially loaded. Instead, a stub is generated which has enough information to complete the binding the first time the call is made. On the first call to an external function, the address for that function is looked up, the stub is rewritten to point to it, and then the function call is made. Subsequent calls go directly to the function. By binding lazily, program startup time is improved and time isn't wasted looking up functions that are never called.

That means that on the very first run of this code, the call to objc_retainAutoreleasedReturnValue isn't fully bound. Because it's not fully bound, callerAcceptsFastAutorelease doesn't realize that the call is to objc_retainAutoreleasedReturnValue. Because it doesn't see the call to objc_retainAutoreleasedReturnValue, the fast autorelease path isn't used. The dictionary goes into the autorelease pool as was originally intended, and the code works... once.

Once I figured that out, it was trivial to force the crash by inserting a loop:

    int main(int argc, char **argv)
    {
        while(1) @autoreleasepool {
            CFDictionaryRef dict = MakeDictionary();
            NSLog(@"Testing.");
            NSLog(@"%@", dict);
        } 
        return 0;
    }

The loop reliably crashes on the second iteration. The first time through triggers lazy binding of objc_retainAutoreleasedReturnValue, which then allows the next call to take the fast autorelease path and trigger the bug.

This has little consequence for normal programs, which will perform the lazy binding for functions like these early on. It ended up being a severe complicating factor for a small test program, though.
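
The interaction between lazy binding and the caller check can be modeled in a few lines of plain C. This is a deliberately simplified sketch, not dyld's or the Objective-C runtime's actual mechanism: slot, binder_stub, real_target and call_is_recognized are hypothetical names, and the check is reduced to a pointer comparison.

```c
#include <assert.h>

/* Stand-in for the external function that will be bound lazily. */
static int real_target(int x) { return x + 1; }

static int binder_stub(int x);

/* The "stub slot": every call to the external function goes through it. */
static int (*slot)(int) = binder_stub;

/* The first call lands here: resolve the target, patch the slot, forward the call. */
static int binder_stub(int x) {
    slot = real_target;
    return real_target(x);
}

/* Stand-in for the caller check: the call is recognized only once the
   slot already points at the expected function. */
static int call_is_recognized(void) {
    return slot == real_target;
}

/* Returns 1 if the check fails before the first call and succeeds after it. */
int toy_lazy_binding_demo(void) {
    slot = binder_stub;                        /* fresh process: nothing bound yet */
    int recognized_first = call_is_recognized();
    int r1 = slot(1);                          /* triggers the binding */
    int recognized_second = call_is_recognized();
    int r2 = slot(1);                          /* goes straight to the target */
    assert(r1 == 2 && r2 == 2);
    return !recognized_first && recognized_second;
}
```

Both calls return the same result, so nothing looks different from the outside. Only code that inspects the slot, as the fast autorelease check does, sees a different answer on the first pass.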

Conclusion
ARC is great technology, but sometimes it's necessary to work around it. When working around it, you have to be sure you really work around it, and not give it any opportunity to jump in. If you do, it might decide to eliminate what looks like a useless autorelease call, causing your objects to be instantaneously destroyed instead of being peacefully returned to the caller.

People sometimes ask me if I actually use the crazy and esoteric stuff I discuss on this blog. This is a good example: it took basic assembly language reading, Objective-C runtime internals, and understanding of specific ARC calls to track down this bug. Building the example crasher further required understanding how dyld binds external function references at runtime. This is all great stuff to know, and even if you never use it, it's just plain fun.

That's it for today. I hope to be back on track, so check back soon for another article. In the meantime, as always, Friday Q&A is driven by reader suggestions, so if you have a topic that you'd like to see covered, please send it in!

Comments:

Wow,

This one blew my mind several times. Great piece of investigatory work and a great example of using these "esoteric" examples that you provide us!

Bravo
I wonder if a more robust approach would be to implement your own "CFAutorelease", so that your code will be easily adaptable to use it when you can support iOS 7 and 10.9?

Since you can mix and match ARC with non-ARC, wouldn't it be as simple as compiling and linking a non-ARC object file built from:

CFTypeRef MAAutorelease(CFTypeRef CF_RELEASES_ARGUMENT arg)
{
    return [(id)arg autorelease];
}

Then you could even use compile-time checks to opt for Apple's official CFAutorelease whenever it's available.
It seems like it should also be possible to store the CF object (after a __bridge cast) into an __autoreleasing variable, since ARC isn't allowed to elide those, AFAIK.
The documentation at http://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-retainautoreleasedreturnvalue hints at this:

If value is null, this call has no effect. Otherwise, it attempts to accept a hand off of a retain count from a call to objc_autoreleaseReturnValue on value in a recently-called function or something it calls. If that fails, it performs a retain operation exactly like objc_retain.


And objc_autoreleaseReturnValue from the same document:

If value is null, this call has no effect. Otherwise, it makes a best effort to hand off ownership of a retain count on the object to a call to objc_retainAutoreleasedReturnValue for the same object in an enclosing call frame. If this is not possible, the object is autoreleased as above.
I'm a big fan of the blog, but I have to question the sanity of actually using code like this. The things you explore in this article are implementation details in the truest sense, and then you come up with a solution that relies on the implementation details you discover. What if the implementation details change?

The suggested implementation is also ugly to look at and not immediately understandable. The two alternatives suggested in the comments (using __autoreleasing or exporting a non-ARC custom CFAutorelease function) are much cleaner.

I feel like #import <objc/runtime.h> is a worst-case scenario that should only be invoked if there is no other sensible way to do what you need.
I'm wondering why you didn't just use:

return CFBridgingRelease(dict);

As far as I'm aware, that hands off management of the object to ARC, which allows ARC to do its automatic autoreleaseReturnValue() and retainAutoreleasedReturnValue() as normal.

It doesn't help that there's no documentation for CFAutorelease() though, so I probably just don't properly understand the problem you're trying to solve.
I would just return an "NSDictionary *" and do

return (__bridge_transfer NSDictionary *)dict;
To force autoreleasing in ARC, wouldn't __autoreleasing work? It prevents the ARC optimization, and hurts the eyes a lot less:

CFDictionaryRef MakeDictionary(void) {
    CFMutableDictionaryRef dict = CFDictionaryCreateMutable(NULL, 0, NULL, NULL);
    // Put some stuff in the dictionary here perhaps

    id __autoreleasing result = CFBridgingRelease(dict);
    return (__bridge CFDictionaryRef)result;
}

You could then extract the autorelease part in a helper function:

inline CFTypeRef MyAutorelease(CFTypeRef obj) {
    id __autoreleasing result = CFBridgingRelease(obj);
    return (__bridge CFTypeRef)result;
}

CFDictionaryRef MakeDictionary(void) {
    CFMutableDictionaryRef dict = CFDictionaryCreateMutable(NULL, 0, NULL, NULL);
    // Put some stuff in the dictionary here perhaps

    return MyAutorelease(dict);
}

__attribute__((objc_precise_lifetime)) might also affect this behavior.
See http://clang.llvm.org/docs/AutomaticReferenceCounting.html

In any case ARC should offer enough tools to achieve autoreleasing without having to circumvent ARC by calling a selector manually.
Maybe __autoreleasing in combination with the precise lifetime attribute?
Daniel Jalkut: A non-ARC file containing a straightforward autorelease call would certainly work. I don't like using per-file compiler flags when I don't have to, so that's why I went with my approach, but it's just a matter of preference.

Ari Weinstein: The bug was due to implementation details, yes. But the corrected code doesn't depend on them at all. If the implementation details change, the correct code remains correct. The original bug could have been nastier if it had managed not to show up at all until, say, the next OS release, but fortunately it showed up quickly. I don't believe any such things remain in the revised code. As for other ideas being cleaner, I don't think __autoreleasing is cleaner or more understandable at all. You're ultimately relying on a side effect of assigning to a variable to accomplish the goal, with the actual call you're attempting to make being invisibly generated by the compiler, as opposed to having everything be completely explicit in the code. As for a non-ARC function, see above, I think that's just a matter of preference.

Jim Dovey: return CFBridgingRelease(dict); won't work because I'm returning a CFDictionaryRef. Written like that, it won't compile. Add a __bridge cast and it will compile, but the same problem I experienced will occur, as ARC will immediately release the object once it leaves that world.

bob: That would work, but the whole point of this function was to make it more convenient to call certain Security.framework APIs that take CFDictionaryRef parameters. As such, I wanted the return type to be exactly what it was expecting and not require additional casting at the call site.

Tammo Freese: Yes, I believe __autoreleasing would work. Clang's ARC docs seem clear enough that it's always going to be a real autorelease there. However, I really dislike how implicit it is, as I mentioned above.
Hi Mike, thanks for your answer. I'd like to understand where you draw the boundary between making memory management calls yourself, and relying on ARC to work correctly.

From my point of view, using ARC's __autoreleasing is explicit, just as using __weak or the default __strong: I express how I expect the variable to be handled, and leave retain, release and autorelease calls to the compiler/runtime.

As for "ultimately relying on a side effect of assigning to a variable to accomplish the goal", and the call(s) "being invisibly generated by the compiler, as opposed to having everything be completely explicit in the code": Isn't that true for all of ARC?

ARC is generally about storage semantics, not calls. When you declare a __strong variable, that doesn't mean "every time you assign to this, release what was there before and retain the new value". Instead, it tells the compiler to maintain a strong reference to the new value.

The difference between those two is apparent in the autorelease elision example I gave above. When you do obj = [self method] it's possible that neither a release nor a retain occurs. The compiler will omit the release of the previous value if it knows the previous value is nil, as is often the case. The runtime omits the retain when it can perform autorelease elision.

For a more explicit but less realistic example, imagine you wanted to cause three retains and releases for some reason. So you write code like this:

    id a = [obj method];
    id b = a;
    id c = b;


This works fine and does what you want... when you compile with -O0. But the moment you switch to -O3, the compiler realizes that none of these variables are actually being used, and it collapses the whole thing down to a single objc_retainAutoreleasedReturnValue (which generally won't perform a retain at all) followed by a single release. If you somehow needed three retains and releases here, your code no longer works.

Now, __autoreleasing is an exception to this, as far as I can tell. It's basically just a hack added to the system in order to allow ARC to work with the pre-ARC semantics of out parameters like the standard NSError ** parameter. As such, it looks like you can indeed count on a real autorelease being performed any time you assign to it. But I don't want to take advantage of that exception if I don't have to, and so for the same reason I'd never use __strong just for the side effect of generating a retain, I don't want to use __autoreleasing just for the side effect of generating an autorelease.

I hope that makes some sense
The Clang 3.5 OBJECTIVE-C AUTOMATIC REFERENCE COUNTING (ARC) document in section 7.1.1 states

"A program is ill-formed if it contains a method definition, message send, or @selector expression for any of the following selectors:
autorelease
release
retain
retainCount"

Why do you feel that the risk of creating ill-formed code is less of an issue than using __autoreleasing?


That is discussing the text of the program. It's saying you're no longer allowed to write any of these constructs:

- (id)autorelease { ... }
[obj autorelease];
@selector(autorelease)

And the equivalents for the other three messages mentioned. There is no "risk". A program is well-formed or it is not. In this case, it is, because it doesn't contain any of those constructs. If it was not, the compiler would output an error. This is why I had to write sel_getUid("autorelease") instead of @selector(autorelease).
You're calling the method indirectly, instead of directly calling the method. That doesn't make the code more well formed; it means that the ARC documentation doesn't bother to call out sticks, stones, and assembly language. :-/
Hi Mike, thanks for your clarification. I understand why __autoreleasing feels a bit like a hack to you. I would use it as it does not feel like a hack to me, but simply as one of the storage specifiers.

You may be interested to know that __autoreleasing does not always perform a real autorelease when you assign to it. As an example, suppose you would like your MakeDictionary to cast an NSMutableDictionary. This solution may or may not work:

CFDictionaryRef MakeDictionary(void) {
    id dict = [NSMutableDictionary dictionary];
    // Put some stuff in the dictionary here perhaps

    return (__bridge CFDictionaryRef)dict;
}

If +dictionary supports ARC's performance optimization with the handover, MakeDictionary would return a pointer to a deallocated object (in the highest optimization setting, the compiler inserts objc_retainAutoreleasedReturnValue and objc_release).

To be sure, I would use __autoreleasing:

CFDictionaryRef MakeDictionary(void) {
    id __autoreleasing dict = [NSMutableDictionary dictionary];
    // Put some stuff in the dictionary here perhaps

    return (__bridge CFDictionaryRef)dict;
}

Here, the compiler does not insert any retain/release/autorelease calls, but simply returns the object from the +dictionary call to the caller. The object is already retained and autoreleased, and even if +dictionary would support ARC's performance optimization, it won't kick in, as there is no call to objc_retainAutoreleasedReturnValue.
@MikeAsh, How about an article where you describe the debugging techniques used to track down this article's issue? I know many developers panic when they see EXC_BAD_access code and start guessing what's wrong. And perhaps a little bit more of those low level debugging functions which are useful (like register read).
Brad: If indirect calls count, then every non-trivial program is "ill-formed", because ARC generates a ton of indirect calls to autorelease, not to mention about a million calls in the frameworks. Did the authors of that section intend to cover every non-trivial program? Of course not. It's obviously talking about syntax, as in you can't write [obj autorelease], not about other ways to invoke it.

Tammo Freese: It's extremely interesting that assigning to an __autoreleasing variable doesn't always generate an autorelease call. To me, this perfectly illustrates why it's bad to rely on such an assignment for the autorelease side effect. If you can't rely on it actually calling autorelease, surely.

It's actually really easy to take advantage of that to make some code that crashes mysteriously:

NSMutableArray *PremadeDictionaries;

void FillPremadeDictionaries(void) {
    @autoreleasepool {
        PremadeDictionaries = [NSMutableArray array];
        
        for(int i = 0; i < 100; i++) {
            [PremadeDictionaries addObject: [NSMutableDictionary dictionary]];
        }
    }
}

CFDictionaryRef MakeDictionary(void) {
    id __autoreleasing dict = [PremadeDictionaries lastObject];
    [PremadeDictionaries removeLastObject];
    // Put some stuff in the dictionary here perhaps

    return (__bridge CFDictionaryRef)dict;
}

Assuming you call FillPremadeDictionaries first, this ends up returning a pointer to a destroyed object to the caller, and crashes after a few iterations of the testing loop.

Here's another MakeDictionary implementation that falls afoul of this. It stashes each dictionary in a global variable so it can release it next time:

id MakeCFDictionary(void) {
    static CFMutableDictionaryRef savedDictionary;
    
    if (savedDictionary != NULL)
        CFRelease(savedDictionary);
    
    savedDictionary = CFDictionaryCreateMutable(NULL, 0, NULL, NULL);
    return (__bridge id)savedDictionary;
}

CFMutableDictionaryRef MakeDictionary(void) {
    id __autoreleasing result = MakeCFDictionary();
    
    return (__bridge CFMutableDictionaryRef)result;
}

When called twice in a row in my test program, it ends up returning the same dictionary twice! The code superficially appears to work, but mutating one dictionary ends up mutating them both:

CFMutableDictionaryRef dict1 = MakeDictionary();
CFMutableDictionaryRef dict2 = MakeDictionary();
CFDictionaryAddValue(dict1, @"1", @"2");
NSLog(@"Testing.");
NSLog(@"%@ %@", dict1, dict2);

2014-05-12 11:23:30.597 a.out[95103:507] {
    1 = 2;
} {
    1 = 2;
}

Now, is this code realistic? Not particularly. It's kind of dumb. On the other hand, casting to a function pointer that returns id in code that's trying to bypass ARC is kind of dumb too. If __autoreleasing doesn't actually call autorelease in all cases, then how much analysis do you have to put into your code to be sure that it'll be called in the case you care about? How could that possibly be better than directly making the call yourself?

Dan: A lot of this one was just staring at the assembly and doing a lot of logging, which is why I kind of glossed over it. However, for a general discussion of techniques, this article might be more what you're looking for:

https://mikeash.com/pyblog/friday-qa-2013-06-28-anatomy-of-a-compiler-bug.html
I'm not sure which side I'd take in the __autoreleasing discussion, but I welcome the return of NSBlog with a very interesting investigation! Thanks

PS The hashcash took hours to calculate. It's only because I left this tab open in the background that I can, finally, comment.
Yes, I noticed some hashcash slowness as well. I'll try to get that straightened out. Not sure what changed, but I notice that Chrome is not taking up anywhere near 100% CPU when calculating it, so I wonder if I'm running into some sort of throttle on long-running JavaScript stuff. Thanks for reminding me, in any case.
Hi Mike, thanks for your examples! I totally forgot that objects from an NSDictionary are returned "as-is", and so the __autoreleasing optimization screws things up then. I think that qualifies as a compiler bug; I'll file a radar for it.

Kudos for the second example, haven't wrapped my head around it yet. I see what's going wrong in MakeDictionary, but haven't found out why that leads to the same dictionary returned twice.
I think you're right that this qualifies as a compiler bug.

The second example is kind of weird, and returning the same dictionary twice is mostly luck. The first dictionary gets destroyed and then the second dictionary just happens to be allocated in the same location. Depending on memory allocation patterns and such, it could easily be something else, or just invalid memory.
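
To make the "same location" effect concrete, here is a minimal sketch in plain C. It assumes a hypothetical fixed-size pool with a last-in, first-out free list (Slot, SlotAlloc, SlotFree, and MakeSlot are invented for illustration and are not part of CoreFoundation or malloc). A real allocator makes no such promise, which is exactly why the result in the second example is luck rather than guaranteed behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-size allocator: a small static pool with a LIFO free list. */
enum { SLOT_COUNT = 4 };

typedef struct Slot {
    struct Slot *nextFree;  /* link used only while the slot is free */
    int payload;            /* stands in for the dictionary contents */
} Slot;

static Slot pool[SLOT_COUNT];
static Slot *freeList = NULL;
static int poolInitialized = 0;

/* Thread every slot onto the free list, lowest index first. */
static void PoolInit(void) {
    freeList = NULL;
    for (int i = SLOT_COUNT - 1; i >= 0; i--) {
        pool[i].nextFree = freeList;
        freeList = &pool[i];
    }
    poolInitialized = 1;
}

/* Pop the most recently freed slot. */
Slot *SlotAlloc(void) {
    if (!poolInitialized) PoolInit();
    assert(freeList != NULL);  /* pool exhausted */
    Slot *s = freeList;
    freeList = s->nextFree;
    s->payload = 0;
    return s;
}

/* Push the slot back, so it is the next one handed out. */
void SlotFree(Slot *s) {
    s->nextFree = freeList;
    freeList = s;
}

/* Same shape as the flawed example: destroy the previous object,
   create a new one, and hand it out with no extra ownership. */
Slot *MakeSlot(void) {
    static Slot *saved;
    if (saved != NULL) SlotFree(saved);
    saved = SlotAlloc();
    return saved;
}
```

With this pool, two consecutive MakeSlot calls return the same address: the first slot is freed and immediately reused, so a write through one pointer is visible through the other.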
@Tammo: I'm not sure I want to call your example a true "bug". __bridge (as opposed to CFBridgingRetain) does not guarantee anything about lifetime—if it did, you'd be adding extra expense to cases where you're just using the CF object transiently. So you end up returning the casted dictionary, then the last strong reference to it disappears (because the return type is not an Objective-C object type).

I suppose the compiler could retain/autorelease when returning a bridged object, but honestly it's probably better to just accept CoreFoundation conventions, annotate the function as CF_RETURNS_RETAINED, and return the new dictionary at +1. Even though it's slightly more difficult for callers to work with.
I'm a bit puzzled by the implementation of callerAcceptsFastAutorelease. While it's almost straightforward for x86_64 (the code looks for the presence of the instruction "jmpq &objc_retainAutoreleasedReturnValue"), the ARM implementation is quite obscure.
The only thing the function does is to check for a specific marker (a no-op: mov r7, r7) that is put there by the ARC optimiser of the LLVM compiler. Diving into LLVM source code brings up the following bit of knowledge:

"The implementation of objc_autoreleaseReturnValue sniffs the instruction stream following its return address to decide whether it's a call to objc_retainAutoreleasedReturnValue. This can be prohibitively expensive, depending on the relocation model, and so on some targets it instead sniffs for a particular instruction sequence. This functions returns that instruction sequence in inline assembly, which will be empty if none is required."


I was wondering why that is so on ARM. Having the compiler put a certain marker there so that a specific implementation of a library can find it sounds like a strong coupling between compiler and library code. Why can't the "sniffing" be implemented the same way as on the x86_64 platform?
@Jordan: I haven't meant that the compiler misbehaves with the __bridge call. That one would simply return the pointer, which it does. I meant that the object has to end up in the autorelease pool when assigned to an __autoreleasing variable.
Fabio Gallonetto: Good question! I can only guess that the performance of chasing all those pointers through code on ARM wasn't as good as on x86-64, to the extent that it justified a special marker like that. As for why, I'm not sure. Both architectures look pretty similar in terms of how they make a cross-library call like this: they jump to a local stub which then jumps to an address stored at a PC-relative location, and the linker takes care of loading the address of the destination function at that location. The only thing I can come up with is that maybe chasing down the pointers is slower on ARM, perhaps because of distinct instruction/data caches. This is a complete guess, though, and I'd welcome any more concrete information.
Somebody put the question of the NOP marker on StackOverflow and got answers from the guys who would know. Link:

http://stackoverflow.com/questions/23764271/why-the-implementation-of-arcs-objc-autoreleasereturnvalue-differs-for-x86-64-a/

In short, the dyld stubs on ARM have multiple different forms due to the nature of the ARM instruction set, and checking for all the various possibilities would be really slow.
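
The marker check discussed in this thread can be sketched in plain C as a simulation over a byte buffer. This is not the runtime's code: CodeHasMarkerAt and its parameters are hypothetical names, and the byte values are the little-endian encodings of mov r7, r7 as I understand them (ARM and Thumb), so treat them as an assumption to verify against the objc4 source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed little-endian encodings of the no-op marker `mov r7, r7`. */
static const uint8_t kArmMarker[4]   = { 0x07, 0x70, 0xa0, 0xe1 };
static const uint8_t kThumbMarker[2] = { 0x3f, 0x46 };

/* Returns true if the bytes at returnOffset (where the callee would
   return to) are exactly the marker instruction. A single fixed-size
   compare replaces chasing a call through stubs to its destination. */
bool CodeHasMarkerAt(const uint8_t *code, size_t length,
                     size_t returnOffset, bool thumb) {
    assert(code != NULL);
    const uint8_t *marker = thumb ? kThumbMarker : kArmMarker;
    size_t markerLength = thumb ? sizeof(kThumbMarker) : sizeof(kArmMarker);

    /* Refuse to read past the end of the buffer. */
    if (returnOffset > length || length - returnOffset < markerLength)
        return false;

    return memcmp(code + returnOffset, marker, markerLength) == 0;
}
```

The point of the design is that the check costs the same no matter what form the dyld stub takes, since the stub is never inspected at all.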
Great in-depth analysis. I thought ARC covers all of the hidden stuff for the developer, and I seldom considered the real effect it has.
This is a great post! I learned something much deeper on ARC and autorelease. I am an iOS app developer, my experience is "the more advanced code usually leads to more advanced debug techniques" :-)
