mikeash.com: just this guy, you know?

Posted at 2010-02-26 18:19 | RSS feed (Full text feed) | Blog Index
Next article: Friday Q&A 2010-03-05: Compound Futures
Previous article: Friday Q&A 2010-02-19: Character Encodings
Tags: blocks fridayqna futures
Friday Q&A 2010-02-26: Futures
by Mike Ash  

Welcome back to another shiny edition of Friday Q&A. Guy English suggested taking a look at implementing futures in Objective-C using blocks, and for this week's post I'm going to talk about the futures implementation that I built.

Futures
A future is, in short, an object which hides a calculation. When a future is resolved, the resolution blocks until the calculation has finished. My code involves implicit futures: a future is a proxy for the calculated result and, when messaged, transparently resolves the future and then passes the message on to the result. To code which uses it, a future is essentially indistinguishable from the object that it represents.

In Objective-C, it's natural to represent the calculation using a block. Futures are created by simply calling a function which takes a block representing the calculation. They return a proxy object which captures messages sent to that object and resolves the future as needed. My code has two kinds of futures.

Background futures begin the calculation immediately on a background thread as soon as the future is created. When the future is resolved, if the calculation is not yet complete, the call blocks until it's done. If the calculation finishes first, then resolution completes immediately. Background futures provide a way to write parallel code without needing to worry about details of synchronization. For example, if you're going to pass an NSData to another object that won't actually use that NSData for a while, you can use a background future to allow other code to run concurrently with the disk access with very little effort:

    NSString *filename = ...;
    NSData *future = MABackgroundFuture(^{ return [NSData dataWithContentsOfFile: filename]; });
    [object doSomethingLaterWithData: future];
Lazy futures do not begin the calculation until the future is resolved. If the future is never resolved, then the calculation is never performed. Lazy futures make it possible to provide an object immediately to an API which may or may not actually make use of it, and not pay the cost of creating that object until and unless it's actually requested. For example, you could use a lazy future to defer the reading of a file until and unless it's needed:
    NSString *filename = ...;
    NSData *future = MALazyFuture(^{ return [NSData dataWithContentsOfFile: filename]; });
    [object doSomethingOrNotWithData: future];

Getting the Code
As usual, I'm just going to cover the highlights of the code here, but the library is available from my subversion repository:

    svn co http://mikeash.com/svn/MAFuture/
Or just click on the hyperlink above to browse it.

A Custom Proxy Class
When building object proxies, Cocoa provides a convenient root proxy class to subclass in the form of NSProxy. The idea is that NSProxy provides a minimal implementation, which allows almost all messages to be captured and proxied.

Unfortunately, NSProxy doesn't quite live up to its promise. It implements a whole bunch of unnecessary methods, including important ones like -hash and -isEqual:. This is a problem because I don't want NSProxy's implementations of these, I want the implementations in the proxied object. To make that happen, I'd have to either manually override these methods, or play runtime tricks to make them hit the forwarding path. Neither alternative is particularly appealing.

Instead, I chose a third route: implement my own proxy class. MAProxy is a true minimalistic proxy class. It implements memory management methods and -isProxy. It also implements +initialize, which the runtime requires to exist in every class. The header is really basic and the implementation nearly as simple. The only tricky stuff at work is the inline reference count using atomic functions for thread safety.

Future Basics
To make implementing futures easier, I created a base class called MABaseFuture which provides common facilities. A basic future needs to be able to store a value, to store whether the future has been resolved yet or not, and a condition variable to make it all thread safe:

    @interface MABaseFuture : MAProxy
    {
        id _value;
        NSCondition *_lock;
        BOOL _resolved;
    }
For code, the class obviously needs creation/destruction methods:
    - (id)init
    {
        _lock = [[NSCondition alloc] init];
        return self;
    }
    
    - (void)dealloc
    {
        [_value release];
        [_lock release];
        
        [super dealloc];
    }
Then, accessors for the future's value. There are both locked and unlocked setters because a subclass may want to set the future's value after already acquiring the lock:
    - (void)setFutureValue: (id)value
    {
        [_lock lock];
        [self setFutureValueUnlocked: value];
        [_lock unlock];
    }
    
    - (id)futureValue
    {
        // skip the usual retain/autorelease dance here
        // because the setter is never called more than
        // once, thus value lifetime is same as future
        // lifetime
        [_lock lock];
        id value = _value;
        [_lock unlock];
        return value;
    }
    
    - (void)setFutureValueUnlocked: (id)value
    {
        [value retain];
        [_value release];
        _value = value;
        _resolved = YES;
        [_lock broadcast];
    }
A quick getter to see if the future has been resolved (which relies on the subclass to manually acquire the lock first):
    - (BOOL)futureHasResolved
    {
        return _resolved;
    }
Then the one part that's a bit interesting, a method to wait for the future to resolve, handy for implementing background futures:
    - (id)waitForFutureResolution
    {
        [_lock lock];
        while(!_resolved)
            [_lock wait];
        [_lock unlock];
        return _value;
    }
This class also has a -resolveFuture method which is abstract. Subclasses must override it and do whatever they need to do:
    - (id)resolveFuture
    {
        NSLog(@"-[MABaseFuture resolveFuture] called, this should never happen! Did you forget to implement -[%@ resolveFuture]?", NSStringFromClass(isa));
        NSParameterAssert(0);
        return nil;
    }
Actually there are two interesting parts to this class, and the second one is here. It's an implementation of -class. Normally this implementation wouldn't be necessary, as the proxy mechanism will proxy that method just fine. The problem arises in the implementation of -[NSCFString isEqual:], part of the CFString toll-free bridging, and with other bridged classes. That code checks the class of the other object, and if it's an NSCFString as well, hits a fast path that depends on internal implementation details of NSCFString. If -class returns NSCFString when the object is really a proxy, that code fails and the two strings will never compare as equal, even when they are.

The fix is simple, if bizarre. Get the real class, check to see if it starts with NSCF, and return the superclass if it does. If the real class is NSCFString, this will return NSString, the code goes through the general equality path, and all is well. This is the implementation of -class:

    - (Class)class
    {
        Class c = [[self resolveFuture] class];
        if([NSStringFromClass(c) hasPrefix: @"NSCF"])
            return [c superclass];
        else
            return c;
    }
And that's all there is to MABaseFuture.

Deepening the Hierarchy
Building on MABaseFuture, I want to then create a tree of subclasses. First, _MASimpleFuture will contain some more common facilities for "simple" futures (futures which immediately resolve when accessed), then I'll create two subclasses of that for background and lazy futures.

_MASimpleFuture is very, well, simple. It implements -forwardingTargetForSelector: to resolve the future and return the object that resulted:

    - (id)forwardingTargetForSelector: (SEL)sel
    {
        LOG(@"%p forwardingTargetForSelector: %@, resolving future", self, NSStringFromSelector(sel));
        return [self resolveFuture];
    }
Using this class, subclasses just need to provide an initializer method and override -resolveFuture, and they get forwarding for free.

Forwarding to nil
There's a bad corner case here, which happens if the future returns nil. Messaging nil is no problem, but forwardingTargetForSelector: takes a nil return as meaning that there is no forwarding target, and the runtime should start on the slow forwarding path instead.

And here there's a major problem, because the slow forwarding path requires a method signature, but it's impossible to get one from nil. I've partially solved this in an extremely brute-force fashion by writing a class called MAMethodSignatureCache, which will check every class registered with the runtime for a selector and return whatever method signature it can dig up. (I didn't write it solely for this, there's more handy stuff to do with it in another post.) I can use this class to implement the slow forwarding path of _MASimpleFutureto return zero:

    - (NSMethodSignature *)methodSignatureForSelector: (SEL)sel
    {
        return [[MAMethodSignatureCache sharedCache] cachedMethodSignatureForSelector: sel];
    }
    
    - (void)forwardInvocation: (NSInvocation *)inv
    {
        // this gets hit if the future resolves to nil
        // zero-fill the return value
        char returnValue[[[inv methodSignature] methodReturnLength]];
        bzero(returnValue, sizeof(returnValue));
        [inv setReturnValue: returnValue];
    }
The problem comes when there are multiple method signatures for a given selector, which can easily happen if two unrelated classes implement methods with the same name. In that case, there's no way to know which one is meant, and this whole approach falls apart. Unfortunately, with the way the runtime is currently written, there's no generalized way to "forward to nil".

If that's not enough, there's another problem with futures that return nil. This problem is quite simple: although the futured value may be nil, the future object itself is not nil. Any code which checks the object pointer for nil before using it will fail in weird ways. Imagine this code using an NSData:

    NSData *data = ...;
    if(data)
        [self doSomethingWithBytes: [data bytes]];
If data is a future that resolves to nil, then the if check will pass, but [data bytes] will return NULL, causing a crash.

Because of these two problems, you should avoid futuring any computation which might return nil.

Background Futures
To implement background futures, I created a class called _MABackgroundBlockFuture. Since computation is supposed to begin immediately in the background, I create an initializer which takes the block to compute, and uses Grand Central Dispatch to execute it in the background. Once the computation is finished, it simply calls -setFutureValue: to set the computed value and mark the future as resolved:

    - (id)initWithBlock: (id (^)(void))block
    {
        if((self = [self init]))
        {
            dispatch_async(dispatch_get_global_queue(0, 0), ^{
                [self setFutureValue: block()];
            });
        }
        return self;
    }
The implementation of -resolveFuture is then extremely simple. Since the future is already being computed, it just waits for it to finish, then returns the result:
    - (id)resolveFuture
    {
        return [self waitForFutureResolution];
    }
Lazy Futures
I created _MALazyBlockFuture to implement lazy futures. A lazy future doesn't begin computation right away, so it just needs to store a copy of the block when initialized, and release it when deallocating:
    - (id)initWithBlock: (id (^)(void))block
    {
        if((self = [self init]))
        {
            _block = [block copy];
        }
        return self;
    }
    
    - (void)dealloc
    {
        [_block release];
        [super dealloc];
    }
Resolution is straightforward as well. Acquire the lock. If the future hasn't been resolved yet, then call the block and set the future's value from its result:
    - (id)resolveFuture
    {
        [_lock lock];
        if(![self futureHasResolved])
        {
            [self setFutureValueUnlocked: _block()];
            [_block release];
            _block = nil;
        }
        [_lock unlock];
        return _value;
    }
Wrappers
This code now has all the functionality that's needed, but I want a couple of wrappers to make it nicer to use:
    id MABackgroundFuture(id (^block)(void))
    {
        return [[[_MABackgroundBlockFuture alloc] initWithBlock: block] autorelease];
    }
    
    id MALazyFuture(id (^block)(void))
    {
        return [[[_MALazyBlockFuture alloc] initWithBlock: block] autorelease];
    }
Because these functions return id, the compiler won't be able to catch mistakes like:
    NSArray *array = MALazyFuture(^{ return [self somethingThatReturnsNSString]; });
Gcc will also reject this because the block types don't match exactly (returning NSString * instead of id) even though they're completely compatible.

I worked around both of these problems by using two really scary-looking macros:

    #define MABackgroundFuture(...) ((__typeof((__VA_ARGS__)()))MABackgroundFuture((id (^)(void))(__VA_ARGS__)))
    #define MALazyFuture(...) ((__typeof((__VA_ARGS__)()))MALazyFuture((id (^)(void))(__VA_ARGS__)))
Let's unpack these a bit.

First, they take variable arguments, because block syntax doesn't play completely nice with the preprocessor. If you write a block which contains a comma that isn't inside parentheses (which can be written completely legally), the preprocessor won't realize that it's inside a block, and will use that as an argument separator. The preprocessor will therefore see two (or more) arguments, and a single-argument macro will fail. By making the macros take ..., that problem is avoided. Thus, the block parameter is represented in the macro by __VA_ARGS__.

__typeof is a gcc language extension which pretty much does what it says. You give it an expression, and it gives you the type of the expression.

The argument to __typeof is (__VA_ARGS__)(). Remember that __VA_ARGS__ is the block being passed to the macro. This expression calls the block. But since it's an argument being passed to __typeof, which is a compile-time construct, it doesn't really call the block. Put the two together, and you get the return type of the block.

Next, the whole __typeof combination is wrapped in another set of parentheses and put right before the call through to the real function, which casts the return value of the function.

Finally, the function argument is cast to id (^)(void), because gcc is too stupid to understand that a block which returns NSString * is compatible with a block type that returns id.

Potential Uses
Background futures are useful any time you have computation which can happen asynchronously. In essence, you can think of the future as a synchronization mechanism, like a lock, which ensures that the job is completed before the result is used. Because these futures automatically forward requests, you can pass the future into code that doesn't know what it is, and it will be resolved automatically.

For example, let's say you're building a composite image by loading one image from disk, shrinking another image that already exists, and then putting the result into an image view:

    NSImage *image1 = [[NSImage alloc] initWithContentsOfFile: ...];
    NSImage *image2 = [self shrinkImage: existingImage];
    
    NSImage *composite = [[NSImage alloc] initWithSize: ...];
    [composite lockFocus];
    [image1 drawAtPoint: NSZeroPoint fromRect: NSZeroRect operation: NSCompositeSourceOver fraction: 1.0];
    [image2 drawAtPoint: NSZeroPoint fromRect: NSZeroRect operation: NSCompositeSourceOver fraction: 1.0];
    [composite unlockFocus];
    
    [imageView setImage: composite];
By futuring all of the images, you allow the work for image1 and image2 to run in parallel, and there's at least the possibility that some of the work for composite could run in parallel with main thread work too:
    NSImage *image1 = MABackgroundFuture(^{ return [[[NSImage alloc] initWithContentsOfFile: ...] autorelease]; });
    NSImage *image2 = MABackgroundFuture(^{ return [self shrinkImage: existingImage]; });
    
    NSImage *composite = MABackgroundFuture(^{
        NSImage *img = [[NSImage alloc] initWithSize: ...];
        [img lockFocus];
        [image1 drawAtPoint: NSZeroPoint fromRect: NSZeroRect operation: NSCompositeSourceOver fraction: 1.0];
        [image2 drawAtPoint: NSZeroPoint fromRect: NSZeroRect operation: NSCompositeSourceOver fraction: 1.0];
        [img unlockFocus];
        return [img autorelease];
    });
    
    [imageView setImage: composite];
Quick and easy parallel code, and imageView never has to know that it's getting a proxy instead of the real thing.

Lazy futures are useful any time you have objects that may never be needed, or simply may not be needed for a long time. Even if the object is used, deferring computation can spread out the load and improve responsiveness and startup times.

As an example, imagine some code which sets up a bunch of data file contents to be accessed through a dictionary:

    gGlobalDictionary = [[NSDictionary alloc] initWithObjectsAndKeys:
        [NSData dataWithContentsOfFile: ...], @"dataFile",
        [NSData dataWithContentsOfFile: ...], @"anotherDataFile",
        [NSData dataWithContentsOfFile: ...], @"moreDataFile",
        [NSData dataWithContentsOfFile: ...], @"fourthDataFile",
        nil];
You could load these files lazily by splitting them out, providing an accessor for each one, etc., but using lazy futures requires minimal changes and most of the same benefits:
    gGlobalDictionary = [[NSDictionary alloc] initWithObjectsAndKeys:
        MALazyFuture(^{ return [NSData dataWithContentsOfFile: ...] }), @"dataFile",
        MALazyFuture(^{ return [NSData dataWithContentsOfFile: ...] }), @"anotherDataFile",
        MALazyFuture(^{ return [NSData dataWithContentsOfFile: ...] }), @"moreDataFile",
        MALazyFuture(^{ return [NSData dataWithContentsOfFile: ...] }), @"fourthDataFile",
        nil];
Many other possibilities abound for parallel and lazy computation through the use of futures.

Conclusion
Futures are an interesting technique which can make it much easier to use lazy evaluation and parallel computation. Futures make it easy to use lazy evaluation even when you have no control over (or no desire to change) the code that will eventually use the value in question. They also make for a handy synchronization mechanism for performing heterogeneous parallel computations. The dynamic nature of Objective-C makes it possible to mostly hide the existence of the future from code that isn't involved in creating it.

That's it for this week's edition. Come back in a week for the next one. As always, Friday Q&A is driven by user suggestions, so if you have a topic that you'd like to see covered here, send it in!

Did you enjoy this article? I'm selling whole books full of them! Volumes II and III are now out! They're available as ePub, PDF, print, and on iBooks and Kindle. Click here for more information.

Comments:

Thanks Mike,

As usual a very interesting article that I've already begun to think of how to use Futures in code I'm working on today!

-john
Hi, I would like to know why in your implementations of retain and release you never use Foundation NSIncrementExtraRefCount() and NSDecrementExtraRefCountWasZero() functions for reference counting ? Cocoa uses them and don't seem to have much problem with threading as far as I can see.

Also, for the problem of nil resolved values, wouldn't it be possible to use some kind of default value ? You add a parameter to your method that allows the implementor to provide a value (for example if an NSString is required you put a @"" as default value) so in case the resolved value is nil it will return that object instead.


On a side note, when you use an instance of NSString class itself (the abstract class, which is illegal), the abstract methods throw an exception like that if that interests you :D :
@throw [NSException exceptionWithName:NSInvalidArgumentException reason:[NSString stringWithFormat:@"*** -%s cannot be sent to an abstract object of class %@: Create a concrete instance!", sel_getName(_cmd), [self class]] userInfo:nil];
Thanks Mike, this is an awesome post: you designed a very powerful tool with, when you look at it, a very small amount of code.

In other posts with other interesting constructs you made, you have sometimes warned against not using them in "production code". What would be your advice with your futures implementation?

It seems the main concern is the 'nil' value. Remy's suggestion solves both issues of method signature and messaging to nil. But it can often only push the problem further away: the code that relies on checking for nil (which may not even be your code) may not crash or raise an exception, but will still be handling an unexpected result without realizing it, and it is likely to result in other problems down the road. Failing early might be a better option, and this can be done within MABaseFuture.
Remy: The main reason I didn't use those functions is simply because I didn't know about them. I still like the inline reference count because it's (presumably) faster, and won't have the spinlock contention problems of the Cocoa implementation, but it's really not important either way.

Regarding the default object, I don't see much point in making that part of the API, because it's so easy to just write your block to handle it:

    id foo = MALazyFuture(^{
        id obj = [self mayReturnNil];
        if(!obj)
            obj = placeholderValue;
        return obj;
    });

More useful, I think, would be extending the API to take a placeholder class that could be used to look up method signatures, e.g.:

    id foo = MALazyFuture(SomeClass, ^{ return [self mayReturnNil]; });

However, this still suffers from the problem of nil checks.

Charles: I always expect my readers to use their own judgement, of course, but if I don't make the warning then I generally consider it to be safe. The nil thing is an annoying problem, but only if you're writing code which might actually return nil. If you know it can't return nil (and most code has tons of places where an unexpected nil would be fatal anyway) then everything should be safe.

There is the potential for trouble if you pass futures to code that is written in such a way as to not deal properly with proxies (like the one my -class override hack works around) but I expect that sort of thing to be extremely rare, and could be considered a bug in Cocoa, depending on the exact circumstances.
What does NSProxy instances return when you send them a -class message ?
The documentation for -isProxy says this method is required because -isMemberOfClass: and -isKindOfClass: methods do type-check on their proxied object. I would suppose then that -class would return the class of the proxy itself.

So you could do the same thing in your code, supposing people always do actual type-check using -isMemberOfClass: and -isKindOfClass: and not [obj class] == [MyClass class].
-[NSProxy class] returns the proxy class.

Which behavior is correct is definitely debatable. Code which checks class and then starts accessing instance variables directly if it matches will fail hard if you proxy that message. On the other hand, code which checks class as a first check for equality will fail if you don't proxy it.

Ultimately I lean towards making the proxy look as much like the original object as possible, which means proxying class.
Okay, fair enough, though the -isProxy method is specifically here to tell whether you should trust -class as being the object you're talking or not. IMHO It's a little weird to only be able to know the class of an object through runtime functions, even for proxy object. But of course that's a choice to make, too.
There is one problem I see with this approach:
In order to maximize parallelism, the Future has to be created as early as possible and used as late as possible. That encourages developers to restructure code for performance instead of legibility. This could be done by the compiler, but obviously the compiler doesn't know about the internals.
It's a bit of a pity, because it's a very nice approach.
Are there any performance techniques which don't encourage developers to restructure code for performance? If your goal is faster code, there are going to be compromises, that's just how it is. If you prefer legibility over performance, that's perfectly reasonable, but that does mean that the performance techniques available to you will be extremely limited!
You can solve the block signature mismatch by making the return type explicit:

MALazyFuture((id) ^ () { return title; });

Comments RSS feed for this page

Add your thoughts, post a comment:

Spam and off-topic posts will be deleted without notice. Culprits may be publicly humiliated at my sole discretion.

Name:
The Answer to the Ultimate Question of Life, the Universe, and Everything?
Comment:
Formatting: <i> <b> <blockquote> <code>.
NOTE: Due to an increase in spam, URLs are forbidden! Please provide search terms or fragment your URLs so they don't look like URLs.
Code syntax highlighting thanks to Pygments.
Hosted at DigitalOcean.