mikeash.com: just this guy, you know?

Posted at 2012-06-22 14:17
Friday Q&A 2012-06-22: Objective-C Literals
by Mike Ash  
This article is also available in Hindi (translation by Priti Agarwal) and Hungarian (translation by Szabolcs Csintalan).

Welcome back! After a brief hiatus for WWDC, it's time for another wacky adventure. Today's topic is the new object literals syntax being introduced into Objective-C, which was suggested by reader Frank McAuley.

Literals
For anyone unfamiliar with the term in the context of programming languages, a "literal" refers to any value which can be written out directly in source code. For example, 42 is a literal in C (and, of course, a lot of other languages). It's common to refer to what kind of value they produce, so 42 is an integer literal, "hello" is a string literal, and 'x' is a character literal.

Literals are a foundational building block of most programming languages, since there needs to be some way of writing constant values in code. They aren't strictly necessary, as you can construct any desired value at runtime, but they generally make code a lot nicer to write. For example, we can construct 42 without using any literals:

    int fortytwo(void)
    {
        static int zero; // statics are initialized to 0
        static int fortytwo;
        if(!fortytwo)
        {
            int one = ++zero;
            int two = one + one;
            int four = two * two;
            int eight = four * two;
            int thirtytwo = eight * four;
            fortytwo = thirtytwo + eight + two;
        }
        return fortytwo;
    }

However, if we had to do this for every integer we used, we'd probably all give up computer programming and go into some profession where the tools don't hate us so much. Likewise, we could construct C strings by hand out of characters, but strings are used so commonly that the language has a concise way to write them.

Collections are pretty commonly used as well. C originally had no facilities for collection literals, but the ability to initialize variables of a compound data type came pretty close:

    int array[] = { 1, 2, 3, 4, 5 };
    struct foo foo = { 99, "string" };

This isn't always entirely convenient, and so C99 added compound literals, which allow writing such things directly in code anywhere:

    DoWorkOnArray((int[]){ 1, 2, 3, 4, 5 });
    DoWorkOnStruct((struct foo){ 99, "string" });
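To make that concrete, here's a small self-contained sketch in plain C. The names sum_ints, point_sum, and struct point are made up for illustration and stand in for the DoWorkOnArray and DoWorkOnStruct calls above:

```c
#include <stddef.h>

/* Hypothetical struct, used only for this sketch. */
struct point { int x; int y; };

/* Adds up n ints; stands in for the article's DoWorkOnArray. */
int sum_ints(const int *values, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += values[i];
    return total;
}

/* Adds the two fields; stands in for the article's DoWorkOnStruct. */
int point_sum(struct point p)
{
    return p.x + p.y;
}

int compound_literal_demo(void)
{
    /* Both the array and the struct are written directly in the call,
       with no named temporary variable. */
    int a = sum_ints((int[]){ 1, 2, 3, 4, 5 }, 5);
    int b = point_sum((struct point){ .x = 99, .y = 1 });
    return a + b;
}
```

A compound literal written inside a function is an unnamed object that lives until the end of the enclosing block, so passing it (or a pointer to it) into a call like this is safe.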

Collection literals are pretty common in other languages too. For example, the popular JSON serialization format is just a codification of JavaScript's literal syntax. This JSON code is also valid syntax to create an array of dictionaries in JavaScript, Python, and probably some other languages:

    [{ "key": "obj" }, { "key": "obj2" }]

Until recently, Objective-C didn't have any syntax for Objective-C collections. The equivalent to the above was:

    [NSArray arrayWithObjects:
        [NSDictionary dictionaryWithObjectsAndKeys:
            @"obj", @"key", nil],
        [NSDictionary dictionaryWithObjectsAndKeys:
            @"obj2", @"key", nil],
        nil];

This is really verbose, to the extent that it's painful to type and obscures what's going on. The limitations of C variable argument passing also require a nil sentinel value at the end of each container creation call, which can fail in extremely odd ways when forgotten. All in all, not a good situation.

Container Literals
The latest clang now has support for container literals in Objective-C. The syntax is similar to that of JSON and modern scripting languages, but with the traditional Objective-C @ thrown in. Our example array/dictionary looks like this:

    @[@{ @"key" : @"obj" }, @{ @"key" : @"obj2" }]

There's definitely a bit of @ overload happening here, but it's a vast improvement over the previous state of things. The @[] syntax creates an array from the contents, which must all be objects. The @{} syntax creates a dictionary from the contents, which are written as key : value instead of the completely ludicrous value, key syntax found in the NSDictionary method.

Because it's built into the language, there's no need for a terminating nil. In fact, using nil anywhere in these literals will throw an error at runtime, since Cocoa collections refuse to contain nil. As always, use [NSNull null] to represent nil in collections.

There is no equivalent syntax for NSSet. The array literal syntax makes the job a bit nicer, since you can do something like [NSSet setWithArray: @[ contents ]], but there's nothing quite like the concise literal syntax.

Everything you put into such an array or dictionary still has to be an object. You can't fill out an object array with numbers by writing @[ 1, 2, 3 ]. However, this is made much easier by the introduction of....

Boxed Expressions
Boxed expressions essentially allow for literals corresponding to primitive types. The syntax is @(contents), which produces an object boxing the result of the expression within the parentheses.

The type of object depends on the type of the expression. Numeric types are converted to NSNumber objects. For example, @(3) produces an NSNumber containing 3, just like if you wrote [NSNumber numberWithInt: 3]. C strings are converted to NSString objects using the UTF-8 encoding, so @("stuff goes here") produces an NSString with those contents.

These can contain arbitrary expressions, not just constants, so they go beyond simple literals. For example, @(sqrt(2)) will produce an NSNumber containing the square root of 2. The expression @(getenv("FOO")) is equivalent to [NSString stringWithUTF8String: getenv("FOO")].

As a shortcut, number literals can be boxed without using the parentheses. Rather than @(3), you can just write @3. Applied to strings, this gives us the familiar and ancient construct @"object string". Note that expressions do not work like this. @2+2 and @sqrt(2) will produce an error, and must be parenthesized as @(2+2) and @(sqrt(2)).

Using this, we can easily create an object array containing numbers:

    @[ @1, @2, @3 ]

Once again, a bit of @ overload, but much nicer than the equivalent without the new syntax.

Note that boxed expressions only work for numeric types and char *, and don't work with other pointers or structures. You still have to resort to longhand to box up your NSRects or SELs.

Object Subscripting
But wait, there's more! There's now concise syntax for fetching and setting the elements of an array and dictionary. This isn't strictly related to object literals, but arrived in clang at the same time, and continues the theme of making it easier to work with containers.

The familiar [] syntax for array access now works for NSArray objects as well:

    int carray[] = { 12, 99, 42 };
    NSArray *nsarray = @[ @12, @99, @42 ];

    carray[1]; // 99
    nsarray[1]; // @99

It works for setting elements in mutable arrays as well:

    NSMutableArray *nsarray = [@[ @12, @99, @42 ] mutableCopy];
    nsarray[1] = @33; // now contains 12, 33, 42

Note, however, that it's not possible to add elements to an array this way, only replace existing elements. If the array index is beyond the end of the array, the array will not grow to match, and instead it throws an error.

It works the same for dictionaries, except the subscript is an object key instead of an index. Since dictionaries don't have any indexing restrictions, it also works for setting new entries:

    NSMutableDictionary *dict = [NSMutableDictionary dictionary];
    dict[@"suspect"] = @"Colonel Mustard";
    dict[@"weapon"] = @"Candlestick";
    dict[@"room"] = @"Library";

    dict[@"weapon"]; // Candlestick

As with literals, there is no equivalent notation for NSSet, probably because it doesn't make much sense to subscript sets.

Custom Subscripting Methods
In a really cool move, the clang developers made the object subscripting operators completely generic. They're not actually tied into NSArray or NSDictionary in any way. They simply translate to simple methods which any class can implement.

There are four methods in total: one setter and one getter for integer subscripts, and one setter/getter for object subscripts. The integer subscript getter has this prototype:

    - (id)objectAtIndexedSubscript: (NSUInteger)index;

You can then implement this to do whatever you want to support the semantics you want. The code simply gets translated mechanically:

    NSLog(@"%@", yourobj[99]);
    // becomes
    NSLog(@"%@", [yourobj objectAtIndexedSubscript: 99]);

Your code can fetch the index from an internal array, build a new object based on the index, log an error, abort(), start a game of pong, or whatever you want.

The corresponding setter has this prototype:

    - (void)setObject: (id)obj atIndexedSubscript: (NSUInteger)index;

You get the index and the object that's being set there, and then you do whatever you need to do with them to implement the semantics you want. Again, this is just a simple mechanical translation:

    yourobj[12] = @"hello";
    // becomes
    [yourobj setObject: @"hello" atIndexedSubscript: 12];

The two methods for object subscripts are similar. Their prototypes are:

    - (id)objectForKeyedSubscript: (id)key;
    - (void)setObject: (id)obj forKeyedSubscript: (id)key;

It's possible to implement all four methods on the same class. The compiler decides which one to call by examining the type of the subscript. Integer subscripts call the indexed variants, and objects call the keyed variants.
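Plain C can't express Objective-C method dispatch, but C11's _Generic offers a rough analogy for picking a function from the static type of the subscript. This is only an analogy, not how clang implements subscripting, and every name here (demo_container, get_indexed, get_keyed, SUBSCRIPT) is invented for the sketch:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical container with both indexed and keyed storage. */
struct demo_container {
    const char *items[3];
    const char *keys[2];
    const char *values[2];
};

/* Plays the role of objectAtIndexedSubscript:. */
const char *get_indexed(const struct demo_container *c, size_t index)
{
    return index < 3 ? c->items[index] : NULL;
}

/* Plays the role of objectForKeyedSubscript:. */
const char *get_keyed(const struct demo_container *c, const char *key)
{
    for (size_t i = 0; i < 2; i++)
        if (strcmp(c->keys[i], key) == 0)
            return c->values[i];
    return NULL;
}

/* The function is chosen at compile time from the subscript's type:
   integers go to get_indexed, strings go to get_keyed. */
#define SUBSCRIPT(c, sub) _Generic((sub), \
    int: get_indexed, \
    unsigned long: get_indexed, \
    char *: get_keyed, \
    const char *: get_keyed)((c), (sub))

/* Returns 1 if both kinds of subscript reach the expected function. */
int subscript_demo(void)
{
    struct demo_container c = {
        { "first", "second", "third" },
        { "weapon", "room" },
        { "Candlestick", "Library" }
    };
    const char *by_index = SUBSCRIPT(&c, 1);
    const char *by_key = SUBSCRIPT(&c, "weapon");
    return strcmp(by_index, "second") == 0
        && strcmp(by_key, "Candlestick") == 0;
}
```

As with the real thing, the choice is made entirely from the type the compiler sees, not from anything that happens at runtime.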

This is actually a small chunk of operator overloading now available in Objective-C, which traditionally has completely avoided it. As always, be careful with it to ensure that your custom implementations remain true to the spirit of the subscripting operator. Don't implement the subscripting syntax to append objects or send messages across the network. If you keep it restricted to fetching and setting elements of your object, the usage of the syntax remains consistent and you can more easily understand what code is doing without needing to know all the details.

Initializers
C has an odd quirk in that any initializer of a global variable must be a compile-time constant. This includes simple expressions, but not function calls. For example, the following global variable declaration is legal:

    int x = 2 + 2;

But this is not:

    float y = sin(M_PI);

C string literals are compile-time constants, so this is legal:

    char *cstring = "hello, world";

NSString literals are also compile-time constants, so the Cocoa equivalent is legal:

    NSString *nsstring = @"hello, world";

It's important to note that none of the new literal syntax qualifies as a compile-time constant. Assuming that the array is a global variable, the following is not legal:

    NSArray *array = @[ @"one", @"two" ];

This is because the @[] syntax literally translates into a call to an NSArray method. The compiler can't compute the result of that method at compile time, so it's not a legal initializer in this context.

It's interesting to explore exactly why this would be the case. The compiler lays out global variables in your binary, and they are loaded directly into memory. A global variable initialized with 2 + 2 results in a literal 4 being written into memory. A C string initializer results in the string contents being written out in the program's data, and then a pointer to those contents being written out as the global variable's value.
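The usual workaround in plain C is to leave the global zeroed in the binary and fill it in at runtime on first use. A minimal sketch, where compute_value and get_lazy_value are hypothetical names and the code assumes single-threaded use:

```c
/* Constant expression: the compiler stores a literal 4 in the binary. */
int four = 2 + 2;

/* The string bytes go into the program's data, and the variable
   holds a pointer to them. */
const char *cstring = "hello, world";

/* Hypothetical stand-in for any value that needs code to compute. */
int compute_value(void)
{
    return 6 * 7;
}

/* Writing "int lazy = compute_value();" at file scope is not legal C,
   so the value is computed the first time it is requested instead. */
int get_lazy_value(void)
{
    static int value;
    static int initialized;   /* statics start out as 0 */
    if (!initialized) {
        value = compute_value();
        initialized = 1;
    }
    return value;
}
```

This is not thread safe as written. Real code would want something like dispatch_once or pthread_once around the initialization.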

Note that C++, and therefore Objective-C++, does allow non-constant initializers for global variables. When the C++ compiler encounters such an expression, it packages it into a function and arranges for that function to be called when the binary loads. Because the initializer code runs so early, it can be a bit dangerous to use, as other code like NSArray might not be ready to go yet. In any case, if you've seen a non-constant initializer compile and are wondering why, it was probably being compiled as C++.
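GCC and clang offer a constructor function attribute in C that gives a feel for this "run a function when the binary loads" arrangement. The sketch below is an analogy under that assumption, not the exact mechanism a C++ compiler uses, and the names are made up:

```c
/* Hypothetical function whose result can't be computed at compile time. */
int compute_initial_value(void)
{
    return 6 * 7;
}

/* Stored as zero in the binary image. */
int late_initialized;

/* GCC/clang extension: this function runs at load time, before main(),
   and fills in the global. */
__attribute__((constructor))
static void init_late_initialized(void)
{
    late_initialized = compute_initial_value();
}
```

By the time main starts, late_initialized holds the computed value. The same caveat applies here as in C++: the function runs very early, so anything it depends on has to be ready by then.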

NSString literals are also compile-time constants, because of a tight coupling between the compiler and the libraries. There's a special NSString subclass called NSConstantString with a fixed ivar layout:

    @interface NSSimpleCString : NSString {
    @package
        char *bytes;
        int numBytes;
    #if __LP64__
        int _unused;
    #endif
    }
    @end

    @interface NSConstantString : NSSimpleCString
    @end

It just contains an isa (inherited from NSObject), a pointer to bytes, and a length. When such a literal is used as a global variable initializer, the compiler simply writes out the string contents, then writes out this simple object structure, and finally initializes the global variable with a pointer to that structure.
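The same idea can be sketched in plain C to show why no code has to run. This is an illustration only: demo_class and demo_constant_string are invented stand-ins, not the real runtime structures, and the field order simply mirrors the description above.

```c
/* Hypothetical stand-in for a class object that isa points to. */
struct demo_class {
    const char *name;
};

struct demo_class demo_constant_string_class = { "DemoConstantString" };

/* An isa, a pointer to the bytes, and a length. */
struct demo_constant_string {
    const struct demo_class *isa;
    const char *bytes;
    int num_bytes;
};

/* Everything in this initializer is a constant or an address known at
   link time, so the whole object can be laid out in the binary. */
struct demo_constant_string hello_object = {
    &demo_constant_string_class,
    "hello, world",
    sizeof("hello, world") - 1
};

/* The global variable is then just a pointer to that structure. */
struct demo_constant_string *greeting = &hello_object;
```

Note that once binaries containing hello_object have shipped, changing the fields of demo_constant_string would break them, which is exactly the trade-off discussed next.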

You may have noticed that you don't need to retain and release NSString literals like you do other objects (although it's still a good idea to do so just out of habit). In fact, you can release them as many times as you want and it won't do anything. This is because NSString literals aren't dynamically allocated like most Objective-C objects. Instead, they're allocated at compile time as a part of your binary, and live for the lifetime of your process.

This tight coupling has advantages, like producing legal global variable initializers, and requiring no extra code to run to build the object at runtime. However, there are big disadvantages as well. The NSConstantString layout is set forever. That class must be maintained with exactly that data layout, because that data layout is baked into thousands of third-party apps. If Apple changed the layout, those third-party apps would break, because they contain NSConstantString objects with the old layout.

If NSArray literals were compile-time constants, there would need to be a similar NSConstantArray class with a fixed layout that the compiler could generate, and that would have to be maintained separately from other NSArray implementations. Such code could not run on older OSes which didn't have this NSConstantArray class. The same problem exists for the other classes that the new literals can produce.

This is particularly interesting in the case of NSNumber literals. Lion introduced tagged pointers, which allow an NSNumber's contents to be embedded directly in the pointer, eliminating the need for a separate dynamically-allocated object. If the compiler emitted tagged pointers, their format could never change, and compatibility with old OS releases would be lost. If the compiler emitted constant NSNumber objects, then NSNumber literals would be substantially different from other NSNumbers, with a possible significant performance hit.
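To see why a tagged pointer format would get baked into binaries, here's a toy encoding in C. This is emphatically not Apple's actual format, which is private. It assumes the low bit of a pointer is free because real objects are aligned, it only handles small non-negative values, and all the names are invented:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: the lowest bit marks a tagged value.
   Real heap objects are aligned, so their lowest bit is always 0. */
#define DEMO_TAG_BIT ((uintptr_t)1)

/* Packs a small non-negative integer into the pointer itself.
   No memory is allocated. */
void *demo_box_small_int(intptr_t value)
{
    return (void *)(((uintptr_t)value << 1) | DEMO_TAG_BIT);
}

/* Tells a tagged value apart from a pointer to a real object. */
bool demo_is_tagged(const void *pointer)
{
    return ((uintptr_t)pointer & DEMO_TAG_BIT) != 0;
}

/* Recovers the integer by shifting the tag bit back out. */
intptr_t demo_unbox_small_int(const void *pointer)
{
    return (intptr_t)((uintptr_t)pointer >> 1);
}
```

If a compiler emitted the result of demo_box_small_int directly into binaries, the choice of tag bit and shift could never change without breaking them, which is the compatibility problem described above.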

Instead, the compiler simply emits calls into the framework, constructing the objects exactly like you would have done manually. This results in a bit of a runtime hit, but no worse than building them yourself without the new syntax, and makes for a much cleaner design.

Compatibility
When can we start using this new syntax? Xcode 4.3.3 is the latest shipping version and does not yet include these additions. We can reasonably expect that the next release, presumably coming with Mountain Lion, will incorporate these changes in its version of clang.

For OS compatibility, the literals simply generate code that calls standard Cocoa initializers. The result is indistinguishable from writing the code by hand.

The story for subscripting is a bit more complex. These require new methods that don't exist in Cocoa at the moment. However, the subscripting methods map directly to existing NSArray and NSDictionary methods, so we can expect a compatibility shim to be made available along the lines of the ARCLite shim that allows using ARC on OSes that predate it.

Conclusion
The new object literals and subscripting syntax in Objective-C can significantly reduce the verbosity of code that deals heavily with arrays and dictionaries. The syntax is similar to that found in common scripting languages, and makes code much easier to read and write, aside from a minor surplus of @ symbols.

That's it for today. Come back next time for another friendly exploration of the world of programming. Friday Q&A is as always driven by reader suggestions, so until then, if you have a topic that you'd like to see covered here, please send it in!


Comments:

Are you sure NSConstantString is still used by the runtime for string literals?

Also, the C++ initializers are run after the ObjC runtime +load methods, so if you're using NSArray it will probably be properly initialized by the time you make a literal. Don't you think?
The compiler generates objects of class __NSCFConstantString by default now, although that can be changed with a flag. But of course NSConstantString still has to live forever, since older binaries reference it.
How about an article on how to create your own ObjC literal?
Mike, your detailed article and Objective-C insights are a sight to behold. You do the language and Cocoa community a great service exploring "under the hood" topics. Thank you!
Eimantas: It's not possible to create your own literal syntax without hacking on the compiler, or possibly using the new C++11 stuff in conjunction with Objective-C++.
The subscripting compatibility shim is actually part of ARCLite. The __ARCLite__load function (which oddly enough is called by the objc runtime as if it were a +load method) takes care of dynamically adding the four subscript methods with class_addMethod. The implementations of these methods simply call the equivalent non-subscript methods. E.g. the objectAtIndexedSubscript: method is implemented like this:

id __arclite_objectAtIndexedSubscript(NSArray *self, SEL _cmd, NSUInteger idx)
{
    return [self objectAtIndex:idx];
}
Thank you very much for the nice article. It reminded me of a very similar syntax in WebScript (a scripting version of Objective-C that WebObjects once had); see Apple's documentation, http://developer.apple.com/legacy/mac/library/documentation/LegacyTechnologies/WebObjects/WebObjects_3.1/DevGuide/WebScript/CreatingObjects.html#REF65852

I never used WebScript or WebObjects to start with. I wonder how they were implemented.

By the way, WebScript also had a "modern" function call/definition syntax, http://developer.apple.com/legacy/mac/library/documentation/LegacyTechnologies/WebObjects/WebObjects_3.1/DevGuide/WebScript/ModernSyntax.html#REF54790 .
As a note, the return types of -objectAtIndexedSubscript: and -objectForKeyedSubscript: do not have to be id. There probably aren't that many times where this will actually come in handy, but it's there if you want it.
Jordan: I suspect if you're using ARC and you make those methods return non-objects, you'll be burned quite badly.
Ken: Oops, yes, I take it back. I thought you could declare these to return any type you wanted, but it actually produces an error if it's not an Objective-C type. (You can pick something other than id, though, just like any other covariant override.)

(How embarrassing, considering I work on Clang!)
Good stuff, Mike. Out of interest, I created a 3-part series in adding my own literal, NSURL to Clang. http://aussiebloke.blogspot.com/2012/06/llvm-clang-hacking-part-1.html

NSSet is interesting. Would be easy to play with an experimental literal syntax for NSSet, like:

NSSet *theSet = @| @"one", @"two", @"three" |

Stuart: I dunno if that set syntax would work without significantly rewriting the Objective-C grammar, since the single bar is going to get eaten by the parser as part of the assignment-expression. Looks like a job for a digraph.
jamie: The Clang parser is a recursive descent parser, so this would work just fine, as you have pretty fine-grained control. When the parser hits an @ symbol in an expression, Clang calls Parser::ParseObjCAtExpression (https://github.com/llvm-mirror/clang/blob/4d3db4eb6caa49a7cdbfe1798728ce4b23cd0b53/lib/Parse/ParseObjc.cpp#L2019). You can simply add a case to the switch: if the next symbol following the @ is "|", start parsing an NSSet literal.

Cheers
Stuart: I see. Are you sure I can't interest you in a nice unicode character pair? Seems like just the job for the black lenticular brackets :)

NSSet *vowels = @【@"a",@"e",@"i",@"o",@"u"】;
How about some emoji syntax?

NSSet *vowels = @💩@"a", @"e", @"i", @"o", @"u"💩;

(That probably won't show up for everybody. In fact, it doesn't show up for me.)

Regarding Stuart's syntax idea, I don't think it's workable. The problem is not the beginning, but the end, which results in ambiguity since | can be an operator or a set-end marker. For example:

NSSet *pathological = @| 0 | -1 | -2;

Of course, this is ultimately invalid code, but you can't know that until well after the actual parsing. There are probably ways to come up with an ambiguous case that is valid, but I don't quite have the energy to figure it out.
Very nice. I was hoping they would add some shortcut for +stringWithFormat: too, like @("foo %d", 1). Oh well.
Also, Xcode's auto-indentation is pretty weird:


        self.navigationBar.titleTextAttributes = @{
    UITextAttributeFont: [UIFont systemFontOfSize:12.0],
    UITextAttributeTextColor: [UIColor colorWithWhite:0.5 alpha:1.0],
    UITextAttributeTextShadowColor: [UIColor clearColor],
        };


I would expect it at least to look like:


        self.navigationBar.titleTextAttributes = @{
            UITextAttributeFont: [UIFont systemFontOfSize:12.0],
            UITextAttributeTextColor: [UIColor colorWithWhite:0.5 alpha:1.0],
            UITextAttributeTextShadowColor: [UIColor clearColor],
        };
Xcode's auto-indentation has trouble with existing Objective-C code. I'm a little disappointed, but not surprised, that it doesn't handle new literals so well yet. :-(
@Steven Degutis

just throw the following in a header....

#define $(...) ((NSString *)[NSString stringWithFormat:__VA_ARGS__,nil])

and then use like

[$(@"whatever%ld %@", 2, obj ) stringByRemoving......


Hi, could anyone explain to me why this throws a compiler error ("Expected expression before @ token", haha, which one?)

NSArray *myArray = @[@{@"x1":@0,@"y1":@100,@"x2":@100,@"y2":@110}];

Thanks and best regards,

rx
Never mind, had the old gcc compiler set -- sorry!
I'm so happy I found this. Great post!
@mike,

You are right, the vertical bars would pose a problem when parsing constant expressions.

NSSet *s = @(| @"test", @"one", @"two" |);

No whitespace for @(| and |) would make it more resilient…
It's also possible to substitute literal classes with `@compatibility_alias`.


#import <Foundation/NSObject.h>

@interface AAA : NSObject
+ (id)numberWithInt:(int)num;
@end

@implementation AAA
+ (id)numberWithInt:(int)num
{
    return @"AAAAA!!!"; // Abused type system just to check result.
}
@end

@compatibility_alias NSNumber AAA;
I'd be curious to know if there's any way to use the @{} dictionary initialiser to initialise another class. For example:

NSFetchRequest * fetchRequest = @{@"Entity" : @"SomeEntity", @"attributeName" : @"value"};

which would be equivalent to:

NSFetchRequest * fetchRequest = [NSFetchRequest fetchRequestWithEntityName: @"SomeEntity"];
fetchRequest.predicate = [NSPredicate predicateWithFormat: @"(%K == %@)", @"attributeName", @"value"];

One thing to be careful about with boxed expressions is that the underlying "type" of the resulting NSNumber object depends on the compile-time type of the expression boxed, which may sometimes not be what you expect. In Objective-C:


BOOL foo = NO;
NSLog(@"%@", @(foo).class); // prints "__NSCFBoolean"
NSLog(@"%@", @(!foo).class); // prints "__NSCFNumber"
NSLog(@"%@", @((BOOL)!foo).class); // prints "__NSCFBoolean"


Although we normally think of the NOT ("!") of a BOOL as a BOOL, in C, and hence Objective-C, the result of "!" always has type int. It works fine when we assign it to a BOOL or use it in a comparison, but boxed expressions care a lot about the actual compile-time type, and create the NSNumber object differently depending on it.

(Note that in C++, "!" returns type bool, so if you rename the file ".mm" to make it Objective-C++, then the above will all print "__NSCFBoolean". This change in behavior by just changing the language is also counterintuitive.)
Addendum to my previous post: Comparison operators in C (but not C++), like <, >=, ==, etc. also all return an int. So:

NSLog(@"%@", @(42 == 17).class); // prints "__NSCFNumber"
