mikeash.com: Friday Q&A 2015-03-20: Preprocessor Abuse and Optional Parentheses

Posted at 2015-03-20 14:23
Tags: c evil fridayqna preprocessor

Friday Q&A 2015-03-20: Preprocessor Abuse and Optional Parentheses

by Mike Ash

The other day I ran into an interesting problem: how can you write a C preprocessor macro that removes parentheses surrounding its argument, but leaves the argument alone if no parentheses are present? For today's article, I'm going to share my solution.

Motivation
The C preprocessor is a fairly blind textual replacement engine that doesn't really understand C code, let alone Objective-C. It works well enough for common situations, but occasionally it gets confused.

Here's a typical example:

    XCTAssertEqualObjects(someArray, @[ @"one", @"two" ], @"Array is not as expected");

This will fail to compile, and produce some really weird errors. The preprocessor looks for commas separating the macro arguments, and it doesn't understand that the stuff in @[...] should be considered a single argument. Thus, this code tries to compare someArray with @[ @"one". The assertion failure message becomes @"two" ], and @"Array is not as expected" becomes an additional argument. These half-formed components are inserted into the macro expansion of XCTAssertEqualObjects, and the resulting code is nothing remotely legal.

Fixing this is easy: add parentheses. The preprocessor doesn't know about [], but it does know about () and is smart enough to ignore commas inside. This works:

    XCTAssertEqualObjects(someArray, (@[ @"one", @"two" ]), @"Array is not as expected");
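The same splitting can be observed in plain C. Here is a rough sketch, assuming a hypothetical NARGS helper (not part of the original article) that reports how many arguments the preprocessor sees, plus a hypothetical FIRST macro applied to a compound literal, whose braces protect commas no better than @[...] does:

```c
#include <assert.h>

/* Hypothetical helper: yields the number of arguments passed (1 to 5). */
#define NARGS_(a, b, c, d, e, N, ...) N
#define NARGS(...) NARGS_(__VA_ARGS__, 5, 4, 3, 2, 1, 0)

/* Brackets do not group: the comma inside [ ... ] splits the argument. */
enum { SPLIT_COUNT = NARGS(someArray, [ "one", "two" ], "message") };

/* Parentheses do group: the inner comma is ignored. */
enum { GROUPED_COUNT = NARGS(someArray, ([ "one", "two" ]), "message") };

/* Hypothetical macro taking one argument. */
#define FIRST(array) ((array)[0])

int first_of_literal(void) {
    /* Without the extra parentheses, FIRST would receive two arguments. */
    return FIRST(((int[]){ 10, 20 }));
}

void check_argument_splitting(void) {
    assert(SPLIT_COUNT == 4);
    assert(GROUPED_COUNT == 3);
    assert(first_of_literal() == 10);
}
```

The arguments to NARGS are never compiled, only counted, which is why the unbalanced-looking pieces are harmless here.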

In many parts of C, you can add superfluous parentheses without any penalty. After the macro is expanded, the resulting code still has the parentheses around the array literal, but they do no harm. You can write ludicrous expressions and the compiler happily digs to the bottom for you:

    NSLog(@"%d", ((((((((((42)))))))))));

You can even subject the NSLog to this:

    ((((((((((NSLog))))))))))(@"%d", 42);

There's one place in C where you can't just add random parentheses: types. For example:

    int f(void); // legal
    (int) f(void); // not legal

When would this matter? It's uncommon, but it comes up when a macro takes a type as an argument and that type contains a comma that isn't inside parentheses. Such types occur in Objective-C when a type conforms to multiple protocols, and in C++ when a templated type has multiple template arguments. For example, here's a simple macro that creates getters that provide statically-typed values from a dictionary:

    #define GETTER(type, name) \
        - (type)name { \
            return [_dictionary objectForKey: @#name]; \
        }

You could use it like this:

    @implementation SomeClass {
        NSDictionary *_dictionary;
    }

    GETTER(NSView *, view)
    GETTER(NSString *, name)
    GETTER(id<NSCopying>, someCopyableThing)

No problem so far. Now imagine we want to make one that conforms to two protocols:

    GETTER(id<NSCopying, NSCoding>, someCopyableAndCodeableThing)

Oops! The macro doesn't work anymore. Adding parentheses won't help:

    GETTER((id<NSCopying, NSCoding>), someCopyableAndCodeableThing)

This produces invalid code. What we'd like to have is an UNPAREN macro that removes optional parentheses. The GETTER macro would be written:

    #define GETTER(type, name) \
        - (UNPAREN(type))name { \
            return [_dictionary objectForKey: @#name]; \
        }

How do we do it?

Requiring Parentheses
It's easy to remove parentheses:

    #define UNPAREN(...) __VA_ARGS__
    #define GETTER(type, name) \
        - (UNPAREN type)name { \
            return [_dictionary objectForKey: @#name]; \
        }

This looks crazy, but it actually works. The preprocessor will expand type to (id<NSCopying, NSCoding>), producing UNPAREN (id<NSCopying, NSCoding>). It will then expand the UNPAREN macro to id<NSCopying, NSCoding>. Parentheses, begone!

However, the previous uses of GETTER now fail. For example, GETTER(NSView *, view) produces UNPAREN NSView * in the macro expansion. This is not expanded further, and is given to the compiler. The result is, naturally, a compiler error, since UNPAREN NSView * is nonsensical. This can be worked around by writing GETTER((NSView *), view), but it's annoying to be forced to add these parentheses. This is not what we want.
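For readers without an Objective-C compiler handy, here is a minimal plain-C sketch of the same behavior. DECLARE_ZEROED is a hypothetical macro invented for this illustration, and an anonymous struct stands in for a type containing a bare comma:

```c
#include <assert.h>

#define UNPAREN(...) __VA_ARGS__

/* Hypothetical macro: declares a zero-initialized variable.
   With this UNPAREN, the type MUST be wrapped in parentheses. */
#define DECLARE_ZEROED(type, name) UNPAREN type name = { 0 }

int sum_with_required_parens(void) {
    /* Expands to: struct { int w, h; } size = { 0 }; */
    DECLARE_ZEROED((struct { int w, h; }), size);
    /* Even a simple type needs the parentheses here. */
    DECLARE_ZEROED((int), count);

    /* DECLARE_ZEROED(int, broken) would leave "UNPAREN int broken = { 0 }"
       for the compiler, which is an error. */
    size.w = 3;
    size.h = 4;
    return size.w + size.h + count;
}
```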

Macros Can't Be Overloaded
I immediately thought about how to get rid of the surplus UNPAREN. When you want an identifier to disappear, you can use an empty #define, like so:

    #define UNPAREN

With this present, the sequence a UNPAREN b turns into a b. Perfect! However, the preprocessor rejects this if another definition with arguments is already present. Even though the preprocessor could potentially choose one or the other, it won't allow both forms to be present simultaneously. This would work great if it could be done, but it's not allowed:

    #define UNPAREN(...) __VA_ARGS__
    #define UNPAREN
    #define GETTER(type, name) \
        - (UNPAREN type)name { \
            return [_dictionary objectForKey: @#name]; \
        }

This will fail to make it through the preprocessor, as it will complain about a duplicate #define for UNPAREN. It does put us on the path to victory, though. The trick is to figure out a way to achieve the same effect without making both macros have the same name.

Bottleneck
The ultimate goal is for UNPAREN(x) and UNPAREN((x)) to both produce x. A step towards that goal is to make some macro where passing x and passing (x) produce the same output, even if it's not exactly x. This can be achieved by putting the macro name in the macro expansion, like so:

    #define EXTRACT(...) EXTRACT __VA_ARGS__

Now if you write EXTRACT(x) the result is EXTRACT x. And naturally, if you write EXTRACT x the result is also EXTRACT x, since no macro expansion takes place for that case. This still leaves us with a leftover EXTRACT. We can't simply #define it away, but it's progress.
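One way to check this claim is to turn the expansion into a string and compare. The sketch below assumes a hypothetical pair of helpers, SHOW and SHOW_, that stringify their argument after it has been macro-expanded; they are not part of the original article:

```c
#include <assert.h>
#include <string.h>

#define EXTRACT(...) EXTRACT __VA_ARGS__

/* Hypothetical helpers: SHOW expands its argument, then SHOW_ stringifies it. */
#define SHOW_(...) #__VA_ARGS__
#define SHOW(...) SHOW_(__VA_ARGS__)

void check_extract(void) {
    /* With parentheses, EXTRACT is invoked and leaves its own name behind.
       That leftover name is not expanded again. */
    assert(strcmp(SHOW(EXTRACT(x)), "EXTRACT x") == 0);

    /* Without parentheses, a function-like macro is simply not invoked. */
    assert(strcmp(SHOW(EXTRACT x), "EXTRACT x") == 0);
}
```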

Token Pasting
The preprocessor has an operator ## which pastes two tokens together. For example, a ## b becomes ab. This can be useful to construct identifiers from pieces, but it can also be used to invoke macros. For example:

    #define AA 1
    #define AB 2
    #define A(x) A ## x

Given this, A(A) produces 1 and A(B) produces 2.
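Here is a small compilable sketch of both uses. The COUNTER macro and the variable names are hypothetical, added only to show identifier construction alongside the A(x) example:

```c
#include <assert.h>

#define AA 1
#define AB 2
#define A(x) A ## x

/* Hypothetical macro: builds an identifier from a fixed prefix and a suffix. */
#define COUNTER(color) counter_ ## color

int COUNTER(red) = 10;    /* declares counter_red */
int COUNTER(blue) = 20;   /* declares counter_blue */

void check_token_pasting(void) {
    /* The pasted token is itself a macro name, so it gets expanded. */
    assert(A(A) == 1);
    assert(A(B) == 2);

    /* The pasted token is an ordinary identifier. */
    assert(counter_red == 10);
    assert(COUNTER(blue) == 20);
}
```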

Let's combine this operator with the EXTRACT macro above to try to produce an UNPAREN macro. Since EXTRACT(...) produces the argument with a leading EXTRACT, we can use token pasting to produce some other token that ends in EXTRACT. If we #define that new token to nothing, we'll be all set.

Here's a macro ending in EXTRACT that produces nothing:

    #define NOTHING_EXTRACT

Here's an attempt at an UNPAREN macro that puts it all together:

    #define UNPAREN(x) NOTHING_ ## EXTRACT x

Unfortunately, this doesn't get the job done. The problem is order of operations. If we write UNPAREN((int)), we get:

    UNPAREN((int))
    NOTHING_ ## EXTRACT (int)
    NOTHING_EXTRACT (int)
    (int)

The token pasting happens too early, and the EXTRACT macro never gets expanded.
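The failed expansion can be confirmed by stringifying the result. In this sketch the first attempt is renamed UNPAREN_DIRECT, and SHOW and SHOW_ are hypothetical helpers that stringify an argument after expansion:

```c
#include <assert.h>
#include <string.h>

#define EXTRACT(...) EXTRACT __VA_ARGS__
#define NOTHING_EXTRACT

/* The first attempt from above, under a different name. */
#define UNPAREN_DIRECT(x) NOTHING_ ## EXTRACT x

/* Hypothetical helpers: SHOW expands its argument, then SHOW_ stringifies it. */
#define SHOW_(...) #__VA_ARGS__
#define SHOW(...) SHOW_(__VA_ARGS__)

void check_direct_paste(void) {
    /* The parentheses survive: EXTRACT was pasted away before it could run. */
    assert(strcmp(SHOW(UNPAREN_DIRECT((int))), "(int)") == 0);

    /* The unparenthesized case already works. */
    assert(strcmp(SHOW(UNPAREN_DIRECT(int)), "int") == 0);
}
```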

You can force the preprocessor to evaluate things in a different order by using indirection. Instead of using ## directly, let's make a PASTE macro:

    #define PASTE(x, ...) x ## __VA_ARGS__

Then we'll write UNPAREN in terms of it:

    #define UNPAREN(x) PASTE(NOTHING_, EXTRACT x)

This still doesn't work. Here's what happens:

    UNPAREN((int))
    PASTE(NOTHING_, EXTRACT (int))
    NOTHING_ ## EXTRACT (int)
    NOTHING_EXTRACT (int)
    (int)

It's closer, though. The sequence EXTRACT (int) shows up without a token pasting operator present. We just have to get the preprocessor to actually evaluate that before it sees the ##. Another layer of indirection will force it to behave. Let's define an EVALUATING_PASTE macro that just wraps PASTE:

    #define EVALUATING_PASTE(x, ...) PASTE(x, __VA_ARGS__)
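The rule at work is that a macro argument is expanded before substitution unless it is an operand of # or ##. A minimal sketch of the difference, using a hypothetical SUFFIX macro and hypothetical SHOW helpers for stringifying the result:

```c
#include <assert.h>
#include <string.h>

#define PASTE(x, ...) x ## __VA_ARGS__
#define EVALUATING_PASTE(x, ...) PASTE(x, __VA_ARGS__)

/* Hypothetical macro, present only so there is something to expand. */
#define SUFFIX 7

/* Hypothetical helpers: SHOW expands its argument, then SHOW_ stringifies it. */
#define SHOW_(...) #__VA_ARGS__
#define SHOW(...) SHOW_(__VA_ARGS__)

void check_paste_order(void) {
    /* SUFFIX is an operand of ##, so it is pasted before it can expand. */
    assert(strcmp(SHOW(PASTE(value_, SUFFIX)), "value_SUFFIX") == 0);

    /* The wrapper's arguments are not ## operands, so SUFFIX becomes 7
       first, and only then does PASTE glue the pieces together. */
    assert(strcmp(SHOW(EVALUATING_PASTE(value_, SUFFIX)), "value_7") == 0);
}
```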

Now let's use this one to write UNPAREN:

    #define UNPAREN(x) EVALUATING_PASTE(NOTHING_, EXTRACT x)

Here's the expansion:

    UNPAREN((int))
    EVALUATING_PASTE(NOTHING_, EXTRACT (int))
    PASTE(NOTHING_, EXTRACT int)
    NOTHING_ ## EXTRACT int
    NOTHING_EXTRACT int
    int

It still works without the surplus parentheses, as the extra evaluation is harmless there:

    UNPAREN(int)
    EVALUATING_PASTE(NOTHING_, EXTRACT int)
    PASTE(NOTHING_, EXTRACT int)
    NOTHING_ ## EXTRACT int
    NOTHING_EXTRACT int
    int
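Both traces can be verified mechanically. This sketch collects the five macros in one place and stringifies the results with hypothetical SHOW helpers, which are an addition for testing and not part of the technique:

```c
#include <assert.h>
#include <string.h>

#define EXTRACT(...) EXTRACT __VA_ARGS__
#define NOTHING_EXTRACT
#define PASTE(x, ...) x ## __VA_ARGS__
#define EVALUATING_PASTE(x, ...) PASTE(x, __VA_ARGS__)
#define UNPAREN(x) EVALUATING_PASTE(NOTHING_, EXTRACT x)

/* Hypothetical helpers: SHOW expands its argument, then SHOW_ stringifies it. */
#define SHOW_(...) #__VA_ARGS__
#define SHOW(...) SHOW_(__VA_ARGS__)

void check_unparen(void) {
    /* Parentheses present: they are removed. */
    assert(strcmp(SHOW(UNPAREN((int))), "int") == 0);

    /* Parentheses absent: the argument passes through unchanged. */
    assert(strcmp(SHOW(UNPAREN(int)), "int") == 0);

    /* Multi-token types work either way. */
    assert(strcmp(SHOW(UNPAREN((unsigned long))), "unsigned long") == 0);
    assert(strcmp(SHOW(UNPAREN(unsigned long)), "unsigned long") == 0);
}
```

PASTE and EVALUATING_PASTE are variadic for a reason: once the parentheses are stripped, any comma inside the type is exposed, and the trailing pieces have to ride along in __VA_ARGS__.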

Success! We can now write GETTER to allow but not require parentheses around the type:

    #define GETTER(type, name) \
        - (UNPAREN(type))name { \
            return [_dictionary objectForKey: @#name]; \
        }
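A plain-C analog can be compiled without Objective-C. In this sketch, DECLARE_ZEROED is a hypothetical macro made up for the illustration, and an anonymous struct plays the role of the type with a bare comma:

```c
#include <assert.h>

#define EXTRACT(...) EXTRACT __VA_ARGS__
#define NOTHING_EXTRACT
#define PASTE(x, ...) x ## __VA_ARGS__
#define EVALUATING_PASTE(x, ...) PASTE(x, __VA_ARGS__)
#define UNPAREN(x) EVALUATING_PASTE(NOTHING_, EXTRACT x)

/* Hypothetical macro: declares a zero-initialized variable.
   Parentheses around the type are allowed but not required. */
#define DECLARE_ZEROED(type, name) UNPAREN(type) name = { 0 }

int area_plus_count(void) {
    /* No comma in the type, so no parentheses are needed. */
    DECLARE_ZEROED(int, count);

    /* The comma between w and h forces parentheses, which UNPAREN strips. */
    DECLARE_ZEROED((struct { int w, h; }), size);

    size.w = 3;
    size.h = 4;
    count = 1;
    return size.w * size.h + count;
}
```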

Bonus Macro
While coming up with macros that would justify this construct, I built a nice dispatch_once macro for making lazily-initialized constants. Here it is:

    #define ONCE(type, name, ...) \
        UNPAREN(type) name() { \
            static UNPAREN(type) static_ ## name; \
            static dispatch_once_t predicate; \
            dispatch_once(&predicate, ^{ \
                static_ ## name = ({ __VA_ARGS__; }); \
            }); \
            return static_ ## name; \
        }

Here's an example use:

    ONCE(NSSet *, AllowedFileTypes, [NSSet setWithArray: @[ @"mp3", @"m4a", @"aiff" ]])

Then you can call AllowedFileTypes() to obtain the set, and it's efficiently created on demand. In the unlikely event that the type contains a comma, you can add parentheses and it will still work.
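As a rough plain-C analog, the sketch below swaps dispatch_once and the block for a simple flag. That is an assumption made to keep the example portable, and it makes this version NOT thread-safe, unlike the original. The compute_answer function and the counter are hypothetical:

```c
#include <assert.h>

#define EXTRACT(...) EXTRACT __VA_ARGS__
#define NOTHING_EXTRACT
#define PASTE(x, ...) x ## __VA_ARGS__
#define EVALUATING_PASTE(x, ...) PASTE(x, __VA_ARGS__)
#define UNPAREN(x) EVALUATING_PASTE(NOTHING_, EXTRACT x)

/* Single-threaded stand-in for the dispatch_once version: a plain flag
   guards the initializer, so there is no protection against races. */
#define ONCE(type, name, ...) \
    UNPAREN(type) name(void) { \
        static UNPAREN(type) static_ ## name; \
        static int initialized; \
        if (!initialized) { \
            static_ ## name = (__VA_ARGS__); \
            initialized = 1; \
        } \
        return static_ ## name; \
    }

/* Hypothetical initializer that counts how often it runs. */
int compute_calls;
int compute_answer(void) { compute_calls++; return 42; }

ONCE(int, Answer, compute_answer())
ONCE((unsigned long), DoubledAnswer, 2UL * (unsigned long) Answer())

void check_once(void) {
    assert(Answer() == 42);
    assert(Answer() == 42);
    assert(DoubledAnswer() == 84UL);
    /* The initializer ran exactly once despite three lookups. */
    assert(compute_calls == 1);
}
```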

Conclusion
By merely writing this macro, I am a horrible person who deserves terrible things. I hope that exposure to this terror does not warp your mind too much. Use this knowledge with care.

That's it for today. Come back next time for more exciting adventures, probably something less terrifying than this. Until then, if you have any suggestions for topics to cover here, please send them in!


Comments:

Terry A. Davis at 2015-03-20 21:57:14:

I changed the operator precedence rules in HolyC. In order to keep you on your toes, I warned if you put excess parens. This forces you to learn the new rules. You can turn-off the compiler warning.

Also, if you have a function with no args, the parens are optional.

I don't like macro functions so my preprocessor does not do #define functions.

russ at 2015-03-20 22:02:58:

Mike: Never change. Seriously.

PabloEsteban at 2015-03-21 04:39:27:

Maybe someone should point out that a typedef is probably a better idea in this case? Someone do that please, thanks.

Aleksander Balicki at 2015-03-21 13:42:39:

Is this post some kind of 12-step program step? Acknowledging that you have a problem with macros.

Mario Ströhlein at 2015-03-21 13:48:36:

Hey Mike, great preprocessor magic. But I can't get it to work.
This is what I have:

    #define NOTHING_EXTRACT
    #define PASTE(x, ...) x ## __VA_ARGS__
    #define EVALUATING_PASTE(x, ...) PASTE(x, __VA_ARGS__)
    #define UNPAREN(x) EVALUATING_PASTE(NOTHING_, EXTRACT x)

I'm using LLVM with the clang front end. Am I missing something?

mikeash at 2015-03-21 14:56:58:

PabloEsteban: Now now, sensible solutions have no place here.

Aleksander Balicki: I think you could describe much of this blog in that way.

Mario Ströhlein: You're missing the EXTRACT macro:

    #define EXTRACT(...) EXTRACT __VA_ARGS__

Add that and your stuff should work.

Ryan Brown at 2015-03-21 20:40:59:

Mike: nice post! Writing out the substitution and expansion steps makes things clear.

Are you aware of Boost Preprocessor?

BOOST_PP_REMOVE_PARENS() does exactly this.

http://www.boost.org/doc/libs/1_57_0/libs/preprocessor/doc/ref/remove_parens.html

Looks like a much different implementation from yours of course.

mikeash at 2015-03-21 21:24:00:

I agree that writing it out step by step really helps. I really didn't fully understand exactly WTF was going on until I wrote that part of the article. I had worked through the problems more or less as written and got it working with the extra level of indirection, but I didn't entirely see why it worked at the time.

I'm aware of Boost and their crazy preprocessor stuff in general. I didn't know about this particular bit. Thanks for the pointer. The implementation is interesting. It looks like they have a lot of good (heinous, evil) preprocessor infrastructure that they can then use to write the final macro a little more naturally.

Mario Ströhlein at 2015-03-21 23:08:53:

Thanks Mike. I blindly missed the very first EXTRACT macro.
Meanwhile I wrote another implementation of UNPAREN() as a one liner to extend my little preprocessor library:
http://svn.tweakbsd.org/wsvn/Kext.Heroine/trunk/Heroine/Include/PreProcessor.h

Love evil macros and enjoyed the article a lot. Thank you very much Mike.

Charles Parnot at 2015-03-24 15:17:29:

Nice write-up! The lesson is: if a macro can't do what you want, write 2. If 2 macros can't do what you want, write 3. If 3 macros can't do what you want, write 4. If 4 macros can't do what you want, write 5. If 5 macros do it, write a blog post.

Nicolas Goutaland at 2015-03-30 14:41:17:

As always, nice article!

Your lazy macro reminds me of one I wrote to lazily initialize object properties. I got rid of the type parameter so that it only uses the property name, and used the ObjC runtime to determine the object type at runtime.

It is available here https://github.com/nicolasgoutaland/LazyProperty

Regards,
