Greetings and welcome back to Friday Q&A. This week I'm going to discuss some tips and tricks for using printf-style format strings in C, as suggested by Kevin Avila.
Introduction
Almost everyone doing C or Objective-C programming uses format strings. In C, they're used by the printf family of functions. In Cocoa, NSLog and NSString both use them. They're a powerful way to build strings, but many people only know the basics. This week I'll delve into some hidden corners so you can take full advantage of the power they offer. Note that if you don't know the basics already, this article isn't going to make a lot of sense to you, so read a good printf tutorial before continuing.
Finding the Documentation
Hopefully all my readers know this, but just in case: if you type man printf at your shell prompt, you will get a bunch of confusing stuff that does not appear relevant to C programming. That's because you're actually reading the documentation for the shell command printf, not the C function. To see the documentation for the C function, you need to type man 3 printf. The Cocoa documentation also contains information on format strings, but since the only significant difference in Cocoa format strings is the addition of the %@ specifier for printing the -description of objects, I like to just use the printf documentation.
Varargs and Type Promotion
Functions and methods that take format strings always take variable arguments. This is important for two reasons.
The first, more obvious reason is that C doesn't provide any mechanism for the called function to know how many variable arguments it got, or what their types are. This means that your format string must exactly match the arguments you provide. Any mismatch could lead to bad output or a crash.
The less obvious reason is that C promotes the types of values passed as variable arguments. In short, anything smaller than an int gets promoted to int, and float gets promoted to double. So when you pass in a char, you'll use a format specifier for int to print it, and likewise you'll use a double specifier when you pass a float.
Types of Unknown Size
Frequently when programming in C or Cocoa you'll use a typedef whose underlying type is not guaranteed. Examples of this are size_t, socklen_t, NSInteger, and CGFloat.
For size_t it's easy: printf actually has a length modifier just for size_t. Put z in front of one of the standard integer conversions, as in %zu.
For CGFloat it's also easy: because float gets promoted to double, the same %f specifier works with either. No need to change anything.
For socklen_t and NSInteger you need to get a little cleverer. You can't use %d, because they might be bigger than an int. You can't use %ld or %lld, because they might be smaller than those, and promotion only widens small types to int, never to long or long long. They could even be bigger than those. What you'll want to do here is explicitly cast your variable to a type you know is large enough to hold it, and then use the matching specifier. For example:
printf("%jd", (intmax_t)myNSInteger);
Strings of Limited Length
The %s specifier prints a C string. This is tremendously handy. However, sometimes you want to print a sequence of characters that isn't necessarily a C string. For this, you can use the . (that's a period) modifier, also called the precision, to specify a length. For example, here is a convenient way to turn a FourCharCode into an NSString:
uint32_t valSwapped = CFSwapInt32HostToBig(fcc); // FCCs are stored backwards on Intel
NSString *str = [NSString stringWithFormat:@"%.4s", &valSwapped];
The .4 tells NSString that the string is only four characters long, which keeps it from running off the end.
Sometimes you don't know the length ahead of time. This used to happen a lot with Pascal strings, but they're getting pretty rare these days. For this, you can use * as your length, and then it will read the length as a separate argument. (Note that this separate argument must be of type int, so beware types of unknown size!)
Here's an example of that, followed by one for a Pascal string, whose first byte holds the length:
printf("%.*s", length, charbuffer);
printf("%.*s", pstring[0], pstring + 1);
Printing Pointers
Printing pointers is a handy thing to do, but many people don't know how to do it right. You often see code like this:
printf("0x%x", pointer);
This is wrong, because a pointer is not necessarily the same size as an int.
The correct way is easy: just use the %p specifier. You get nice hexadecimal output, and the size always matches. (Strictly speaking, %p expects a void *, so cast other pointer types to void * first.)
Beware of NULL
This one is so commonly ignored that gcc and clang actually have a workaround just for it, but it's still interesting to know. NULL can legally be just a #define to 0, like so:
#define NULL 0
If it is, and you pass NULL as a pointer argument to a varargs function like NSLog, your code is no longer conformant, because you're really passing an int! For example, this is, strictly speaking, wrong:
printf("%p", NULL);
(The same goes for nil.)
This is easy to fix: if you ever need to do this sort of thing, you can just cast the NULL to a pointer type like so:
printf("%p", (void *)NULL);
This also applies to functions and methods which take a NULL-terminated list of arguments, like -[NSArray arrayWithObjects:] or execl. Yes, that means all of the code out there which looks like this is, strictly speaking, wrong:
[NSArray arrayWithObjects:a, b, c, nil];
As mentioned above, gcc and clang have a workaround for this. They #define NULL to be a magic symbol which has either pointer or integer type depending on the context in which it's used, so the correct pointer value is passed into the function.
Always Use Constant Format Strings
I see far too much code which does this:
NSLog(someString);
What happens if someString contains the character sequence %@, or another format specifier? Then you probably crash.
It gets worse. What if you do this with printf or similar instead, and someString comes from a source outside your control, like off the internet? Then horrible things can occur.
One of the format specifiers supported by printf (but not Cocoa) is %n. It is very different from the other specifiers, in that it actually gives you a value back instead of taking one from you. It takes an int * argument and writes the number of characters output so far into the int it points to. For example:
printf("%d%n%d", a, &howmany, b);
After this call, howmany contains the number of characters used to print the first integer, a.
If an attacker has control over the format string, then they can use the %n
specifier to write an arbitrary value to a location in memory! This can then be used to take over your program. This attack is not theoretical.
In general, you should not pass anything other than a constant string as a format string. Every so often it's useful to build a format string dynamically, but before you do, think hard about whether you can accomplish your goal without it, and if you do build one, take extra care to ensure that it will always be valid.
Random Access Arguments
Typical format string usage goes straight through from start to finish: the first specifier uses the first argument, the second specifier uses the second argument, and so on. However, this is not mandatory! You can actually have any specifier use any argument. This is done by inserting n$ right after the %, where n is the number of the argument to print. Arguments count from 1. (Numbered arguments come from POSIX rather than the C standard.) For example, this prints the two arguments in reverse order:
printf("a = %2$d b = %1$d", b, a);
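Numbered arguments also let you print the same argument more than once:
printf("%1$s could not be accessed, error %2$d. Try rebooting %1$s.", name, err);
Note that you can't mix numbered and unnumbered specifiers in one format string, and POSIX doesn't let you skip arguments: every argument up to the highest one referenced must be used somewhere in the format. So this is not valid, because argument 1 is never used:
printf("a = %2$d", b, a);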
Conclusion
That wraps up this week's Friday Q&A. There's a lot more to what format strings can do than what I discussed today. Read the man page and take a look at how you can control precision, padding, output formats, and more.
Friday Q&A will be going on hiatus for at least one week and probably two due to various things which are going to keep me busy in that time.
In the meantime, keep those suggestions coming in. The more topics I have to choose from, the better topics you'll be able to read, so send them in!
Comments:
printf("a = %$2d b = %$1d", b, a);
When in reality it should be:
printf("a = %2$d b = %1$d", b, a);
The dollar sign should be after the digit.
Cheers,
Dave
Are you talking about a hypothetical future version of Mac OS X? As of Mac OS X, it's always the same size as a long on all currently-supported architectures. I believe this goes for the iPhone as well.
Well, it can be. In C++ and Objective-C++, it may be. But in C and Objective-C, it isn't. See the definitions of __DARWIN_NULL in <sys/_types.h>, Nil and nil in <objc/objc.h>, and NULL everywhere that matters.
This is mainly valid as a portability concern: Some other operating system may be more free-wheeling in its headers' definition of NULL, and *then*, it's worth being careful with how you use NULL.
That's an extension; it's not part of the C99 standard. Moreover, it doesn't even work with printf; it's only available in Core Foundation. (And you forgot a \n in your format string.)
I mean Mac OS X 10.5.7.
[NSArray arrayWithObjects:a, b, c, nil];
It depends what you mean by "strictly". The C99 standard defines NULL as a pointer, so this code is correct as long as you use a C99 compliant compiler.
That's not what I said. I said it always has the same size as long on current Mac OS X.
You are correct that using
%ld
will correctly print an NSInteger on all current Cocoa architectures. And two years ago, pointers were always 32-bit on all current Cocoa architectures. Four years ago, integers were always big-endian on all current Cocoa architectures. If you write your code to depend on today's assumptions, your code will break tomorrow.
Numbered argument specifiers are not part of the C standard but they are part of the POSIX standard, so unless you need your code to be portable to non-POSIX platforms you can depend on them to exist. See http://www.opengroup.org/onlinepubs/000095399/functions/printf.html
However my example which mixes numbered specifiers and non-numbered specifiers is not supported at all. It's an all-or-nothing thing.
Jean-Daniel Dupas: I don't believe you're correct that C99 defines NULL as a pointer. The C99 standard is available here: http://www.open-std.org/JTC1/SC22/wg14/www/docs/n1124.pdf
The relevant passages are this:
And:
These both appear on page 47. Thus NULL can be correctly defined as 0, (void *)0, (3 - 3), (void *)(42/43), etc. No statement is made about it being required to be a pointer type as far as I can see.
But I managed to find an interesting sentence in POSIX though:
3.244 Null Pointer
The value that is obtained by converting the number 0 into a pointer; for example, (void *) 0. The C language guarantees that this value does not match that of any legitimate pointer, so it is used by many functions that return pointers to indicate an error.
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap03.html
Unless POSIX also defines NULL to be a "null pointer" then I'm afraid that definition isn't relevant to the question. All this definition means is that NULL is not necessarily a null pointer.
Something that's been very helpful to me when printf-debugging is using macros to print variables without ever having to mess with format strings. It turns out that 90% of the time I can just say `LOG_ID(name)` and the "name = Vincent" information is all I needed.
Details here:
http://vgable.com/blog/2008/08/05/simpler-logging-2/
Also, Dave Dribin created an excellent DDToNSString() function that can automagically convert a C-type into an NSString:
http://www.dribin.org/dave/blog/archives/2008/09/22/convert_to_nsstring/
I've been using a modified DDToNSString() in a LOG_EXPR() macro that (mostly) Just Works no matter what type it's given. Once I've worked out a few more kinks, and understand the esoteric build settings it needs, I'll write something up on it.
Well, except that a Class is a valid id. And that we use NULL all over the place (NSError **, anyone?).
However, it's not a problem for things like NSError **. The fact that NULL (and nil and Nil) can be an integer 0 is only a problem when using varargs. For explicitly typed parameters, the 0 will be converted to the null pointer.
This sentence makes it seem like all functions which take variable arguments also take format strings. I know that's not what you meant, but a better wording would be "Functions or methods which take format strings always take a variable number of arguments."
That brings up another good Friday Q&A idea. Maybe you should cover implementing a function that uses variadic arguments, some of the pitfalls in doing so, etc. (Maybe even touch on variadic arguments in preprocessor macros.)
Also, Jean-Daniel, NULL and a null pointer are different. A null pointer is a pointer which has been assigned the value NULL. NULL itself is just 0.
Other fun Mac OS X-specific format string tidbits...
* NSString and CFString may be constructed with a format string, and you can specify "%@" to print the description of a Cocoa or CF object, respectively.
* The syslog(3) API allows you to specify "%m" to print the current errno. This does not require a corresponding argument in the argument list, so use with care.
Thanks for the article idea, I'll put it on my list.
printf("0x%08x\n", (uint32_t)ptr);
And use llx instead of x for 64-bit systems. Whenever you're printing out pointer values, 99.99% of the time you're debugging something, so you know the size of pointers on your platform. Hence, it's ok to be lazy and ditch the pointer-to-integer cast entirely.
I also strongly recommend always compiling with the -Wformat warning option (enabled with -Wall) with GCC -- it'll help you catch a lot of easy-to-miss errors often due to typos such as too many arguments, not enough arguments, mismatched format specifiers and arguments, etc.
GCC also has a nifty `format' function attribute which you can use to tag any functions you write that are wrappers around printf/scanf (such as a custom logging function), and it can then check the arguments you pass to that -- see http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html#index-g_t_0040code_007bformat_007d-function-attribute-2291 for more info.
uint32_t valSwapped = CFSwapInt32HostToBig(fcc); // FCCs are stored backwards on Intel
NSString *str = [NSString stringWithFormat:@"%.4s", &valSwapped];
While %.Ns is a clever trick, this won't actually work in general, because OSTypes are defined to be in MacRoman, whereas stringWithFormat uses the system encoding, which may be different.
Instead you need to use something like:
NSData* data = [NSData dataWithBytes:&valSwapped length:sizeof(valSwapped)];
return [[[NSString alloc]initWithData:data encoding:NSMacOSRomanStringEncoding] autorelease];