mikeash.com: Don't use strnstr

Posted at 2007-09-26 00:00 | RSS feed (Full text feed) | Blog Index
Next article: First Post
Previous article: Performance Comparisons of Common Operations
Tags: osbug security strnstr

Don't use strnstr

by Mike Ash

The strnstr function is broken on Mac OS X 10.4 (and presumably earlier) and should be avoided.

I've probably told most of the people who read this blog about this bug already, but just in case, I thought I'd make a post. I don't usually post when I find bugs in Apple's stuff, but this is a pretty fundamental string function so it's a bit more important.

The quick version is this: strnstr on 10.4 will sometimes read one byte beyond the end of the length which you specify. Under special circumstances, this will lead to a segmentation fault in correct code.

The following test program illustrates the bug:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    int size = 20480;
    char *str = malloc(size);
    memset(str, 'x', size);
    strnstr(str, "aa", size);
    return 0;
}

On my computer running 10.4.10 and everyone else running 10.4 who has tried it, this will crash on the strnstr line.

In order to trigger the crash, several conditions must be met. First, the buffer must be lacking a terminating NUL byte. (Note that this is allowed according to how strnstr is documented to operate.) The search string must not exist in the buffer. The end of the buffer must terminate on a page boundary. And finally, the page following the buffer must be unreadable.

This confluence of circumstances is pretty difficult to produce. I've been using it without incident for a long time. I only discovered the problem when stress-testing some HTTP parsing code, and my stress test happened to cause reads which were nicely lined up with the page size, and it happened to generate enough data to push the malloced block past the malloc memory pool limit and into its own dedicated pages.

This rarity is all the more reason to avoid the function, though, since it just means that when it crashes, it'll likely to be on a user's machine, and it'll likely to be extremely difficult to reproduce.

For my friends out there in Apple land, this bug has been filed as rdar://5504733.

Did you enjoy this article? I'm selling whole books full of them! Volumes II and III are now out! They're available as ePub, PDF, print, and on iBooks and Kindle. Click here for more information.

Comments:

Ahruman at 2007-09-26 23:35:00:

C strings: jus say AarrghaarrghpleeassennononoUGH.

mikeash at 2007-09-27 04:59:00:

I agree that C strings are unpleasant, but for doing some initial parsing of data whose encoding is not yet known, I haven’t found a better way. In this case, I’m parsing an HTTP request which is assumed to be UTF-8 but, for paranoia’s sake, I want to be able to fall back to Latin-1. To this end I need to find the end of the request in the incoming data but all of the higher-level string APIs require you to give them an encoding, and Cocoa’s higher-level data APIs don’t have this kind of search functionality.

Wormwood at 2007-10-02 18:20:00:

Forgive my naiveté... but wouldn’t the following line of code be more correct (or otherwise avoid a bug)?

strnstr( str, “aa”, size – strlen(“aa”

);

The man page is a little ambiguous with how “not more than len characters are searched” is phrased or intended. The implementation may vary well define “search” as matching the first character and overlook the length of the search string when determining when the internal loop stops. (strnstr is from BSD, so I wouldn’t be surprised if the bug is present or independently fixed there.)

alastair at 2007-10-02 18:52:00:

strnstr() isn’t a standard function anyway; it’s a FreeBSD extension. I’d be inclined to stick either to ISO C functions (if you need Windows portability) or to POSIX/SUSv3 functions (which, in Leopard, have been independently validated against the specification, so you know they behave the same way on other certified UNIX platforms).

I should add, by the way, that it’s quite common for implementations of the string functions to read over the “end” of the string by a small amount. Usually they do this so that they can read words (or even larger elements with SSE/Altivec), and normally this won’t cause a problem because they won’t issue a read that straddles a page boundary (i.e. they’ll read, at most, up to the end of the page where your NUL terminator is).

Colin Percival at 2007-10-02 19:34:00:

FreeBSD fixed this in February 2005: http://www.freebsd.org/cgi/query-pr.cgi?..

Andrew at 2007-10-02 21:20:00:

I dispute that the man page allows for the buffer (“str” in your example, “big” in the man page) to lack a null byte (man pages are terse because they depend on context: see the def’n of “big” under the description for strstr [no “n”] and note the fn is not called memnstr). Because it lacks a null byte the first argument you’re passing to strnstr isn’t a string so the behavior is not disallowed. Before you claim that searching no more than size/len bytes guarantees it won’t read past the end of your buffer, I would suggest that you don’t know why it is reading the character one past the end of your buffer. It could just be that it does the end of string test before the “search exceeds size/len” test. I agree the behavior in this case could be more friendly but it certainly cannot be considered a bug, merely unspecified.

Chris K at 2007-10-02 21:31:00:

Well, at least you motivated me to report another apple bug, this one in the regex library. Try these two:
echo ab | sed -nE s/((a)|(b)){0,}//p
echo ab | sed -nE s/((a)|(b)){0,2}//p
Hint: They should both have returned [b::b]

mikeash at 2007-10-03 22:33:00:

Wow, lots of feedback all of a sudden. Let’s take it individually.

Wormwood, it is not more correct. It does avoid this bug but it causes a new one. Subtracting the length of the string to be searched will cause it to miss matches at the very end of the string.

alastair, I don’t care about portability to other OSes but I absolutely need my code to work on Tiger, so features in Leopard are useless to me. What’s more, I was not able to find any POSIX/SUSv3 equivalent to strnstr. If you know of one, I’d appreciate hearing about it. And of course reading off the end of the string is legal as long as you don’t cross page boundaries, but the whole problem is that it does read past page boundaries, causing a segfault.

Andrew, I dispute your dispute! The function is quite clear that it will not read beyond the length you give it. That is really the entire reason it exists. Whether or not the string is terminated after the length you give it is irrelevant, because it has no way of legally finding out whether there’s a terminator or not. You will also notice that the man page goes out of its way to explicitly state when strings are terminated. The description of strstr says that it, “locates the first occurrence of the null-terminated string s2 in the null-terminated string s1.” (Emphasis mine.) The description of strnstr states, “locates the first occurrence of the null-terminated string s2 in the string s1….” (Emphasis mine, again.) Note how it neglects to say that s1 is null-terminated. I cannot imagine that this is anything other than intentional.

Steven Fisher at 2007-10-03 23:47:00:

Definitely looks like a simple bug to me. It looks like it isn’t a security problem, but I’m not sure about that. :)

mikeash at 2007-10-04 01:43:00:

It could potentially result in a denial of service, if a malicious person were able to transmit data which would trigger the segfault in a remote process. It might lead to a bit of data leakage by crafting malicious data and then seeing whether the remote process dies or not, but even this is pretty farfetched. I don’t see any way for it to remote code execution by itself.

Steven Fisher at 2007-10-04 17:37:00:

That’s pretty much what I was thinking. Since it’s not a write or an execution. there shouldn’t be any security implications. (Which in a way is too bad; it would probably be fixed faster if there was that potential.)

Jean-Daniel Dupas at 2007-10-19 17:59:00:

According to the man page, no more than len characters are “searched”. It does not say that no more than len characters are readed.
Else this function would be called strlstr().
The strn…() functions was not created to avoid buffer overflow, but to works on substring.
They all expects to have a valid C string as argument.

mikeash at 2007-10-21 19:27:00:

As I stated in my response to Andrew, the man page clearly states that it does not expect a null-terminated string. Perhaps this is not the intention, but if so then the man page is defective.

H W at 2008-04-01 14:59:50:

This was just fixed in Apple's latest security update. It only took half a year...!

Comments RSS feed for this page

Add your thoughts, post a comment:

Spam and off-topic posts will be deleted without notice. Culprits may be publicly humiliated at my sole discretion.

Code syntax highlighting thanks to Pygments.

Name:
The Answer to the Ultimate Question of Life, the Universe, and Everything?
Comment:
	Formatting: `<i> <b> <blockquote> <code>`.
	NOTE: Due to an increase in spam, URLs are forbidden! Please provide search terms or fragment your URLs so they don't look like URLs.