mikeash.com: Friday Q&A 2011-04-01: Signal Handling

Posted at 2011-04-01 15:21
Tags: fridayqna gcd kqueue signal

by Mike Ash

Happy April Fool's Day to all my readers, and welcome to one web site which won't irritate you all day with bizarre practical jokes. Instead, I bring you another edition of Friday Q&A. In this edition, I will discuss various ways of handling signals in Mac programs, a topic suggested by friend of the blog Landon Fuller.

Signals
Signals are one of the most primitive forms of interprocess communication imaginable. A signal is just a small integer sent to a process. You can send a signal using the kill command, which also has a corresponding function available from C.

When a signal is delivered, it can terminate the process, pause/resume the process, be ignored, or invoke some custom code. That last option is called signal handling, and that is what I want to discuss today.
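As a rough sketch of the C side of this, the function below sends SIGUSR1 to the current process with kill(). The helper name send_usr1_to_self is made up for illustration. It first asks for the signal to be ignored, since the default action for SIGUSR1 would terminate the process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends SIGUSR1 to the current process.
   Returns the result of kill(): 0 on success, -1 on failure. */
int send_usr1_to_self(void)
{
    /* Ignore SIGUSR1 first; its default action would kill us. */
    struct sigaction action = { 0 };
    action.sa_handler = SIG_IGN;
    sigemptyset(&action.sa_mask);
    int rc = sigaction(SIGUSR1, &action, NULL);
    assert(rc == 0);
    (void)rc;

    /* kill() is the C counterpart of the kill command: the first
       argument is the target pid, the second is the signal number. */
    return kill(getpid(), SIGUSR1);
}
```

Despite the name, kill() only terminates the target when that is what the chosen signal ends up doing.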

The list of defined signals can be seen in the header sys/signal.h. Many of these are used for familiar purposes. SIGINT is the signal generated when you press control-C in the shell. SIGABRT is used to kill your program when you call abort(), and SIGSEGV is the infamous segmentation fault, which pops up when you dereference a bad pointer.

Signal handling is esoteric and most programs don't need to worry about it at all. However, there are cases where it can be useful. For terminal and server programs, it's handy to catch SIGHUP, SIGINT, and other similar signals to do cleanup before exiting, as a sort of low-level version of Cocoa's applicationWillTerminate:. The SIGWINCH signal is handy for sophisticated terminal applications. SIGUSR1 and SIGUSR2 are user-defined signals which you can use for your own purposes.

sigaction
The lowest-level interface for signal handling is the sigaction function. It provides some sophisticated and arcane options, but the important part is that it allows you to specify a function which is called when the signal in question is delivered:

    static void Handler(int signal)
    {
        // signal came in!
    }
    
    struct sigaction action = { 0 };
    action.sa_handler = Handler;
    sigaction(SIGUSR1, &action, NULL);

Nice and simple, right?

Wrong.

Reentrancy
The problem is that signals are delivered asynchronously, and the function registered here is also invoked asynchronously. Code always has to run on a thread somewhere. Depending on how the signal is generated, the handler either runs on the thread that the signal is associated with (for example, a SIGSEGV handler runs on the thread that segfaulted) or on an arbitrary thread in the process. In effect it's an interrupt in userland: whatever code was running on that thread when the signal came in is paused until the handler is done.

As anyone who was around in the classic Mac days knows, writing code that runs in an interrupt is hard. The problem is reentrancy. Many people confuse reentrancy with thread safety, but they are not the same concept, although they are somewhat similar.

Thread safety means that a particular piece of code can run on multiple threads at the same time safely. Thread safety is most commonly accomplished by using locks. A call acquires a lock, does work, releases the lock. A second thread that comes along in the middle will block until the first thread is done.

Reentrancy means that a particular piece of code can safely be called again on the same thread while an earlier call to it is still in progress. This is different and considerably harder.

What if you take the thread safety approach of locking and apply it to reentrancy? The first call acquires the lock. While it's active, the code is called again. It tries to acquire the lock, but the lock is already taken, so it blocks. However, the first call can't run until the second call is done. The second call can't run until the first call is done. The result is a frozen program.
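The freeze described above can be simulated without actually hanging the program. In this sketch, a simple non-recursive lock built on C11 atomic_flag stands in for whatever lock a library function might take internally, and a direct nested call stands in for a signal handler interrupting the function. The names gLock, try_lock, and locked_work are made up for illustration. The function tries the lock and reports failure at the point where a real blocking lock would wait forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* A deliberately simple non-recursive lock. */
static atomic_flag gLock = ATOMIC_FLAG_INIT;

static bool try_lock(void) { return !atomic_flag_test_and_set(&gLock); }
static void unlock(void)   { atomic_flag_clear(&gLock); }

/* Hypothetical function protected by the lock. When reenter is true it
   calls itself while still holding the lock, simulating a handler that
   interrupts it and calls it again on the same thread.
   Returns 0 on success, or -1 where a blocking lock would have hung. */
int locked_work(bool reenter)
{
    if (!try_lock())
        return -1;   /* a blocking lock would wait here forever */

    int result = 0;
    if (reenter) {
        result = locked_work(false);   /* the simulated reentrant call */
        assert(result == -1);          /* the inner call cannot get the lock */
    }

    unlock();
    return result;
}
```

Calling locked_work(true) returns -1, because the inner call finds the lock already held by the outer call it interrupted. A later locked_work(false) succeeds, since by then the lock has been released.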

Writing reentrant code is hard, and as a result very few system functions are reentrant. Because a signal handler functions as an interrupt, it can only call reentrant code. You can't call something as simple as printf safely, because printf could take a lock, and if there's already an active call to printf on the thread where the handler runs, you'll deadlock.

The sigaction man page gives a list of functions you are allowed to call from a signal handler. It's pretty limited.

The complete list is: _exit(), access(), alarm(), cfgetispeed(), cfgetospeed(), cfsetispeed(), cfsetospeed(), chdir(), chmod(), chown(), close(), creat(), dup(), dup2(), execle(), execve(), fcntl(), fork(), fpathconf(), fstat(), fsync(), getegid(), geteuid(), getgid(), getgroups(), getpgrp(), getpid(), getppid(), getuid(), kill(), link(), lseek(), mkdir(), mkfifo(), open(), pathconf(), pause(), pipe(), raise(), read(), rename(), rmdir(), setgid(), setpgid(), setsid(), setuid(), sigaction(), sigaddset(), sigdelset(), sigemptyset(), sigfillset(), sigismember(), signal(), sigpending(), sigprocmask(), sigsuspend(), sleep(), stat(), sysconf(), tcdrain(), tcflow(), tcflush(), tcgetattr(), tcgetpgrp(), tcsendbreak(), tcsetattr(), tcsetpgrp(), time(), times(), umask(), uname(), unlink(), utime(), wait(), waitpid(), write(), aio_error(), sigpause(), aio_return(), aio_suspend(), sem_post(), sigset(), strcpy(), strcat(), strncpy(), strncat(), strlcpy(), strlcat().

Finally, the list ends with this amusing note: "...and perhaps some others." "Perhaps" is not a nice word to run into in this sort of documentation.
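To make the restriction concrete, here is a minimal sketch of a handler that stays within the list. It reports the signal with write() instead of printf, and records it in a volatile sig_atomic_t flag. The names gGotSignal and install_and_raise are made up for illustration, and saving and restoring errno is a common precaution, not something the list above requires.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* volatile sig_atomic_t is the object type the C standard guarantees
   can be written safely from a signal handler. */
static volatile sig_atomic_t gGotSignal = 0;

static void Handler(int signum)
{
    (void)signum;
    int saved_errno = errno;   /* write() may change errno */

    static const char msg[] = "signal came in!\n";
    /* write() is on the safe list; printf() is not. */
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;

    gGotSignal = 1;
    errno = saved_errno;
}

/* Hypothetical helper: installs Handler for SIGUSR1, raises the signal,
   and returns the flag the handler set. */
int install_and_raise(void)
{
    struct sigaction action = { 0 };
    action.sa_handler = Handler;
    sigemptyset(&action.sa_mask);
    int rc = sigaction(SIGUSR1, &action, NULL);
    assert(rc == 0);
    (void)rc;

    /* raise() does not return until the handler has run. */
    raise(SIGUSR1);
    return gGotSignal;
}
```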

You can call your own reentrant code, but you probably don't have any, because it's hard to write, it can't call any system functions except from the above list, and you never had any reason to write it before. For the Objective-C types, note that objc_msgSend is not reentrant, so you cannot use any Objective-C from a signal handler.

There is very little that you can do safely. There is so little that I'm not even going to discuss how to get anything done, because it's so impractical to do so, and instead will simply tell you to avoid using signal handlers unless you really know what you're doing and you enjoy pain.

Fortunately, there are better ways to do these things.

kqueue
One of those better ways is to use kqueue. This is a low-level operating system service which allows a program to monitor many different events, and one of the events it can monitor is signals. You can create a kqueue just for signal handling, or you can add a signal handling event to an existing kqueue you already have within your program.

Setting things up is a bit more involved, but all in all not too hard. First, the kqueue is created:

    int fd = kqueue();

Next, add the signal filter to the queue:

    struct kevent event = { SIGUSR1, EVFILT_SIGNAL, EV_ADD, 0, 0 };
    kevent(fd, &event, 1, NULL, 0, NULL);

This tells the kqueue to watch for SIGUSR1 being delivered to the process. Note that kqueue exists separately from the lower level sigaction handling. Because we don't want the program to terminate when the signal is delivered, which is the default behavior, we also have to tell sigaction to ignore it:

    struct sigaction action = { 0 };
    action.sa_handler = SIG_IGN;
    sigaction(SIGUSR1, &action, NULL);

The kqueue is now ready. We can wait for it to receive an event by calling kevent again, this time not adding anything, but having it give us an event:

    struct kevent event;
    int count = kevent(fd, NULL, 0, &event, 1, NULL);
    if(count == 1)
    {
        if(event.filter == EVFILT_SIGNAL)
            printf("got signal %d\n", (int)event.ident);
    }

Note that because the handler runs normally, we can safely use printf or any other code when handling the signal. Convenient!

kqueue isn't always all that convenient to use in real programs, though. There are two reasonable ways to do it. One way is to have a dedicated signal handling thread which sits in a loop calling kevent repeatedly. Another way is to integrate the kqueue file descriptor with your Cocoa runloop using something like CFFileDescriptor. However, neither of these is particularly great.

GCD
Finally we reach a signal handling solution which is extremely easy to use: Grand Central Dispatch. In addition to the better-known multiprocessing capabilities, GCD also includes a full suite of event monitoring abilities which match those of kqueue. (And in fact, GCD implements them using kqueue internally.)

To handle a signal with GCD, we create a dispatch source to monitor the signal:

    dispatch_source_t source = dispatch_source_create(DISPATCH_SOURCE_TYPE_SIGNAL, SIGUSR1, 0, dispatch_get_global_queue(0, 0));

Next, we set its event handler with a block to execute, and then resume the source to make it active:

    dispatch_source_set_event_handler(source, ^{
        printf("got SIGUSR1\n");
    });
    dispatch_resume(source);

Like with kqueue, this exists separately from sigaction, so we have to tell sigaction to ignore the signal:

    struct sigaction action = { 0 };
    action.sa_handler = SIG_IGN;
    sigaction(SIGUSR1, &action, NULL);

That's it! Every time a SIGUSR1 comes in, the handler is called. Because the source targets a global queue, the handler automatically runs in a background thread without interfering with anything else. If you prefer, you can give GCD a custom queue, or even the main queue, to control where the handler runs. Like with kqueue, because the handler runs normally on a normal thread, it's safe to do anything in it that you would do in any other piece of code. GCD makes signal handling convenient, easy, and safe.

Conclusion
Signal handling is a rare requirement, but sometimes useful. Using the low level sigaction to handle signals makes life unbelievably hard, as the signal handler is called in such a way as to place extreme restrictions on the code it contains. This makes it almost impossible to do anything useful in such a signal handler.

The best way to handle a signal in almost every case is to use GCD. Signal handling with GCD is easy and safe. On the rare occasions where you need to handle signals, GCD lets you do it with just a few lines of code.

If you can't or don't want to use GCD but still want to avoid sigaction, kqueue provides a good middle ground. While it's more complicated to set up and manage than the GCD approach, it still works well to handle signals in a reasonable manner.

That wraps up today's April Fool's edition of Friday Q&A. Come back in two weeks for the next one. Until then, as always, keep sending me your ideas for topics. Friday Q&A is driven by reader suggestions, so if you have something you would like to see covered, send it in!


Comments:

Jens Ayton at 2011-04-01 16:34:34:

There’s another option: sigwait() on a dedicated thread. It’s as simple as anything involving “dedicated thread”, and cross-platform.

mikeash at 2011-04-01 16:43:00:

That's a good one indeed. I prefer not having a dedicated thread whenever possible, but that does look like a decent way to go.

Kentzo at 2011-04-01 18:41:41:

It's important to say that there are two signals which cannot be handled: SIGKILL and SIGSTOP.

Dave Zarzycki at 2011-04-01 20:40:58:

1) App developers should call signal(SIGPIPE, SIG_IGN) at the top of main() and library developers should defensively/politely set SO_NOSIGPIPE via setsockopt(). Why? Because network connectivity problems can cause SIGPIPE to be sent to your process instead of the more reasonable "write() returns -1 with errno equal to EPIPE" error.

2) Always remember to back up and restore the previous signal mask if one uses pthread_sigmask() or sigprocmask(). As a general rule, one cannot assume that one's caller hasn't also fiddled with the mask.

3) Library writers should never install real signal handlers via signal() or sigaction(). The kernel only supports one handler and therefore that is the right of the app, not libraries. Libraries should use technologies like GCD or kqueues directly (if one must).
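The first two points can be sketched roughly as follows. A pipe with its read end closed stands in for a dropped network connection, and all three function names are made up for illustration. With SIGPIPE ignored, the failed write() comes back as an ordinary EPIPE error, and the mask helper restores whatever mask the caller had and makes no assumption that it was empty.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Point 1: with SIGPIPE ignored, writing to a pipe that has no reader
   fails with EPIPE instead of killing the process.
   Returns the errno from write(), or 0 if the write succeeded. */
int write_to_closed_pipe(void)
{
    signal(SIGPIPE, SIG_IGN);

    int fds[2];
    int rc = pipe(fds);
    assert(rc == 0);
    (void)rc;
    close(fds[0]);   /* no reader left */

    ssize_t n = write(fds[1], "x", 1);
    int err = (n == -1) ? errno : 0;
    close(fds[1]);
    return err;
}

/* Reports whether SIGUSR1 is blocked in the calling thread. */
int usr1_is_blocked(void)
{
    sigset_t current;
    sigprocmask(SIG_BLOCK, NULL, &current);   /* NULL set: query only */
    return sigismember(&current, SIGUSR1);
}

/* Point 2: block SIGUSR1 for a critical section, then restore the
   caller's mask. *blocked_inside records the state seen inside. */
int run_with_usr1_blocked(int *blocked_inside)
{
    sigset_t block, saved;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);

    if (sigprocmask(SIG_BLOCK, &block, &saved) != 0)
        return -1;

    *blocked_inside = usr1_is_blocked();   /* stand-in for real work */

    /* SIG_SETMASK with the saved mask, not SIG_UNBLOCK: the caller may
       already have had SIGUSR1 blocked, and that must be preserved. */
    return sigprocmask(SIG_SETMASK, &saved, NULL);
}
```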

Dave Zarzycki at 2011-04-01 20:51:30:

One more thing:

4) Keep in mind that setting SIGCHLD to SIG_IGN has standards-defined side effects. Namely, one cannot call the wait*() family of APIs against child processes when SIGCHLD is ignored.
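A small sketch of that side effect, with a made-up function name: with SIGCHLD ignored, the exited child is reaped automatically, so waitpid() has nothing to report and fails with ECHILD.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: ignores SIGCHLD, forks a child that exits at
   once, and tries to wait for it. Returns the errno from waitpid(),
   0 if waitpid() unexpectedly succeeded, or -1 if fork() failed. */
int wait_with_sigchld_ignored(void)
{
    struct sigaction ignore = { 0 }, previous;
    ignore.sa_handler = SIG_IGN;
    sigemptyset(&ignore.sa_mask);
    int rc = sigaction(SIGCHLD, &ignore, &previous);
    assert(rc == 0);
    (void)rc;

    int err = 0;
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);                /* child: exit immediately */
    else if (pid < 0)
        err = -1;                /* fork failed */
    else if (waitpid(pid, NULL, 0) == -1)
        err = errno;             /* no exit status is kept for us */

    /* Put the previous SIGCHLD disposition back. */
    sigaction(SIGCHLD, &previous, NULL);
    return err;
}
```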

Gwynne Raskind at 2011-04-01 21:24:15:

Is there a difference, practically or conceptually, between "reentrant" and the even bigger mouthful "async-signal-safe"?

mikeash at 2011-04-01 21:50:44:

That's a good question. The short version is, "reentrant" is a general concept, and "async-signal-safe" applies it to the specific case of signal handlers.

Reentrancy doesn't necessarily have to apply to interrupts. For example, you'll find a note in the Cocoa documentation that NSNotificationCenter is reentrant. That most certainly does not mean that you can call it from a signal handler! Instead, what this means is that you can safely reenter NSNotificationCenter by calling into it from code which is in turn being called by NSNotificationCenter because of a posted notification.

That sort of reentrancy is much more useful (it's a good idea for almost any code with callbacks) and much easier to achieve (just make sure you're in a clean state and not holding any locks when you call the callback).

In the context of signal handling they're really the same, but in general not entirely.

Allen Brunson at 2011-04-02 13:57:08:

Given all the constraints, I've always written raw signal handlers to do nothing but set a flag, which is then noticed in some other part of the program, which takes action there. Can you think of any cases where that won't work?

Dave Zarzycki at 2011-04-02 14:10:13:

Allen -- there are two problems with the "set a bit" style of signal handling.

1) Race conditions. For example:

if (bit) do_something();
// signal fires
r = select();
// select doesn't return -1 with errno == EINTR in this case like one would expect. Therefore: the bit isn't noticed and acted upon until the next FD becomes readable/writable, which may be a long time

This is why pselect() was later invented, so that one might control when the signals fire. The availability of pselect() doesn't help a developer though if they're using a system provided event loop technology rather than rolling their own. This is one of many reasons why facilities like GCD exist.

2) The vast majority of app and library code doesn't check for errno being equal to EINTR after an error and retry. That is why when we were designing GCD, we blocked all of the maskable signals from being delivered on GCD threads.
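For reference, a rough sketch of how pselect() closes the race in point 1. The signal is kept blocked while the bit is checked, and pselect() unblocks it only for the duration of the wait, so a signal that arrives at the worst moment interrupts the wait instead of being missed. The names gBit and wait_without_race are made up, raise() stands in for the badly timed signal, and the one-second timeout is only a safety net.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t gBit = 0;

static void Handler(int signum) { (void)signum; gBit = 1; }

/* Returns 1 if pselect() was interrupted by the signal and the bit was
   then seen, 0 otherwise. */
int wait_without_race(void)
{
    struct sigaction action = { 0 };
    action.sa_handler = Handler;
    sigemptyset(&action.sa_mask);
    int rc = sigaction(SIGUSR1, &action, NULL);
    assert(rc == 0);
    (void)rc;

    /* Block SIGUSR1 so it cannot fire between the check and the wait. */
    sigset_t block, saved, waitmask;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &saved);
    waitmask = saved;
    sigdelset(&waitmask, SIGUSR1);

    raise(SIGUSR1);   /* arrives "too early"; stays pending */

    int interrupted = 0;
    if (!gBit) {
        struct timespec timeout = { 1, 0 };
        /* pselect() installs waitmask for the wait only, so the pending
           signal is delivered here and the call fails with EINTR. */
        int n = pselect(0, NULL, NULL, NULL, &timeout, &waitmask);
        interrupted = (n == -1 && errno == EINTR);
    }

    sigprocmask(SIG_SETMASK, &saved, NULL);
    return interrupted && gBit;
}
```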

mikeash at 2011-04-02 14:53:06:

If you really want to write a signal handler, take advantage of the fact that write() is a safe call to make from one. Create a pipe, stick the read end into your event system, and write a byte to the write end to signal. Make sure the pipe is nonblocking, though, otherwise you could be in serious trouble.

Of course there's no real reason to do that rather than using the built-in facilities which take care of the difficult parts for you.
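For the curious, a minimal sketch of that pipe technique. The names gPipe, set_nonblocking, and self_pipe_demo are made up for illustration. To stay short, the demo reads the pipe directly; a real program would register the read end with its event system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

static int gPipe[2] = { -1, -1 };   /* [0] is the read end, [1] the write end */

static void Handler(int signum)
{
    int saved_errno = errno;
    unsigned char byte = (unsigned char)signum;
    /* Nonblocking: if the pipe is full the write fails and the byte is
       dropped, which is better than hanging inside the handler. */
    ssize_t ignored = write(gPipe[1], &byte, 1);
    (void)ignored;
    errno = saved_errno;
}

static void set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    assert(flags != -1);
    int rc = fcntl(fd, F_SETFL, flags | O_NONBLOCK);
    assert(rc != -1);
    (void)rc;
}

/* Sets up the pipe and handler, raises SIGUSR1, and returns the signal
   number read back from the pipe, or -1 if nothing was there. */
int self_pipe_demo(void)
{
    int rc = pipe(gPipe);
    assert(rc == 0);
    set_nonblocking(gPipe[0]);
    set_nonblocking(gPipe[1]);

    struct sigaction action = { 0 };
    action.sa_handler = Handler;
    sigemptyset(&action.sa_mask);
    rc = sigaction(SIGUSR1, &action, NULL);
    assert(rc == 0);
    (void)rc;

    raise(SIGUSR1);   /* the handler writes one byte before this returns */

    /* Normal code from here on: the byte says which signal came in. */
    unsigned char byte = 0;
    ssize_t n = read(gPipe[0], &byte, 1);
    return n == 1 ? (int)byte : -1;
}
```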

arwyn at 2011-04-03 20:15:12:

Dave -- a race condition in "set a bit" style signal handling would be a bug, not an inherent problem with the method. It's pretty easy to do it safely and race condition free, like so:

signal handler:
do {
    old_bits = gSignalBits;
    new_bits = old_bits | (1 << signum);
    if (old_bits == new_bits)
        return;
} while(!CAS(new_bits,old_bits,&gSignalBits));
nonblocking_write(fd,1);

threaded signal dispatcher:
while(1)
{
    do {
        bits = gSignalBits;
    } while(!CAS(bits,0,&gSignalBits));

    if (bits == 0)
        blocking_read(fd,1);
    else
    {
        if ((bits & DO_SOMETHING_BIT) != 0)
            do_something();
    }
}

arwyn at 2011-04-03 20:25:31:

Mike, you missed pointing out the ensuing hilarity of reentrancy and recursive locks when used with signal handlers.

At best they provide absolutely no protection whatsoever because they are run on the same thread and just work. Which is typically what the misguided programmer thinks they wanted.

At worst they hang in the spin lock portion of the locking/unlocking primitive, unless it's entirely atomic, implemented via a single CAS (most aren't). Which is just like with a regular lock, but with such a small deadlock window that most programmers don't catch them for years.

Yuhong Bao at 2014-01-10 07:53:40:

"Finally, the list ends with this amusing note: "...and perhaps some others." "Perhaps" is not a nice word to run into in this sort of documentation."
I think this is referring to memcpy(), memset(), and similar functions that are simple enough that they are pretty much always async signal safe.

mikeash at 2014-01-24 15:34:19:

I imagine you're right, but note that memcpy is not nearly as simple as it might appear. On OS X, for example, if you're copying enough memory (greater than 40kB or so, last I checked) and the arguments are nicely aligned, it will actually skip the copy altogether and instead use mach virtual memory calls to "copy" the memory without doing any actual copies.

Akos at 2014-02-11 17:51:02:

May I ask a question?

We are trying to detect memory leaks when quitting our OS X application by calling 'leaks' on our own process. In some cases 'leaks' crashes or hangs, which stalls our app (in an fread/wait4). For some reason I can't catch the SIGCHLD signal in our process.

The leak detection process is called in the termination function, registered with atexit(). The leak detection in its simplest form uses popen/fread/pclose, but I have tried also other approaches, from kevent, GCD, pthreads, and NSPipe. None of them seems to work, I always hang somewhere (wait4, kevent, sigsuspend_nocancel).

I'd appreciate any pointers where I might go wrong; I'm sure it's my inability to capture some important point.

Thanks a lot,

Akos (asomorjai at graphisoft.com)

mikeash at 2014-02-12 03:52:06:

This is a guess, so please don't take it as basic truth.

I think that leaks will pause your process, at least occasionally, while doing its analysis in order to capture a consistent snapshot of it.

If that's the case, then you end up with a deadlock. Leaks is writing into a pipe that's drained by your process. But your process is paused by leaks, so it can't drain the pipe. If the pipe fills up, leaks will block waiting for you to drain it, but if your process is still paused, you'll never drain it.

A simple workaround would be to have leaks write its data into a file instead of a pipe. Your app can then read out of the file when it's done. You can do this with popen by just tacking on > somefile to the command, or with NSTask by setting its stdout to an NSFileHandle.

Akos at 2014-02-12 08:21:00:

Thanks, Mike. We are already sending the output to a temporary file. Unfortunately that approach still hangs in pclose when 'leaks' crashes:

#0    0x00007fff83fb96ac in wait4 ()
#1    0x00007fff86afd894 in pclose ()
#2    0x00000001149a409d in DetectLeaks at /Users/asomorjai/Work/DevMain.8/Sources/GSRoot/GSRootDLL/LeakDetectorMac.mm:1442
#3    0x000000011476a25f in GSTermImage at /Users/asomorjai/Work/DevMain.8/Sources/GSRoot/GSRootDLL/GSRootMain.cpp:161
#4    0x00007fff86af1525 in __cxa_finalize ()
#5    0x00007fff86af368b in exit ()
#6    0x000000010000403b in start ()

Here's the actual code; it runs on the main thread:

extern "C" void DetectLeaks (void)
{

    if (!NeedsLeaks ())
        return;

     signal (SIGCHLD, SIG_DFL);

    pid_t    myPid = getpid ();
    char    s[2048], fn[128];

    sprintf (fn, "/tmp/leaks_%d.txt", myPid);
    sprintf (s, "leaks -exclude \"-[NSApplication(NSWindowsMenu) setWindowsMenu:]\" %d > %s\n", myPid, fn);
    printf ("%s\n", s);

    FILE *fp = popen (s, "r");
    if (fp != nullptr) {
        pclose (fp);

        fp = fopen (fn, "r");
        if (fp != nullptr) {
            SInt32 bytesRead = 1;
            while (bytesRead > 0) {
                char buffer [1024];
                bytesRead = fread (buffer, sizeof (char), sizeof (buffer) - 1, fp);
                buffer[bytesRead] = '\0';
                for (char *p = buffer; *p != '\0'; p++) {
                    if (*p == '|')
                        *p = '\n';
                }
                printf ("%s", buffer);
            }
            fclose (fp);
        }
    }
}

How can I detect that 'leaks' has finished/crashed, if —for some reason— I don't get the SIGCHLD signal? Or how can I detect where the SIGCHLD is delivered?

Thanks, Akos

Michael Hecht at 2014-05-20 17:59:43:

Akos, we do something similar. But instead of running leaks at app termination, we run it in response to a specific scripting command in our app's scripting language. This allows us to (optionally) run leaks after each test in a test stream. If a leak is found, we terminate our app which causes the testing driver to capture the leak report and restart on the next test.

I do it using system(). Here's the code:



#define LEAKS_LOG    "~/Library/Logs/DiagnosticReports/JMP-LeaksReport.txt"

void hostDebugLeaks()
{
    /*
     *    This function is called from JSL via the
     *        Debug( Leaks );
     *    command.
     *
     *    It works best if you have launched JMP with the symbol MallocStackLogging set to 1,
     *    like so:
     *        MallocStackLogging=1 ./JMP.app/Contents/MacOS/JMP
     *
     *    If leaks are found, this function writes a leaks report to ~/Library/Logs/
     *    DiagnosticReports/JMP-leaks.log. JMP is then exited.
     *
     *    This can be used with the TestBot in leaks detection mode. In this mode, TestBot will
     *    launch JMP with MallocStackLogging enabled, then run the UT Framework with
     *    _utDailyBuild=1 and _utLeaks=1. This causes UT Framework to check for leaks after
     *    running each unit test. If a leak is found, JMP exits and TestBot captures the leaks
     *    report for the leaking test. Then JMP is restarted and the test stream resumes on the
     *    following test.
     */

#if 0
    // Debugging - force a leak
    int * leaky = new int[42];
    leaky = 0;
#endif

    // Run the 'leaks' command on our process; capture output; look for magic "no leaks" string
    JString leaksCmd( "leaks ^PID 2>&1 | tee ^LOG | grep \": 0 leaks for 0 total leaked bytes.\"" );
    leaksCmd.replace( "^PID", getpid()).replace( "^LOG", LEAKS_LOG );
    int status = system( leaksCmd.value());

    // If status is 0, it means the magic "no leaks" string was found, so we don't have any leaks.
    // Delete the log file and return.
    if( status == 0 ) {
        JString rmLogCmd( "rm -f ^LOG" );
        rmLogCmd.replace( "^LOG", LEAKS_LOG );
        system( rmLogCmd.value());
        return;
    }

    // We have a leak; leave the log file intact and force JMP to exit
    NSLog( @"Leaks detected; exiting" );
    exit( EXIT_FAILURE );
}

I'm sure you can ignore/interpret the parts that are specific to our internal libraries.

Prior to Mavericks, the system() call would hang for us occasionally, because of a crash in leaks as you describe. I had the opportunity to discuss it with an Apple engineer at WWDC a few years back and he thought he knew what the problem was. I submitted a radar and it is apparently now fixed.

Rather than having our app clean up the call stack in the captured leak report as your code does, we have the testing driver ("TestBot" in the above comment) do it. It's a perl script and the cleanup code looks like this:



sub process_leaks_report {
    my( $file ) = @_;
    return unless -e $file;

    my $modified = 0;
    my $contents = '';
    open( LOG_FILE_IN, '<', $file ) || die "Cannot open $file: $!";

    while( <LOG_FILE_IN> ) {
        chomp;
        if( s/^(\s*Call stack: .*?:) \| // ) {
            $contents .= "$1\n\t\t";
            $contents .= join( "\n\t\t", reverse split( / \| /, $_ ));
            $contents .= "\n";
            ++$modified;
            next;
        }

        $contents .= "$_\n";
    }

    close( LOG_FILE_IN );

    if( $modified ) {
        open( LOG_FILE_OUT, '>', $file ) || die "Cannot create $file: $!";
        print LOG_FILE_OUT $contents;
        close( LOG_FILE_OUT );
    }
}

What interests me is your exclusion of setWindowsMenu:. I see that leak too. Have you reported it? Have you done any further investigation on it?

vb at 2015-01-06 21:52:17:

Mike, sorry for commenting on an old post.

Is there any way to obtain siginfo_t when handling a signal with kqueue?
With epoll on signalfd (Linux) it's as simple as reading a signalfd_siginfo from the fd on which the signal is received.

Amitai Hoze at 2016-03-03 17:13:42:

Hi, can I use kqueue or GCD to perform non "async-signal-safe" code when my iOS app crashes (e.g. on SIGILL, or SIGTRAP)? Namely, I want to open a dialer using [[UIApplication sharedApplication] openURL:[NSURL URLWithString:@"tel://1111111"]];, is that at all possible?
