mikeash.com: Friday Q&A 2013-01-11: Mach Exception Handlers

Posted at 2013-01-11 14:44
Tags: debugging evil exception fridayqna guest mach mig signal

by Landon Fuller

This is my first guest Friday Q&A article, dear readers, and I hope it will withstand your scrutiny. Today's topic is on Mach exception handlers, something I've recently spent some time exploring on Mac OS X and iOS for the purpose of crash reporting. While there is surprisingly little documentation available about Mach exception handlers, and they're considered by some to be a mystical source of mystery and power, the fact is that they're actually pretty simple to understand at a high level - something I hope to elucidate here. Unfortunately, they're also partially private API on iOS, despite being used in a number of new crash reporting solutions - something I'll touch on in the conclusion.

Signals vs. Exceptions
On most UNIX systems, the only mechanism available for handling crashes (such as dereferencing NULL, or writing to an unwritable page) is the standard UNIX signal handler. When a fatal machine exception is generated, it is caught by the kernel, which then executes a user-space trampoline within the failing process; that trampoline calls any function previously registered by that process via sigaction(2) or signal(3).
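For comparison with what follows, here is a minimal sketch of the signal-based approach, assuming a POSIX system. The names `read_faults` and `on_fault` are made up for this example, and escaping from the handler with siglongjmp() is purely for demonstration; a crash reporter would normally record what it needs and let the process die.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Jump target used to leave the handler. Illustration only: not thread-safe. */
static sigjmp_buf fault_env;

/* Invoked through the kernel's trampoline, on the faulting thread itself. */
static void on_fault(int signo)
{
    siglongjmp(fault_env, signo);
}

/* Hypothetical helper: returns 1 if reading one byte at addr raises
 * SIGSEGV or SIGBUS, and 0 if the read succeeds. */
static int read_faults(const volatile char *addr)
{
    struct sigaction sa, old_segv, old_bus;
    int rc, faulted;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);

    /* Register the handler, remembering whatever was installed before. */
    rc = sigaction(SIGSEGV, &sa, &old_segv);
    assert(rc == 0);
    rc = sigaction(SIGBUS, &sa, &old_bus);
    assert(rc == 0);

    if (sigsetjmp(fault_env, 1) == 0) {
        char byte = *addr;  /* may fault; the handler then jumps to the else branch */
        (void)byte;
        faulted = 0;
    } else {
        faulted = 1;
    }

    /* Put the previous handlers back. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return faulted;
}
```

Note that the handler runs inside the failing process and on the failing thread, which is exactly the constraint that Mach exceptions lift.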

On OS X, however, a much more versatile API exists: Mach exceptions. Dating back to Avie Tevanian's work on the Mach OS (yes, that Avie Tevanian), Mach exceptions build on Mach IPC/RPC to provide an alternative to the UNIX signal handler API. The original design of the Mach exception handling facility was first described, as far as I'm aware, in a 1988 paper authored by Avie Tevanian, among others. It remains fairly accurate to this day, and I'd recommend reading it for more details (after finishing this post, of course).

Mach exceptions differ from UNIX signals in three significant ways:

Exception information is delivered as a Mach message via a Mach IPC port, rather than by the kernel calling into a userspace trampoline.
Exception handlers may be registered by any process that has the appropriate mach port rights for the target process.
Exception handlers may be registered for a specific thread, a specific task (process), or for the entire host. The kernel will search for handlers in that order.

These differences introduce a number of properties that can be useful when implementing debuggers and crash reporters, and are what make the Mach API interesting as an alternative to BSD signals.

Exceptions are Messages
The Mach exception API is based on Mach RPC (which is, in itself, based on Mach IPC). There's a lot of confusion around Mach IPC, but at a high level, it's not too dissimilar to UNIX sockets or other well-known IPC mechanisms that allow one to read/write messages between processes. Mach IPC communication occurs over Mach ports, rather than via sockets or other traditional UNIX mechanisms; Mach ports have unique names, and can be shared with other processes. They can be used to send and receive messages containing arbitrary data. There's a bit more complexity involved in their actual use, but conceptually, that's about all you need to know.

To write a Mach exception handler using raw Mach IPC, you would need to wait for a new exception message by calling mach_msg() on a Mach port previously registered as an exception handler (how to do this is covered below). The call to mach_msg() will block until an exception message is received, or the thread is interrupted. Once a message is received, you are free to introspect it for the state of the thread that generated the exception. You can even correct the cause of the crash and restart the failing thread, if you feel like hacking register state at runtime.

Since exceptions are provided as messages, rather than by calling a local function, exception messages can be forwarded to the previously registered Mach exception handler, even if that existing handler is completely out-of-process. This means that you can insert an exception handler without disturbing an existing one, whether it's the debugger or Apple's crash reporter. To forward the message to an existing handler, you also use mach_msg() to send the original message to a previously registered handler's mach port, using the MACH_SEND_MSG flag.

However, if you wish to respond to the Mach RPC request yourself, rather than forwarding it, you need to reply to the message, informing the sender whether or not you handled the exception. Mach considers an exception handled if the crashing thread's state has been corrected such that its execution can be resumed. In this case, the kernel does not attempt to find any other exception handler, and considers the matter settled. However, if you reply to the RPC request informing the sender (usually the kernel) that the exception has not been handled, the sender will then try to find the next applicable Mach exception handler. Remember that the kernel attempts to send exceptions to thread-specific, task-specific, and host-global exception handlers, in that order.

The fact that a reply is expected from the exception request can be used for interesting purposes. For example, if a debugger has its exception handler called when a breakpoint is hit, it can simply wait to reply to the Mach exception message until (and only if) you request that the debugger continue execution.

Mach RPC, not IPC
While I described above how one might implement Mach exception handling with raw Mach IPC, this is not how the interfaces are defined in Mach. Instead, Mach RPC uses an interface description language (called matchmaker in the original 1989 paper) to describe the format of Mach RPC requests (and their replies), and to automatically generate code that handles received messages and produces a reply.

On OS X, the Mach RPC interface descriptions for exception handling - mach_exc.defs and exc.defs - are available via /usr/include/mach. If you include these files in your Xcode project, it will automatically run the mig(1) tool (Mach Interface Generator), generating the headers and C source files necessary to receive and handle Mach exception messages. The exc.defs file provides an API for working with 32-bit exceptions, whereas the mach_exc.defs file provides an API for working with 64-bit exceptions. Unfortunately, the Mach RPC defs are not provided on iOS, and only a subset of the necessary generated headers is provided. As a result, it's not possible to implement a fully correct Mach exception handler on iOS without relying on undocumented functionality.

The code generated by MIG handles two things:

Interpreting incoming RPC messages and calling out to an existing handler function with the decoded data.
Initializing a response to the RPC message using the return values from the handler function.

The generated code does not handle registering a Mach exception handler, receiving the Mach message, or actually sending the reply. That is the implementor's responsibility. In addition, there are multiple supported exception "behaviors" that provide different sets of information about an exception; it is the implementor's responsibility to provide callback functions for all of them.

This is best illustrated in the following 64-bit safe code, intended to work with RPC code generated by mach_exc.defs (I've left out error handling for simplicity):

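    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <mach/mach.h>
    // The request/reply unions and mach_exc_server() come from the code
    // that mig generates from mach_exc.defs.

    // Handle EXCEPTION_DEFAULT behavior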
    kern_return_t catch_mach_exception_raise (mach_port_t exception_port,
                                               mach_port_t thread,
                                               mach_port_t task, 
                                               exception_type_t exception,
                                               mach_exception_data_t code,
                                               mach_msg_type_number_t codeCnt)
    {
        // Do smart stuff here.
        fprintf(stderr, "My exception handler was called by exception_raise()\n");

        // Inform the kernel that we haven't handled the exception, and the
        // next handler should be called.
        return KERN_FAILURE;
    }

    extern boolean_t mach_exc_server (mach_msg_header_t *msg, mach_msg_header_t *reply);
    static void exception_server (mach_port_t exceptionPort) {
        mach_msg_return_t rt;
        mach_msg_header_t *msg;
        mach_msg_header_t *reply;

        msg = malloc(sizeof(union __RequestUnion__mach_exc_subsystem));
        reply = malloc(sizeof(union __ReplyUnion__mach_exc_subsystem));

        while (1) {
             rt = mach_msg(msg, MACH_RCV_MSG, 0, sizeof(union __RequestUnion__mach_exc_subsystem), exceptionPort, 0, MACH_PORT_NULL);
             assert(rt == MACH_MSG_SUCCESS);

             // Call out to the mach_exc_server generated by mig and mach_exc.defs.
             // This will in turn invoke one of:
             // catch_mach_exception_raise()
             // catch_mach_exception_raise_state()
             // catch_mach_exception_raise_state_identity()
             // .. depending on the behavior specified when registering the Mach exception port.
             mach_exc_server(msg, reply);

             // Send the now-initialized reply
             rt = mach_msg(reply, MACH_SEND_MSG, reply->msgh_size, 0, MACH_PORT_NULL, 0, MACH_PORT_NULL);
             assert(rt == MACH_MSG_SUCCESS);
        }
    }

You'll note from the example code that our exception handler is called a server. In Mach RPC parlance, the kernel would be the client: it issues RPC requests to our exception server, and waits for our reply.

Exception Behaviors
As described above, exception messages come in multiple formats, containing varying types of data. It's the implementor's responsibility to register for the correct behavior; the mig-generated RPC code will interpret each message and hand it off to a user-defined function for the specific type. There are three basic behaviors defined by the Mach Exception API:

EXCEPTION_DEFAULT: Exception messages will contain a reference to the thread that triggered the exception. Handled by catch_exception_raise().
EXCEPTION_STATE: Exception messages will contain the register state of the triggering thread, but not a reference to the thread itself. Handled by catch_exception_raise_state().
EXCEPTION_STATE_IDENTITY: Exception messages will contain the register state of the triggering thread, as well as a reference to the triggering thread. Handled by catch_exception_raise_state_identity().

In addition to the above behaviors, an additional variant was added in later OS X releases to support 64-bit safety. The MACH_EXCEPTION_CODES flag may be set by OR'ing it with any of the listed behaviors, in which case 64-bit safe exception messages will be provided. This flag is used by LLDB/GDB even when targeting 32-bit processes. When using the MACH_EXCEPTION_CODES flag, one must also use the RPC functions generated by mach_exc.defs; these use the mach_ prefix for all functions and types.
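Because the mig-generated mach_exc_server() refers to a callback for each of the three behaviors, all three have to be defined for the program to link, whichever behavior is actually registered. Below is a minimal sketch of the two state-carrying callbacks as stubs. The parameter lists are what I understand mig to generate from mach_exc.defs, so treat them as an assumption and check them against your own generated code. The `#ifdef __APPLE__` guard is only there because the Mach headers do not exist on other platforms.

```c
#ifdef __APPLE__  /* the Mach headers are only available on Apple platforms */
#include <assert.h>
#include <stddef.h>
#include <mach/mach.h>

/* Called for EXCEPTION_STATE | MACH_EXCEPTION_CODES: register state only. */
kern_return_t catch_mach_exception_raise_state(
    mach_port_t exception_port,
    exception_type_t exception,
    const mach_exception_data_t code,
    mach_msg_type_number_t codeCnt,
    int *flavor,
    const thread_state_t old_state,
    mach_msg_type_number_t old_stateCnt,
    thread_state_t new_state,
    mach_msg_type_number_t *new_stateCnt)
{
    /* Not handled: ask the kernel to try the next handler. */
    return KERN_FAILURE;
}

/* Called for EXCEPTION_STATE_IDENTITY | MACH_EXCEPTION_CODES:
 * register state plus the thread and task ports. */
kern_return_t catch_mach_exception_raise_state_identity(
    mach_port_t exception_port,
    mach_port_t thread,
    mach_port_t task,
    exception_type_t exception,
    mach_exception_data_t code,
    mach_msg_type_number_t codeCnt,
    int *flavor,
    thread_state_t old_state,
    mach_msg_type_number_t old_stateCnt,
    thread_state_t new_state,
    mach_msg_type_number_t *new_stateCnt)
{
    /* Not handled: ask the kernel to try the next handler. */
    return KERN_FAILURE;
}
#endif
```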

Generally speaking, EXCEPTION_DEFAULT or EXCEPTION_STATE_IDENTITY is sufficient for most purposes. Since the EXCEPTION_DEFAULT behavior provides a reference to the triggering thread, you can fetch the same thread state that EXCEPTION_STATE_IDENTITY would deliver by calling the Mach thread_get_state() API on that thread.
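A rough sketch of fetching registers on demand from the thread port that an EXCEPTION_DEFAULT callback receives, assuming a 64-bit x86 or ARM target. `fetch_thread_state` and the `cpu_state_t` / `CPU_STATE_*` names are invented for this example; the flavor constants and state structures are the ones declared by the Mach headers. The `#ifdef __APPLE__` guard is only there because those headers do not exist elsewhere.

```c
#ifdef __APPLE__  /* the Mach headers are only available on Apple platforms */
#include <assert.h>
#include <stddef.h>
#include <mach/mach.h>

/* Pick the general-purpose register flavor for the architecture being built. */
#if defined(__x86_64__)
typedef x86_thread_state64_t cpu_state_t;
#define CPU_STATE_FLAVOR x86_THREAD_STATE64
#define CPU_STATE_COUNT  x86_THREAD_STATE64_COUNT
#elif defined(__arm64__)
typedef arm_thread_state64_t cpu_state_t;
#define CPU_STATE_FLAVOR ARM_THREAD_STATE64
#define CPU_STATE_COUNT  ARM_THREAD_STATE64_COUNT
#else
#error "this sketch only covers x86_64 and arm64"
#endif

/* Hypothetical helper: copy the thread's register state into *state. */
static kern_return_t fetch_thread_state(mach_port_t thread, cpu_state_t *state)
{
    /* In: capacity of the buffer in natural_t units. Out: units written. */
    mach_msg_type_number_t count = CPU_STATE_COUNT;

    assert(state != NULL);
    return thread_get_state(thread, CPU_STATE_FLAVOR,
                            (thread_state_t)state, &count);
}
#endif
```

Inside catch_mach_exception_raise(), this would be called with the `thread` argument, which refers to the thread that raised the exception.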

When registering your exception handler, you are responsible for making the requested behavior match the RPC implementation you intend to use: set the MACH_EXCEPTION_CODES flag if you use the code generated from mach_exc.defs, and leave it out if you use the code generated from exc.defs.

Putting it Together
It's time to get down to brass tacks: actually registering a Mach port to receive exception messages. As noted above, handlers can be registered for threads, tasks, and the host, and there is an equivalent set of APIs for each:

(thread|task|host)_get_exception_ports: Returns the currently registered set of exception ports.
(thread|task|host)_set_exception_ports: Sets the exception port that will be used for all future exceptions.
(thread|task|host)_swap_exception_ports: Atomically set a new exception port, and return the current ports. This can be used to avoid race conditions that could otherwise occur if multiple handlers are registered concurrently.

To register your handler, you'll first need to allocate a Mach port to receive the messages, insert a "send right" on that port so that exception messages can be sent to it, and then call one of the exception port set() or swap() functions to register it as a receiver of exception messages.

For example (error handling again elided for conciseness):

    mach_port_t server_port;
    kern_return_t kr = mach_port_allocate(mach_task_self(), MACH_PORT_RIGHT_RECEIVE, &server_port);
    assert(kr == KERN_SUCCESS);

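    // The port name is passed by value, both as the port and as the right to insert.
    kr = mach_port_insert_right(mach_task_self(), server_port, server_port, MACH_MSG_TYPE_MAKE_SEND);
    assert(kr == KERN_SUCCESS);

    // task is the port of the target task; pass mach_task_self() to cover the current process.
    kr = task_set_exception_ports(task, EXC_MASK_BAD_ACCESS, server_port, EXCEPTION_DEFAULT|MACH_EXCEPTION_CODES, THREAD_STATE_NONE);
    assert(kr == KERN_SUCCESS);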

If you wish to preserve the previous exception handlers, task_swap_exception_ports() should be used in place of task_set_exception_ports().
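A sketch of the swap-based registration, assuming the handler is being installed for the current task. `previous_handlers_t` and `install_exception_port` are hypothetical names; the saved arrays are what a handler would later consult in order to forward an exception to whoever was registered before it. The `#ifdef __APPLE__` guard is only there because the Mach headers do not exist elsewhere.

```c
#ifdef __APPLE__  /* the Mach headers are only available on Apple platforms */
#include <assert.h>
#include <mach/mach.h>

/* Hypothetical container for the handlers that were registered before ours. */
typedef struct {
    mach_msg_type_number_t count;
    exception_mask_t       masks[EXC_TYPES_COUNT];
    mach_port_t            ports[EXC_TYPES_COUNT];
    exception_behavior_t   behaviors[EXC_TYPES_COUNT];
    thread_state_flavor_t  flavors[EXC_TYPES_COUNT];
} previous_handlers_t;

/* Allocate a port, register it for EXC_BAD_ACCESS on the current task,
 * and record the handlers it replaces in *prev. */
static kern_return_t install_exception_port(mach_port_t *port_out,
                                            previous_handlers_t *prev)
{
    mach_port_t port = MACH_PORT_NULL;
    kern_return_t kr;

    assert(port_out != NULL && prev != NULL);

    kr = mach_port_allocate(mach_task_self(), MACH_PORT_RIGHT_RECEIVE, &port);
    if (kr != KERN_SUCCESS)
        return kr;

    kr = mach_port_insert_right(mach_task_self(), port, port,
                                MACH_MSG_TYPE_MAKE_SEND);
    if (kr != KERN_SUCCESS) {
        mach_port_mod_refs(mach_task_self(), port, MACH_PORT_RIGHT_RECEIVE, -1);
        return kr;
    }

    /* In: capacity of the arrays. Out: number of entries filled in. */
    prev->count = EXC_TYPES_COUNT;
    kr = task_swap_exception_ports(mach_task_self(), EXC_MASK_BAD_ACCESS, port,
                                   EXCEPTION_DEFAULT | MACH_EXCEPTION_CODES,
                                   THREAD_STATE_NONE,
                                   prev->masks, &prev->count, prev->ports,
                                   prev->behaviors, prev->flavors);
    if (kr != KERN_SUCCESS) {
        /* Drop the send right, then the receive right. */
        mach_port_deallocate(mach_task_self(), port);
        mach_port_mod_refs(mach_task_self(), port, MACH_PORT_RIGHT_RECEIVE, -1);
        return kr;
    }

    *port_out = port;
    return KERN_SUCCESS;
}
#endif
```

Each saved entry pairs a mask with the port, behavior, and flavor that was previously registered for the exception types in that mask.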

Conclusion
Mach exception handlers are a very useful tool, and while using them involves a fair number of moving pieces, hopefully they don't seem dauntingly complex. At the end of the day, Mach exceptions are just a simple exception message, coupled with a reply, sent over Mach ports.

There are some significant advantages of the Mach API over signal handlers, including the ability to forward exceptions out-of-process, and to handle all exceptions on a completely different stack - something that can be useful when handling an exception triggered by a stack overflow on the target thread.

If you plan on implementing your own mach exception handler, there are certainly more details worth further investigation:

When forwarding Mach exceptions, you need to send an exception message that matches the previously registered handler's exception flavor. This may mean populating a new Mach exception message with additional thread state.
It's not strictly necessary to use the MIG-generated exc_server() or mach_exc_server() functions for interpreting Mach messages (though it is probably a good idea). Since mig(1) generates structures that may be used to directly interpret the Mach exception messages, you can do so directly.
If you forward exception messages for exceptions that occur in your own process, you need to be sure that the target for the reply is not also your own process. Single-stepping debuggers will only resume the thread they wish to step; that means that they won't resume your exception handler's thread, you'll never receive the reply, and the interrupted thread will never resume.

Lastly, I should highlight that the headers and mach interfaces required to implement a correct mach exception handler on iOS are not available (though they are available and public on Mac OS X). I filed a radar requesting their addition (rdar://12939497), as well as an Apple DTS support incident to clarify the situation. The radar is still open, but DTS provided the following guidance:

Our engineers have reviewed your request and have determined that this would be best handled as a bug report, which you have already filed. There is no documented way of accomplishing this, nor is there a workaround possible.

In the meantime, as far as I can determine through my own work, and as per DTS's feedback, it's not possible to implement Mach exception handling on iOS using only public API. Hopefully this will be resolved in a future release of iOS, such that we can safely adopt Mach exceptions.

Thus concludes my first contribution to Friday Q&A. If you have any questions, feel free to drop me an e-mail. If I got anything terribly wrong, feel free to roast me in the comments.

Comments:

Jean-Daniel at 2013-01-11 18:30:58:

Nice article.

Just for the record, you don't have to write your own dispatch loop when using MIG. Just call the mach_msg_server() or mach_msg_server_once() functions (declared in mach/mach.h) that will take care of this for you.

Frank Illenberger at 2013-01-16 04:05:17:

Thanks for the great article.
Wouldn't it be more modern to deal with Mach messages by using a dispatch source of the type DISPATCH_SOURCE_TYPE_MACH_RECV?
I have never used them this way, but I successfully use them for dealing with signals. It would be interesting to know if it works and if it prevents you from having to block a thread.

Landon Fuller at 2013-01-16 16:03:22:

@Frank --

Using dispatch would work if you were monitoring another process' exceptions, at which point I guess it's just a matter of style. Most of the code will be identical, but you'll have registered a block with GCD instead of parking a thread.

If running in-process, it would result in non-async-safe dispatch code executing after a crash occurred. For example, a thread that GCD relies upon could be the crashing thread, in which case it is now suspended and GCD can't execute your block.

The same general class of async-safety issues applies to in-process signal handlers registered via GCD; it's a safe mechanism for handling signals only in the case that the process state is valid after the signal is thrown (e.g., SIGCHLD).

The in-process case is further complicated by the fact that debuggers (by default) will suspend all threads in the crashing process in a number of cases (such as single stepping). The only way to get around this is to pre-empt all of the debuggers' exception handlers with your own, re-insert the debugger's handlers upon any exception, and forward the exception message to the debugger unmodified. This effectively takes your to-be-suspended handler thread out of the picture.

If you implement the above mechanism and somehow miss any exception types that the debugger has registered for, you'll have a potential deadlock on your hands, since your exception handler will be bypassed and the de-registration code simply won't run. In other words, in-process crash reporting is annoying. The easiest solution is to use sysctl() to detect if you're running under a debugger, and then fall back to a signal handler, which AFAIK is what everyone writing iOS crash reporters using Mach exceptions is currently doing, if they're handling this case at all (never mind that they really shouldn't be using Mach exceptions on iOS to begin with).

Ideally, we could have fork() on iOS (XPC would be great!) and just handle things out-of-process, potentially with GCD as you've suggested. But I'm getting off-topic ... :)

Jean-Daniel at 2013-01-29 08:56:53:

@Landon: "Ideally, we could have fork() on iOS (XPC would be great!) and just handle things out-of-process"

Out of curiosity, how are you planning to pass the exception port to the child task, as fork does not preserve Mach ports and, AFAIK, XPC is based on UNIX sockets and does not provide a way to send a Mach port?

Landon Fuller at 2013-02-04 16:59:11:

@Jean-Daniel: What I was hoping is that we could (ab)use the bootstrap port for the purpose, but looking into it, this might be incompatible with XPC (it'd be nice if XPC could send mach ports, too). Don't suppose you have any thoughts on the matter?

Damien Sorresso at 2013-02-13 18:56:25:

XPC is based on Mach IPC.

Richard Brooksby at 2013-07-12 16:02:19:

This article was very helpful in writing the protection exception handling code in the Memory Pool System http://www.ravenbrook.com/project/mps/master/code/protxc.c (also on GitHub here https://github.com/Ravenbrook/mps-temporary )

That may be a useful example for anyone else trying to do similar things.

Thanks!
