NSBlog

objc_msgSend's New Prototype

Mike Ash — Fri, 11 Oct 2019 12:09:00 GMT

Apple's new OSes are out. If you've looked through the documentation, you may have noticed that the prototype for objc_msgSend has changed. Previously, it was declared as a function that took id, SEL, and variadic arguments after that, and returned id. Now it's declared as a function that takes and returns void. Similar functions like objc_msgSendSuper also became void/void. Why the change?

The True Prototype
There's a big and surprisingly difficult question behind this: what is the true prototype of objc_msgSend? That is to say, what parameters does it actually take, and what does it actually return? This question doesn't have a straightforward answer.

You may have heard that objc_msgSend is implemented in assembly because it's so commonly called that it needs every bit of performance it can get. This is true, but not entirely complete. It's not possible to implement it in C at any speed.

The fast path of objc_msgSend does a few critical things:

Load the class of the object.
Look up the selector in that class's method cache.
Jump to the method implementation found in the cache.

From the perspective of the method implementation, it looks like the caller invoked it directly. Because objc_msgSend jumps straight to the method implementation without making a function call, it effectively disappears once its job is done. The code in objc_msgSend is careful not to disturb any of the registers that can be used to pass arguments to a function. The caller calls objc_msgSend as if it were going to directly call the method implementation, passing all of the parameters in the same way it would for a direct function call. Once objc_msgSend looks up the implementation and jumps to it, those parameters are still exactly where the implementation expects them to be. When the implementation returns, it returns directly to the caller, and the return value is provided by the standard mechanism.

This answers the above question: the prototype of objc_msgSend is that of the method implementation it ends up calling.

But wait, isn't the whole point of dynamic method lookup and message sending that you don't know what method implementation you'll be calling? This is true! However, you do know what type signature the implementation will have. The compiler can get this information from the declaration of the method in an @interface or @protocol block, and uses that to generate the appropriate parameter passing and return value fetching code. If you override a method, the compiler complains if you don't match the type signature. It's possible to work around this by hiding declarations or adding methods at runtime, and in that case you can end up with a type signature for a method implementation that doesn't match the call site. The behavior of such a call then depends on how those two type signatures match up at the ABI level, with anything from perfectly reasonable and correct behavior (if the ABIs match so all the parameters happen to line up) to complete nonsense (if they don't).

This hints at an answer to this article's question: the old prototype worked in some circumstances (when the ABIs matched) and failed strangely in others (when the ABIs didn't match). The new prototype never works unless you cast it to the appropriate type first. As long as you cast it to the correct type, it always works. The new way of doing things thus encourages doing things correctly and makes it harder to do things wrong.

The Minimal Prototype
Although the prototype of objc_msgSend depends on the method implementation that will be called, there are two things that are common across all method implementations: the first parameter is always id self, and the second parameter is always SEL _cmd. The number and type of any additional parameters is unknown, as is the return type, but those two parameters are known. objc_msgSend needs these two pieces of information to perform its method dispatch work, so they always have to be in the same place for it to be able to find them.

We could write an approximate generalized prototype for objc_msgSend to represent this:

    ??? objc_msgSend(id self, SEL _cmd, ???)

Where ??? means that we don't know, and it depends on the particular method implementation that will be called. Of course, C has no way to represent a wildcard like this.

For the return value, we can try to pick something common. Since Objective-C is all about objects, it would make sense to assume the return value is id:

    id objc_msgSend(id self, SEL _cmd, ???)

This not only covers cases where the return value is an object, but also cases where it's void and some other cases where it's a different type but the value isn't used.

How about the parameters? C actually does have a way to indicate an arbitrary number of parameters of arbitrary types, in the form of variadic function prototypes. An ellipsis at the end of the parameter list means that a variable number of arbitrarily typed values follows:

    id objc_msgSend(id self, SEL _cmd, ...)

This is exactly what the prototype used to be before the recent change.

ABI Mismatches
The pertinent question at runtime is whether the ABI at the call site matches the ABI of the method implementation. Which is to say, will the receiver retrieve the parameters from the same location and in the same format that the caller passes them? If the caller puts a parameter into $rdx then the implementation needs to retrieve that parameter from $rdx, otherwise havoc will ensue.

The minimal prototype may be able to express the concept of passing an arbitrary number of arbitrary types, but for it to actually work at runtime, it needs to use the same ABI as the method implementation. That implementation is almost certainly using a different prototype, and usually has a fixed number of arguments.

There is no guarantee that the ABI for a variadic function matches the ABI for a function with a fixed number of arguments. On some platforms, they match almost perfectly. On others, they don't match at all.

Intel ABI
Let's look at a concrete example. macOS uses the standard System V ABI for x86-64. There is a ton of detail in the ABI, but we'll focus on the basics.

Parameters are passed in registers. Integer parameters are passed in registers rdi, rsi, rdx, rcx, r8, and r9, in that order. Floating point parameters are passed in the SSE registers xmm0 through xmm7. When calling a variadic function, the register al is set to the number of SSE registers that were used to pass parameters. Integer return values are placed in rax and rdx, and floating-point return values are placed in xmm0 and xmm1.

The ABI for variadic functions is almost identical to the ABI for normal functions. The one exception is passing the number of SSE registers used in al. However, this is harmless when using the variadic ABI to call a normal function, as the normal function will ignore the contents of al.

The C language messes things up a bit. C specifies that certain types get promoted to wider types when passed as a variadic argument. Integers smaller than int (such as char and short) get promoted to int, and float gets promoted to double. If your method signature includes one of these types, it's not possible for a caller to pass a parameter as that exact type if it's using a variadic prototype.
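The promotion rules can be seen in plain C, without any Objective-C involved. This is a minimal sketch; the function names are made up for this illustration. Each function reads its first variadic argument using the promoted type, which is the only type the callee can legitimately ask for:

```c
#include <assert.h>
#include <stdarg.h>

/* Reads the first variadic argument as a double. A float passed by the
   caller has already been promoted to double before the call happens. */
double first_variadic_as_double(int count, ...) {
    assert(count >= 1);
    va_list args;
    va_start(args, count);
    double value = va_arg(args, double);
    va_end(args);
    return value;
}

/* Reads the first variadic argument as an int. A char or short passed
   by the caller has already been promoted to int. */
int first_variadic_as_int(int count, ...) {
    assert(count >= 1);
    va_list args;
    va_start(args, count);
    int value = va_arg(args, int);
    va_end(args);
    return value;
}
```

Calling first_variadic_as_double(1, 1.5f) returns 1.5 as a double: the float never reaches the callee as a float.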

For integers, this doesn't actually matter. The integer gets stored in the bottom bits of the appropriate register, and the bits end up in the same place either way. However, it's catastrophic for float. Converting a smaller integer to an int just requires padding it out with extra bits. Converting float to double involves converting the value to a different structure altogether. The bits in a float don't line up with the corresponding bits in a double. If you try to use a variadic prototype to call a non-variadic function that takes a float parameter, that function will receive garbage.
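To see why the bits don't line up, here is a minimal sketch in plain C. It assumes IEEE 754 float and double, which holds on the platforms discussed here, and the helper name low_half_as_float is invented for this illustration. It roughly mimics a callee that reads a float from the low 32 bits of a register the caller filled with a double:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Takes the low 32 bits of a double's bit pattern and reinterprets them
   as a float, without any numeric conversion. */
float low_half_as_float(double d) {
    assert(sizeof(double) == 8 && sizeof(float) == 4);

    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);           /* raw bits of the double */

    uint32_t low = (uint32_t)(bits & 0xFFFFFFFFu);

    float f;
    memcpy(&f, &low, sizeof f);               /* raw bits viewed as a float */
    return f;
}
```

Passing 1.0 yields 0.0, since the low half of the double 1.0 is all zero bits. Passing a double holding pi yields a value in the trillions.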

To illustrate this problem, here's a quick example:

    // Use the old variadic prototype for objc_msgSend.
    #define OBJC_OLD_DISPATCH_PROTOTYPES 1

    #import <Foundation/Foundation.h>
    #import <objc/message.h>

    @interface Foo : NSObject @end
    @implementation Foo
    - (void)log: (float)x {
        printf("%f\n", x);
    }
    @end

    int main(int argc, char **argv) {
        id obj = [Foo new];
        [obj log: (float)M_PI];
        objc_msgSend(obj, @selector(log:), (float)M_PI);
    }

It produces this output:

    3.141593
    3370280550400.000000

As you can see, the value came through correctly when written as a message send, but got completely mangled when passed through an explicit call to objc_msgSend.

This can be remedied by casting objc_msgSend to have the right signature. Recall that objc_msgSend's actual prototype is that of whatever method will end up being invoked, so the correct way to use it is to cast it to the corresponding function pointer type. This call works correctly:

    ((void (*)(id, SEL, float))objc_msgSend)(obj, @selector(log:), M_PI);

ARM64 ABI
Let's look at another relevant example. iOS uses a variation on the standard ABI for ARM64.

Integer parameters are passed in registers x0 through x7. Floating point parameters are passed in v0 through v7. Additional parameters are passed on the stack. Return values are placed in the same register or registers where they would be passed as parameters.

This is only true for normal parameters. Variadic parameters are never passed in registers. They are always passed on the stack, even when parameter registers are available.

There's no need for a careful analysis of how this will work out in practice. The ABIs are completely mismatched and a method called with an uncast objc_msgSend will receive garbage in its parameters.

The New Prototype
The new prototype is short and sweet:

    void objc_msgSend(void);

This isn't correct at all. However, neither was the old prototype. This one is much more obviously incorrect, and that's a good thing. The old prototype made it easy to use it without casting it, and worked often enough that you could easily end up thinking everything was OK. When you hit the problematic cases, the bugs were very unclear.

This prototype doesn't even allow you to pass the two required parameters of self and _cmd. You can call it with no parameters at all, but it'll immediately crash and it should be pretty obvious what went wrong. If you try to use it without casting, the compiler will complain, which is much better than weird broken parameter values.

Because it still has a function type, you can still cast it to a function pointer of the appropriate type and invoke it that way. This will work correctly as long as you get the types right.
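The same pattern can be sketched in plain C as an analogy. Everything here is hypothetical: toy_lookup stands in for the runtime's method lookup, and the selector is just a string. Unlike the real objc_msgSend, which does the lookup and the jump in one step, this toy version separates them, but the key point carries over: the generic type is deliberately uncallable in any useful way, and the caller casts to the true signature before calling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Deliberately uninformative generic type, in the spirit of
   void objc_msgSend(void). */
typedef void (*generic_imp)(void);

/* A toy method implementation: self and the selector come first,
   followed by the real argument. */
float scale_by_half(void *self, const char *cmd, float x) {
    (void)self;
    (void)cmd;
    return x * 0.5f;
}

/* Hypothetical lookup: returns the implementation for a selector name,
   or NULL if the selector is unknown. */
generic_imp toy_lookup(const char *cmd) {
    if (strcmp(cmd, "scaleByHalf:") == 0) {
        return (generic_imp)scale_by_half;
    }
    return NULL;
}

float send_scale_by_half(void *self, float x) {
    const char *cmd = "scaleByHalf:";
    generic_imp imp = toy_lookup(cmd);
    assert(imp != NULL);

    /* Cast back to the implementation's true signature before calling,
       so the float is passed as a float. */
    float (*typed)(void *, const char *, float) =
        (float (*)(void *, const char *, float))imp;
    return typed(self, cmd, x);
}
```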

Friday Q&A 2018-06-29: Debugging with C-Reduce

Mike Ash — Fri, 29 Jun 2018 13:35:00 GMT

Debugging a complex problem is tough, and it can be especially difficult when it's not obvious which chunk of code is responsible. It's common to attempt to produce a reduced test case in order to narrow it down. It's tedious to do this manually, but it's also the sort of thing computers are really good at. C-Reduce is a program which automatically takes programs and pares them down to produce a reduced test case. Let's take a look at how to use it.

Overview
C-Reduce is based on two main ideas.

First, there's the idea of a reduction pass. This is a transformation performed on some source code which produces a reduced version of that code. C-Reduce has a bunch of different passes, including things like deleting lines or renaming tokens to shorter versions.

Second, there's the idea of an interestingness test. The reduction passes are blind, and often produce programs which no longer contain the bug, or which don't compile at all. When you use C-Reduce, you provide not only a program to reduce but also a small script which tests whether a reduced program is "interesting." Exactly what "interesting" means is up to you. If you're trying to isolate a bug, then "interesting" would mean that the bug still occurs in the program. You can define it to mean whatever you want, as long as you can script it. Whatever test you provide, C-Reduce will try to provide a reduced version of the program that still passes the test.
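To make the interplay of those two ideas concrete, here is a toy sketch in C. It is not how C-Reduce is implemented; the function names and the "contains the line bug" test are invented for this illustration. The pass blindly tries deleting each line, and the interestingness test decides whether the deletion is kept:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef bool (*interesting_fn)(const char *const *lines, size_t count);

/* Hypothetical interestingness test: the candidate is interesting if it
   still contains the line "bug". */
bool contains_bug_line(const char *const *lines, size_t count) {
    for (size_t i = 0; i < count; i++) {
        if (strcmp(lines[i], "bug") == 0) return true;
    }
    return false;
}

/* Toy line-deletion pass. Tries deleting each line in turn and keeps the
   deletion only if the result is still interesting. Returns the new
   line count; the surviving lines are left at the front of the array. */
size_t reduce_lines(const char **lines, size_t count, interesting_fn test) {
    assert(test(lines, count)); /* the starting point must be interesting */
    size_t i = 0;
    while (i < count) {
        const char *removed = lines[i];
        /* Tentatively delete line i by shifting the tail down. */
        memmove(&lines[i], &lines[i + 1], (count - i - 1) * sizeof lines[0]);
        if (test(lines, count - 1)) {
            count--; /* still interesting: keep the deletion */
        } else {
            /* No longer interesting: put the line back and move on. */
            memmove(&lines[i + 1], &lines[i], (count - i - 1) * sizeof lines[0]);
            lines[i] = removed;
            i++;
        }
    }
    return count;
}
```

Given the lines "a", "bug", "b", "c" and this test, the pass strips everything except "bug".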

Installation
C-Reduce has a lot of dependencies and can be difficult to install. Thankfully, Homebrew has it, so you can let it take care of things:

    brew install creduce

If you'd rather do it yourself, take a look at C-Reduce's INSTALL file.

Simple Example
It's difficult to come up with small examples for C-Reduce, since its whole purpose is to start from something large and produce a small example, but we'll give it our best try. Here's a simple C program that produces a somewhat cryptic warning:

    $ cat test.c
    #include <stdio.h>

    struct Stuff {
        char *name;
        int age;
    }

    main(int argc, char **argv) {
        printf("Hello, world!\n");
    }
    $ clang test.c
    test.c:3:1: warning: return type of 'main' is not 'int' [-Wmain-return-type]
    struct Stuff {
    ^
    test.c:3:1: note: change return type to 'int'
    struct Stuff {
    ^~~~~~~~~~~~
    int
    test.c:10:1: warning: control reaches end of non-void function [-Wreturn-type]
    }
    ^
    2 warnings generated.

Somehow our struct is messing with main! How could that be? Maybe reducing it would help us figure it out.

We need an interestingness test. We'll write a small shell script to compile this program and check for the warning in the output. C-Reduce is eager to please and can easily reduce a program far beyond what we really want. To keep it under control, we'll write a script that not only checks for the warning, but also rejects any program that produces an error, and requires struct Stuff to be somewhere in the compiler output. Here's the script:

    #!/bin/bash

    clang test.c &> output.txt
    grep error output.txt && exit 1
    grep "warning: return type of 'main' is not 'int'" output.txt &&
    grep "struct Stuff" output.txt

First, it compiles the program and saves the compiler output into output.txt. If the output contains the text "error" then it immediately signals that this program is not interesting by exiting with error code 1. Otherwise it checks for both the warning and for struct Stuff in the output. grep exits with code 0 if it finds a match, so the result is that this script exits with code 0 if both of those match, and code 1 if either one fails. Exit code 0 signals to C-Reduce that the reduced program is interesting, while code 1 signals that it's not interesting and should be discarded.

Now we have enough to run C-Reduce:

    $ creduce interestingness.sh test.c 
    ===< 4907 >===
    running 3 interestingness tests in parallel
    ===< pass_includes :: 0 >===
    (14.6 %, 111 bytes)

    ...lots of output...

    ===< pass_clex :: rename-toks >===
    ===< pass_clex :: delete-string >===
    ===< pass_indent :: final >===
    (78.5 %, 28 bytes)
    ===================== done ====================

    pass statistics:
      method pass_balanced :: parens-inside worked 1 times and failed 0 times
      method pass_includes :: 0 worked 1 times and failed 0 times
      method pass_blank :: 0 worked 1 times and failed 0 times
      method pass_indent :: final worked 1 times and failed 0 times
      method pass_indent :: regular worked 2 times and failed 0 times
      method pass_lines :: 3 worked 3 times and failed 30 times
      method pass_lines :: 8 worked 3 times and failed 30 times
      method pass_lines :: 10 worked 3 times and failed 30 times
      method pass_lines :: 6 worked 3 times and failed 30 times
      method pass_lines :: 2 worked 3 times and failed 30 times
      method pass_lines :: 4 worked 3 times and failed 30 times
      method pass_lines :: 0 worked 4 times and failed 20 times
      method pass_balanced :: curly-inside worked 4 times and failed 0 times
      method pass_lines :: 1 worked 6 times and failed 33 times

              ******** .../test.c ********

    struct Stuff {
    } main() {
    }

At the end, it outputs the reduced version of the program that it came up with. It also saves the reduced version into the original file. Beware of this when working on real code! Be sure to run C-Reduce on a copy of the code (or on a file that's already checked into version control), not on an irreplaceable original.

This reduced version makes the problem more apparent: we forgot the semicolon at the end of the declaration of struct Stuff, and we forgot the return type on main, which causes the compiler to interpret struct Stuff as the return type to main. This is bad, because main has to return int, thus the warning.

Xcode Projects
That's fine for something we've already reduced to a single file, but what about something more complex? Most of us have Xcode projects, so what if we want to reduce one of those?

This gets awkward because of the way C-Reduce works. It copies the file to reduce into a new directory, then runs your interestingness script there. This allows it to run a lot of tests in parallel, but this breaks if you need other stuff for it to work. Since your interestingness script can run arbitrary commands, you can work around this by copying the rest of the project into the temporary directory.

I created a standard Cocoa Objective-C app project in Xcode and then modified the AppDelegate.m file like so:

    #import "AppDelegate.h"

    @interface AppDelegate () {
        NSWindow *win;
    }

    @property (weak) IBOutlet NSWindow *window;
    @end

    @implementation AppDelegate

    - (void)applicationDidFinishLaunching: (NSRect)visibleRect {
        NSLog(@"Starting up");
        visibleRect = NSInsetRect(visibleRect, 10, 10);
        visibleRect.size.height *= 2.0/3.0;
        win = [[NSWindow alloc] initWithContentRect: NSMakeRect(0, 0, 100, 100) styleMask:NSWindowStyleMaskTitled backing:NSBackingStoreBuffered defer:NO];
        [win makeKeyAndOrderFront: nil];
        NSLog(@"Off we go");
    }


    @end

This strange code crashes the app on startup:

    * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
      * frame #0: 0x00007fff3ab3bf2d CoreFoundation`__CFNOTIFICATIONCENTER_IS_CALLING_OUT_TO_AN_OBSERVER__ + 13

This is not a very informative backtrace. We could try to debug (or just notice the problem), but instead let's reduce!

The interestingness test needs to do some more work here. Let's start with a helper to run the app with a timeout. We're looking for a crash, and if the app doesn't crash it'll just stay open, so we need to kill it after a few seconds. I found this handy perl snippet repeated all over the internet:

    function timeout() { perl -e 'alarm shift; exec @ARGV' "$@"; }

Next, we need to copy the Xcode project over:

    cp -a ~/Development/creduce-examples/Crasher .

The AppDelegate.m file isn't automatically placed in the appropriate location, so copy that across. (Note: C-Reduce will copy the file back if it finds a reduction, so be sure to use cp here rather than mv. Using mv will result in a cryptic fatal error.)

    cp AppDelegate.m Crasher/Crasher

Then switch into the Crasher directory and build the project, exiting on failure:

    cd Crasher
    xcodebuild || exit 1

If it worked, run the app with a timeout. My system is configured so that xcodebuild places the build result in a local build directory. Yours may be configured differently, so check first. Note that if your configuration builds to a shared build directory, you'll want to disable C-Reduce's parallel builds by adding --n 1 to the command line when invoking it.

    timeout 5 ./build/Release/Crasher.app/Contents/MacOS/Crasher

If it crashes, it'll exit with the special code 139. Translate that into an exit code of 0, and in all other cases exit with code 1:

    if [ $? -eq 139 ]; then
        exit 0
    else
        exit 1
    fi
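The 139 isn't arbitrary. When a child process is killed by a signal, bash reports an exit status of 128 plus the signal number, and SIGSEGV is signal 11 on macOS and Linux. A tiny sketch of that arithmetic in C, with a made-up function name:

```c
#include <assert.h>
#include <signal.h>

/* Exit status that bash reports for a child killed by the given signal:
   128 plus the signal number. */
int shell_status_for_signal(int sig) {
    assert(sig > 0);
    return 128 + sig;
}
```

For SIGSEGV this gives 139. For SIGABRT, which is signal 6, it gives 134.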

Now we're ready to run C-Reduce:

    $ creduce interestingness.sh Crasher/AppDelegate.m
    ...
    (78.1 %, 151 bytes)
    ===================== done ====================

    pass statistics:
      method pass_ints :: a worked 1 times and failed 2 times
      method pass_balanced :: curly worked 1 times and failed 3 times
      method pass_clex :: rm-toks-7 worked 1 times and failed 74 times
      method pass_clex :: rename-toks worked 1 times and failed 24 times
      method pass_clex :: delete-string worked 1 times and failed 3 times
      method pass_blank :: 0 worked 1 times and failed 1 times
      method pass_comments :: 0 worked 1 times and failed 0 times
      method pass_indent :: final worked 1 times and failed 0 times
      method pass_indent :: regular worked 2 times and failed 0 times
      method pass_lines :: 8 worked 3 times and failed 43 times
      method pass_lines :: 2 worked 3 times and failed 43 times
      method pass_lines :: 6 worked 3 times and failed 43 times
      method pass_lines :: 10 worked 3 times and failed 43 times
      method pass_lines :: 4 worked 3 times and failed 43 times
      method pass_lines :: 3 worked 3 times and failed 43 times
      method pass_lines :: 0 worked 4 times and failed 23 times
      method pass_lines :: 1 worked 6 times and failed 45 times

              ******** /Users/mikeash/Development/creduce-examples/Crasher/Crasher/AppDelegate.m ********

    #import "AppDelegate.h"
    @implementation AppDelegate
    - (void)applicationDidFinishLaunching:(NSRect)a {
      a = NSInsetRect(a, 0, 10);
      NSLog(@"");
    }
    @end

That's a lot shorter! The NSLog line looks harmless, although it must be part of the crash if C-Reduce didn't remove it. The a = NSInsetRect(a, 0, 10); line is the only other thing that actually does something. Where does a come from and why would writing to it do something bad? It's just the parameter to applicationDidFinishLaunching: which... is not an NSRect.

    - (void)applicationDidFinishLaunching:(NSNotification *)notification;

Oops! The parameter type mismatch resulted in stack corruption that caused the uninformative crash.

C-Reduce took a long time to run on this example, because building an Xcode project takes longer than compiling a single file, and because a lot of the test cases hit the five-second timeout when running. C-Reduce copies the reduced file back to the original directory on every success, so you can leave it open in a text editor to watch it at work. If you think it's gone far enough, you can ^C it and you'll be left with the partially-reduced file. If you decide you want to run it some more, re-run it and it will continue from there.

Swift
What if you're using Swift and want to reduce a problem? Given the name, I originally thought that C-Reduce only worked on C (and maybe C++, since so many tools do both).

Thankfully, I was wrong. C-Reduce does have some C-specific reduction passes, but it has a lot of others that are relatively language agnostic. It may be less effective, but as long as you can write an interestingness test for your problem, C-Reduce can probably work on it no matter what language you're using.

Let's try it. I found a nice compiler bug on bugs.swift.org. It's already been fixed, but Xcode 9.3's Swift crashes on it and I happen to have that version handy. Here's a slightly modified version of the example from that bug:

    import Foundation

    func crash() {
        let blah = ProblematicEnum.problematicCase.problematicMethod()
        NSLog("\(blah)")
    }

    enum ProblematicEnum {
        case first, second, problematicCase

        func problematicMethod() -> SomeClass {
            let someVariable: SomeClass

            switch self {
            case .first:
                someVariable = SomeClass()
            case .second:
                someVariable = SomeClass()
            case .problematicCase:
                someVariable = SomeClass(someParameter: NSObject())
                _ = NSObject().description
                return someVariable // EXC_BAD_ACCESS (simulator: EXC_I386_GPFLT, device: code=1)
            }

            let _ = [someVariable]
            return SomeClass(someParameter: NSObject())
        }

    }

    class SomeClass: NSObject {
        override init() {}
        init(someParameter: NSObject) {}
    }

    crash()

Let's try running it with optimizations enabled:

    $ swift -O test.swift 
    <unknown>:0: error: fatal error encountered during compilation; please file a bug report with your project and the crash log
    <unknown>:0: note: Program used external function '__T04test15ProblematicEnumON' which could not be resolved!
    ...

The interestingness test is fairly simple for this one. Run that command and check the exit code:

    swift -O test.swift
    if [ $? -eq 134 ]; then
        exit 0
    else
        exit 1
    fi

Running C-Reduce on this, it produces the following example:

    enum a {
        case b, c, d
        func e() -> f {
            switch self {
            case .b:
                0
            case .c:
                0
            case .d:
                0
            }
            return f()
        }
    }
    class f{}

Diving into the actual compiler bug is beyond the scope of this article, but this reduction would be really handy if we actually set out to fix it. We have a considerably simpler test case to work with. We can also infer that there's some interaction between the switch statement and the instantiation of the class, since C-Reduce probably would have removed one of them if it were unnecessary. This would give us some good hints about what might be happening in the compiler to cause this crash.

Conclusion
Blind reduction of a test case is not a very sophisticated debugging technique, but the ability to automate it can make it extremely useful. C-Reduce can be a fantastic addition to your debugging toolbox. It's not suitable for everything, but what is? For problems where it's useful, it can help enormously. It can be a bit tricky to get it to work with multi-file test cases, but some cleverness with the interestingness script solves the problem. Despite the name, it works out of the box on Swift and many other languages, so don't give up on it just because you're not working in C.

That's it for today. Check back next time for more fun, games, and code. Friday Q&A is driven by reader ideas, so if you have something you'd like to see covered here next time or some other time, please send it in!

Friday Q&A 2018-04-27: Generating Text With Markov Chains in Swift

Mike Ash — Sat, 28 Apr 2018 01:27:00 GMT

Markov chains make for a simple way to generate realistic looking but nonsensical text. Today, I'm going to use that technique to build a text generator based on this blog's contents, an idea suggested/inspired by reader Jordan Pittman.

Markov Chains
At a theoretical level, a Markov chain is a state machine where each transition has a probability associated with it. You can walk through the chain by choosing a start state and then transitioning to subsequent states randomly, weighted by the transition probabilities, until you reach a terminal state.

Markov chains have numerous applications, but the most amusing is for text generation. There, each state is some unit of text, typically a word. The states and transitions are generated from some input corpus, and then text is generated by walking through the chain and outputting the word for each state. The result rarely makes sense, as the chain doesn't contain enough information to retain any of the underlying meaning of the input corpus, or even much of the grammatical structure, but that lack of sense can be hilarious.

Representation
The nodes in the chain will be represented as instances of a Word class. This class will store a String for the word it represents, and a set of links to other words.

How do we represent that set of links? The most obvious approach would be some sort of counted set, which would store other Word instances along with a count of the number of times that transition was seen in the input corpus. Randomly choosing a link from such a set can be tricky, though. A simple way is to generate a random number between 0 and the total count of the entire set, then iterate through the set until you encounter that many links, and choose the link you landed on. This is easy but slow. Another approach would be to precompute an array that stores the cumulative total for each link in the array, then do a binary search on a random number between 0 and the total. This is harder but faster. If you want to get really fancy, you can do even more preprocessing and end up with a compact structure you can query in constant time.
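For reference, the cumulative-total approach might look something like this in C. This is only a sketch of the alternative described above, not the code used in the rest of this article, and the function names are invented. The caller is assumed to supply a random number r in the range 0 to total - 1:

```c
#include <assert.h>
#include <stddef.h>

/* Fills cumulative[i] with counts[0] + ... + counts[i]. */
void build_cumulative(const unsigned *counts, unsigned *cumulative, size_t n) {
    unsigned total = 0;
    for (size_t i = 0; i < n; i++) {
        total += counts[i];
        cumulative[i] = total;
    }
}

/* Binary search for the first index whose cumulative total exceeds r.
   r must be less than the grand total, cumulative[n - 1]. */
size_t pick_link(const unsigned *cumulative, size_t n, unsigned r) {
    assert(n > 0 && r < cumulative[n - 1]);
    size_t lo = 0;
    size_t hi = n - 1;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cumulative[mid] > r) {
            hi = mid;       /* answer is at mid or earlier */
        } else {
            lo = mid + 1;   /* answer is after mid */
        }
    }
    return lo;
}
```

With counts of 3, 1, and 2, the cumulative array is 3, 4, 6. Values of r from 0 through 2 pick the first link, 3 picks the second, and 4 or 5 pick the third, so each link is chosen in proportion to its count.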

Ultimately, I decided to be lazy and use a structure that's extremely wasteful in space, but efficient in time and easy to implement. Each Word contains an array of subsequent Words. If a link occurs multiple times, the duplicates remain in the array. Choosing a random element with the appropriate weight consists of choosing a random index in the array.

Here's what the Word class looks like:

    class Word {
        let str: String?
        var links: [Word] = []

        init(str: String?) {
            self.str = str
        }

        func randomNext() -> Word {
            let index = arc4random_uniform(UInt32(links.count))
            return links[Int(index)]
        }
    }

Note that the links array will likely result in lots of circular references. To avoid leaking memory, we'll need to have something manually clean those up.

That something will be the Chain class, which will manage all of the Words in a chain:

    class Chain {
        var words: [String?: Word] = [:]

In deinit, it clears all of the links arrays to eliminate any cycles:

        deinit {
            for word in words.values {
                word.links = []
            }
        }

Without this step, a lot of the Word instances would leak.

Let's look at how words are added to the chain. The add method will take an array of Strings, each one of which holds a word (or any other unit that the caller wants to work with):

        func add(_ words: [String]) {

If there aren't actually any words, bail out early:

            if words.isEmpty { return }

We want to iterate over pairs of words, where the second element in the pair is the word that follows the first element. For example, in the sentence "Help, I'm being oppressed," we want to iterate over ("Help", "I'm"), ("I'm", "being"), ("being", "oppressed").

Actually, we want a bit more as well, since we want to encode the beginning and end of the sentence. We represent those as nil, so the actual sequence we want to iterate over is (nil, "Help"), ("Help", "I'm"), ("I'm", "being"), ("being", "oppressed"), ("oppressed", nil).

To allow for nil, we need an array whose contents are String? rather than plain String:

            let words = words as [String?]

Next, we'll construct two arrays, one by prepending nil and one by appending nil. Zipping them together produces the sequence we want:

            let wordPairs = zip([nil] + words, words + [nil])
            for (first, second) in wordPairs {

For each word in the pair, we'll fetch the corresponding Word object using a handy helper function:

                let firstWord = word(first)
                let secondWord = word(second)

Then all we have to do is add secondWord into the links of firstWord:

                firstWord.links.append(secondWord)
            }
        }

The word helper fetches the instance from the words dictionary if it exists, otherwise it creates a new instance and puts it into the dictionary. This frees other code from worrying about whether there's already a Word for any given string:

        func word(_ str: String?) -> Word {
            if let word = words[str] {
                return word
            } else {
                let word = Word(str: str)
                words[str] = word
                return word
            }
        }

Finally, we want to generate new sequences of words:

        func generate() -> [String] {

We'll generate the words one by one, accumulating them here:

            var result: [String] = []

Loop "forever." The exit condition doesn't map cleanly to a loop condition, so we'll handle that inside the loop:

            while true {

Fetch the Word instance for the last string in result. This neatly handles the initial case where result is empty, since last produces nil, which indicates the first word:

                let currentWord = word(result.last)

Randomly get a linked word:

                let nextWord = currentWord.randomNext()

If the linked word isn't the end, append it to result. If it is the end, terminate the loop:

                if let str = nextWord.str {
                    result.append(str)
                } else {
                    break
                }
            }

Return the accumulated result:

            return result
        }
    }
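
The pairing step inside add can be tried in isolation. This is a small standalone sketch, not part of the Chain class; the names sentence and optionalWords are made up for illustration:

```swift
// Standalone illustration of the zip trick used in add().
let sentence = ["Help,", "I'm", "being", "oppressed"]

// Allow nil so the start and end of the sentence can be represented.
let optionalWords = sentence as [String?]

// Prepend nil to one copy and append nil to the other, then zip them.
let wordPairs = Array(zip([nil] + optionalWords, optionalWords + [nil]))

for (first, second) in wordPairs {
    print(first ?? "<start>", "->", second ?? "<end>")
}
```

A sentence of four words produces five pairs: one for the start marker, three for adjacent words, and one for the end marker.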

One last thing: we're using String? as the key type for words, but Optional doesn't conform to Hashable. Here's a quick extension that adds it when its wrapped type conforms:

    extension Optional: Hashable where Wrapped: Hashable {
        public var hashValue: Int {
            switch self {
            case let wrapped?: return wrapped.hashValue
            case .none: return 42
            }
        }
    }
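
A note for newer toolchains: to the best of my knowledge, Swift 4.2 and later provide this conditional conformance in the standard library, so the extension above should be unnecessary there (and would likely be rejected as a redundant conformance). Under that assumption, an optional key works directly, as in this sketch with made-up names:

```swift
// Assumes a toolchain where Optional is already Hashable
// when its wrapped type is (believed to be Swift 4.2+).
var linkCounts: [String?: Int] = [:]

let sentenceBoundary: String? = nil   // nil stands for start/end of a sentence
let helpWord: String? = "Help"

linkCounts[sentenceBoundary, default: 0] += 1
linkCounts[helpWord, default: 0] += 1
linkCounts[sentenceBoundary, default: 0] += 1

print(linkCounts[sentenceBoundary] ?? 0)
```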

Generating Input
That's the Markov chain itself, but it's pretty boring without some real text to put into it.

I decided to pull text from an RSS feed. What better feed to choose than my own blog's full text feed?

    let feedURL = URL(string: "https://www.mikeash.com/pyblog/rss.py?mode=fulltext")!

RSS is an XML format, so use XMLDocument to parse it:

    let xmlDocument = try! XMLDocument(contentsOf: feedURL, options: [])

The article bodies are in XML description nodes which are nested inside item nodes. An XPath query retrieves those:

    let descriptionNodes = try! xmlDocument.nodes(forXPath: "//item/description")

We want the strings in the XML nodes, so extract those and throw away any that are nil:

    let descriptionHTMLs = descriptionNodes.compactMap({ $0.stringValue })

We don't care about the markup at all. NSAttributedString can parse HTML and produce a string with attributes, which we can then throw away:

    let descriptionStrings = descriptionHTMLs.map({
        NSAttributedString(html: $0.data(using: .utf8)!, options: [:], documentAttributes: nil)!.string
    })

Let's take a quick detour to a function that breaks up a string into parts. We ultimately want to consume arrays of String, where each array represents a sentence. A string will contain zero or more sentences, so this wordSequences function returns an array of arrays of String:

    func wordSequences(in str: String) -> [[String]] {

Results get accumulated into a local variable:

        var result: [[String]] = []

Breaking a String into sentences isn't always easy. You could search for the appropriate punctuation, but consider a sentence like "Mr. Jock, TV quiz Ph.D., bags few lynx." That's one sentence, despite having four periods.

NSString provides some methods for intelligently examining parts of a string, and String gets those when you import Foundation. We'll ask str to enumerate its sentences, and let Foundation figure out how:

        str.enumerateSubstrings(in: str.startIndex..., options: .bySentences, { substring, substringRange, enclosingRange, stop in

We face a similar problem splitting each sentence into words. NSString does provide a method for enumerating over words, but this presents some problems, like losing punctuation. I ultimately decided to take a dumb approach for word splitting and just split on spaces. This means that you end up with words that contain punctuation as part of their string. This constrains the Markov chain more than if the punctuation was removed, but on the other hand it means that the output naturally contains something like reasonable punctuation. It seemed like a good tradeoff.

Some newlines make their way into the data set, so we'll cut those off at this point:

            let words = substring!.split(separator: " ").map({
                $0.trimmingCharacters(in: CharacterSet.newlines)
            })

The sliced-up sentence gets added to result:

            result.append(words)
        })

After enumeration is complete, result is filled out with the sentences from the input, and we return it to the caller:

        return result
    }
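
The word-splitting step can be checked on its own, without the sentence enumeration. A minimal sketch, using a made-up input string with a trailing newline and a doubled space:

```swift
import Foundation

// One "sentence" as it might arrive from the enumeration,
// with a stray trailing newline and two spaces in a row.
let rawSentence = "The intent is  intended as reader-writer locks.\n"

// Split on spaces (empty pieces are dropped by default),
// then trim newlines from the ends of each piece.
let tokens = rawSentence.split(separator: " ").map({
    $0.trimmingCharacters(in: CharacterSet.newlines)
})

print(tokens)
```

Punctuation stays attached to its word, which is the tradeoff described above.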

Back to the main code. Now that we have a way to convert a string into a list of sentences, we can build our Markov chain. We'll start with an empty Chain object:

    let chain = Chain()

Then we go through all the strings, extract the sentences, and add them to the chain:

    for str in descriptionStrings {
        for sentence in wordSequences(in: str) {
            chain.add(sentence)
        }
    }

All that's left is to generate some new sentences! We'll call generate() and then join the result with spaces. The output is hit-or-miss (which is no surprise given the random nature of the technique) so we'll generate a lot:

    for _ in 0 ..< 200 {
        print("\"" + chain.generate().joined(separator: " ") + "\"")
    }

Example Output
For your entertainment, here are some examples of the output of this program:

"We're ready to be small, weak references in New York City."
"It thus makes no values?"
"Simple JSON tasks, it's wasteful if you can be."
"Another problem, but it would make things more programming-related mystery goo."
"The escalating delays after excessive focus on Friday, September 29th."
"You may not set."
"Declare conformance to use = Self.init() to detect the requested values."
"The tagged pointer is inlined at this nature; even hundreds of software and writing out at 64 bits wide."
"We're ready to express that it works by reader ideas, so the decoding methods for great while, it's inaccessible to 0xa4, which takes care of increasing addresses as the timing."
"APIs which is mostly a one-sided use it yourself?"
"There's no surprise."
"I wasn't sure why I've been called 'zero-cost' in control both take serious effort to miss instead of ARC and games."
"For now, we can look at the filesystem."
"The intent is intended as reader-writer locks."
"For example, we can use of the code?"
"Swift's generics can all fields of Swift programming, with them is no parameters are static subscript, these instantiate self = cluster.reduce(0, +) / Double(cluster.count)"
"However, the common case, you to the left-hand side tables."

There's a lot of nonsense as well, so you have to dig through to find good ones, but Markov chains can produce some pretty funny output.

Conclusion
Markov chains have many practical uses, but they can also be hilariously useless when used to generate text. Aside from being entertaining, this code also demonstrates how to deal with circular references in a situation where there's no clear directionality, how to use NSString's intelligent enumeration methods to extract features from text, and a brief demonstration of the power of conditional conformances.

That wraps it up for today. Stop by next time for more fun, games, and maybe even a little education. Until then, Friday Q&A is driven by reader ideas, so if you have a topic you'd like to see covered here, please send it in!

Friday Q&A 2017-12-08: Type Erasure in Swift

Mike Ash — Fri, 15 Dec 2017 14:09:00 GMT

Friday Q&A 2017-12-08: Type Erasure in Swift

You might have heard the term type erasure. You might have even used type-erased types in the standard library, such as AnySequence. But what exactly is type erasure and how do you do it yourself? In this article, I'll explore type erasure, why you'd want it, and how to make it happen, a topic suggested by Lorenzo Boaro.

Motivation
There are times when you want to hide an underlying type from outside users. Sometimes it's just a matter of hiding implementation details. In other cases, it can prevent a static type from spreading through the codebase, or allow distinct types to interoperate. Type erasure is the process of removing a specific type annotation in favor of a more general one.

Protocols or abstract superclasses could be considered a really simple form of type erasure. Take NSString as an example. You never get a plain NSString instance; it's always an instance of some concrete subclass, usually private. That is mostly hidden from view, though, and the APIs all work with NSString. All of the various subclasses can be used without having to know what they are, and without having to sprinkle your code with their types.

More advanced techniques become useful when dealing with Swift's generics and protocols with associated types. Swift doesn't allow using such protocols as concrete types. For example, if you want to write some code that accepts any Sequence of Int values, you can't write this:

    func f(seq: Sequence<Int>) { ...

That's not legal Swift. You can specialize generic types that way, but not protocols. You can work around this using generics:

    func f<S: Sequence>(seq: S) where S.Element == Int { ...

Sometimes this works great, but there are cases where it can be troublesome. Often you can't just add generics in one spot: one generic function requires others to be generic, which require yet more. Even worse, you can't use this for return values or properties at all. This won't work the way you want it to:

    func g<S: Sequence>() -> S where S.Element == Int { ...

We're looking for something where g can return any conforming type, but instead this allows the caller to choose which type it wants, and g is then required to provide an appropriate value.

Swift provides the AnySequence type to solve this problem. AnySequence wraps an arbitrary Sequence and erases its type, providing access to it through the AnySequence type instead. Using this, we can rewrite f and g:

    func f(seq: AnySequence<Int>) { ...

    func g() -> AnySequence<Int> { ...

The generics disappear and all the specific types are still hidden. There's a small code complexity and runtime cost from having to wrap the values in AnySequence, but the code is nice and clean.
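
Here is a small runnable sketch of that pattern using the standard library's AnySequence. The function names evens and total are invented for the example:

```swift
// Returns a type-erased sequence; callers never see the StrideThrough type.
func evens(upTo limit: Int) -> AnySequence<Int> {
    return AnySequence(stride(from: 0, through: limit, by: 2))
}

// Accepts any sequence of Int once it has been wrapped in AnySequence.
func total(_ seq: AnySequence<Int>) -> Int {
    return seq.reduce(0, +)
}

print(total(evens(upTo: 10)))
print(total(AnySequence([1, 2, 3])))
```

Neither function is generic, yet both work with different underlying sequence types.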

The Swift standard library has a bunch of these Any types, such as AnyCollection, AnyHashable, and AnyIndex. It can be useful to create your own to go along with your own generics and protocols, or just use the techniques to simplify your code when dealing with them. Let's explore the various ways to accomplish type erasure.

Type Erasure With Classes
We need to wrap up some common functionality from multiple types without exposing those types. This sounds a lot like a superclass-subclass relationship, and in fact we can use subclasses to implement type erasure. The superclass can expose an API that's blind to the underlying implementation type, and a subclass can implement that API with knowledge of the underlying type.

Let's see how our own version of AnySequence would look using this technique. I'll call it MAnySequence to incorporate my name:

    class MAnySequence<Element>: Sequence {

This class is also going to need an iterator type that it can return from the makeIterator method. We have to perform type erasure twice so that we can hide the underlying Sequence type as well as its Iterator type. This inner Iterator class conforms to IteratorProtocol and implements its next method to call fatalError. Swift doesn't have built-in support for abstract classes, so this will have to suffice:

        class Iterator: IteratorProtocol {
            func next() -> Element? {
                fatalError("Must override next()")
            }
        }

MAnySequence gets a similar implementation of makeIterator. It calls fatalError to encourage its subclass to override it:

        func makeIterator() -> Iterator {
            fatalError("Must override makeIterator()")
        }
    }

That is the type-erased public API. The private implementation subclasses it. The public class is parameterized by the element type, but the private implementation class is parameterized by the sequence type it wraps:

    private class MAnySequenceImpl<Seq: Sequence>: MAnySequence<Seq.Element> {

This class needs an internal subclass of the internal Iterator class from above:

        class IteratorImpl: Iterator {

It wraps an instance of the sequence's Iterator type:

            var wrapped: Seq.Iterator

            init(_ wrapped: Seq.Iterator) {
                self.wrapped = wrapped
            }

It implements next to call through to that wrapped iterator:

            override func next() -> Seq.Element? {
                return wrapped.next()
            }
        }

Similarly, MAnySequenceImpl wraps an instance of the sequence:

        var seq: Seq

        init(_ seq: Seq) {
            self.seq = seq
        }

It implements makeIterator to get an iterator from the wrapped sequence, and then wraps that iterator in IteratorImpl:

        override func makeIterator() -> IteratorImpl {
            return IteratorImpl(seq.makeIterator())
        }

    }

We need a way to actually create these things. A static method on MAnySequence creates an instance of MAnySequenceImpl and returns it to the caller as an MAnySequence:

    extension MAnySequence {
        static func make<Seq: Sequence>(_ seq: Seq) -> MAnySequence<Element> where Seq.Element == Element {
            return MAnySequenceImpl<Seq>(seq)
        }
    }

In production code, we would probably want to clean this up a bit by using an extra level of indirection so that MAnySequence could provide an initializer instead.

Let's try it out:

    func printInts(_ seq: MAnySequence<Int>) {
        for elt in seq {
            print(elt)
        }
    }

    let array = [1, 2, 3, 4, 5]
    printInts(MAnySequence.make(array))
    printInts(MAnySequence.make(array[1 ..< 4]))

It works!
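
The extra level of indirection mentioned above might look something like the following. This is only a sketch of one possible arrangement, not the production design: the names BoxedSequence, AnyBoxBase, and AnyBox are hypothetical, and it leans on the standard library's AnyIterator for the iterator rather than erasing that by hand:

```swift
// Abstract base, parameterized only by the element type.
// (In real code this and AnyBox would be private.)
class AnyBoxBase<Element> {
    func makeIterator() -> AnyIterator<Element> {
        fatalError("Must override makeIterator()")
    }
}

// Concrete box, parameterized by the wrapped sequence type.
final class AnyBox<Seq: Sequence>: AnyBoxBase<Seq.Element> {
    let seq: Seq

    init(_ seq: Seq) {
        self.seq = seq
    }

    override func makeIterator() -> AnyIterator<Seq.Element> {
        return AnyIterator(seq.makeIterator())
    }
}

// Public face: holds a box, so it can offer a plain initializer.
struct BoxedSequence<Element>: Sequence {
    private let box: AnyBoxBase<Element>

    init<Seq: Sequence>(_ seq: Seq) where Seq.Element == Element {
        box = AnyBox(seq)
    }

    func makeIterator() -> AnyIterator<Element> {
        return box.makeIterator()
    }
}

let numbers = [1, 2, 3, 4, 5]
print(Array(BoxedSequence(numbers)))
print(Array(BoxedSequence(numbers[1 ..< 4])))
```

The class hierarchy is still there, but it is hidden behind the struct, so callers write BoxedSequence(array) instead of calling a static factory.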

Type Erasure With Functions
We want to expose functionality from multiple types without exposing those types. A natural approach for this is to store functions whose signatures only involve the types we want to expose. The function bodies can be created in a context where the underlying implementation types are known.

Let's look at how MAnySequence would look with this approach. It starts off similar to the previous implementation, although this one can be a struct rather than a class because it's just a dumb container and there's no inheritance:

    struct MAnySequence<Element>: Sequence {

Like before, it needs an Iterator that it can return. This one is also a struct and it contains a stored property which is a function that takes no parameters and returns an Element?, which is the signature used for the next method in IteratorProtocol. It then implements IteratorProtocol to call that function:

        struct Iterator: IteratorProtocol {
            let _next: () -> Element?

            func next() -> Element? {
                return _next()
            }
        }

MAnySequence itself is similar: it contains a stored property which is a function that takes no arguments and returns an Iterator. Sequence is then implemented by calling through to that function:

        let _makeIterator: () -> Iterator

        func makeIterator() -> Iterator {
            return _makeIterator()
        }

MAnySequence's init is where the magic happens. It takes an arbitrary Sequence as its parameter:

        init<Seq: Sequence>(_ seq: Seq) where Seq.Element == Element {

It then needs to wrap the functionality of this sequence in a function:

            _makeIterator = {

How do we make an iterator here? We'll start by asking seq to make one:

                var iterator = seq.makeIterator()

Then we'll wrap that iterator in Iterator. Its _next function can just call iterator's next method:

                return Iterator(_next: { iterator.next() })
            }
        }

    }

Here's some code that uses it:

    func printInts(_ seq: MAnySequence<Int>) {
        for elt in seq {
            print(elt)
        }
    }

    let array = [1, 2, 3, 4, 5]
    printInts(MAnySequence(array))
    printInts(MAnySequence(array[1 ..< 4]))

This one works too!

This function-based approach to type erasure can be particularly nice when you need to wrap a small amount of functionality as part of a larger type, and don't need separate classes implementing the entire functionality of whatever types you're erasing.

For example, let's say you want to write some code that works with various collection types, but all it really needs to be able to do with those collections is get a count and do a zero-based integer subscript. This might be a table view data source, say. It might then look like this:

    class GenericDataSource<Element> {
        let count: () -> Int
        let getElement: (Int) -> Element

        init<C: Collection>(_ c: C) where C.Element == Element, C.Index == Int {
            count = { c.count }
            getElement = { c[c.startIndex + $0] }
        }
    }

Then the rest of the code in GenericDataSource can easily call count() and getElement() to perform operations on that passed-in collection, without that collection type contaminating GenericDataSource's generic parameters.
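
To see why the zero-based offset matters, here is a self-contained sketch that wraps an array slice, whose indices don't start at zero. IndexedSource is a hypothetical stand-in for GenericDataSource, and it assumes the collection's indices are consecutive integers, as they are for Array and ArraySlice:

```swift
final class IndexedSource<Element> {
    let count: () -> Int
    let getElement: (Int) -> Element

    init<C: Collection>(_ c: C) where C.Element == Element, C.Index == Int {
        count = { c.count }
        // Translate a zero-based offset into the collection's own index space.
        getElement = { c[c.startIndex + $0] }
    }
}

// This slice has indices 1, 2, 3 rather than 0, 1, 2.
let slice = [10, 20, 30, 40, 50][1 ..< 4]
let source = IndexedSource(slice)

for offset in 0 ..< source.count() {
    print(source.getElement(offset))
}
```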

Conclusion
Type erasure is a useful technique for stopping the viral spread of generics in your code, or just keeping interfaces simple. It's accomplished by wrapping the underlying type in a way which separates the API from the functionality. This can be done with an abstract public superclass and a private subclass, or it can be done by wrapping the API in functions. Type erasure with functions is particularly useful for simple cases where you only need a few pieces of functionality.

The Swift standard library provides several type erased types that you can take advantage of. For example, AnySequence wraps a Sequence, as the name indicates, and lets you iterate over a sequence without needing to know its type. AnyIterator is the companion to this type, providing a type-erased iterator. AnyHashable provides type-erased access to Hashable types. There are a few more for the various collection protocols. Search the documentation for Any to see those. The standard library also uses type erasure as part of the Codable API: KeyedEncodingContainer and KeyedDecodingContainer are type-erased wrappers around the corresponding container protocols, and are used to allow Encoder and Decoder implementations to provide containers without having to incorporate the container types into the API.

That's it for today! Come back next time for more programming fun and games. Friday Q&A is driven by reader suggestions, so if you have a topic you'd like to see me cover here, please send it in!

Friday Q&A 2017-11-10: Observing the A11's Heterogenous Cores

Mike Ash — Fri, 10 Nov 2017 12:41:00 GMT

Friday Q&A 2017-11-10: Observing the A11's Heterogenous Cores

Apple's newest mobile CPU, the A11, brings a new level of heterogeneous computing to iOS, with both high and low performance cores that are always on. With the release of the iPhone X, I set out to see if I could observe these heterogeneous cores in action.

(Yes, I'm aware that A11 devices could be obtained weeks ago when the iPhone 8 came out, but I didn't know anybody who got one, and it was hard to work up much excitement for it with the iPhone X coming not long after.)

Brief Review
Multicore CPUs have been around in the Apple world since at least the Power Mac G5, which was available with up to two cores per CPU, and up to two CPUs in one machine.

They've become the norm in many parts of the computing world. They're a natural response to increasing transistor counts as silicon chip fabrication technology continues its asymptotic march toward infinity. CPU designers always want to use more transistors to make their hardware faster, but there are diminishing returns. Rather than put more transistors into speeding up single-threaded performance, those transistors can be used to effectively put multiple CPUs onto a single chip. Those became known as CPU cores.

These days you can buy CPUs with dozens or even hundreds of cores. That's often not the best tradeoff, since a lot of software won't take advantage of that many. It can be better to have fewer, faster cores instead. Typical user-facing computers currently have somewhere between two and 16 cores.

Usually, all of the cores in a system are identical. Software can run on any or all of them and it doesn't make a bit of difference. If you dig deeply enough, some CPUs have sets of cores which can transfer data within the group more quickly than outside the group. It thus makes sense to put multiple threads working on the same data together within such a group. This is one of the reasons for the thread affinity API. Even so, the individual cores are still the same, they just aren't connected 100% symmetrically.

Last year, Apple introduced their A10 CPU with heterogeneous cores. It's a four-core CPU, but those cores are not identical. Instead, it has two high-performance cores and two high-efficiency cores. The high-efficiency cores are slower, but consume much less power. Tasks that don't need to be completed as quickly as possible can run on the high-efficiency cores and use much less power. The system would switch between running software on the high-performance cores and the high-efficiency cores depending on the workload at any given time.

This is a great idea, since iPhones are battery-powered and really want to use as little power as possible, and a lot of the work that iPhones do is relatively mundane tasks that don't need to be super fast, like downloading the latest tweets from your stream or loading the next chunk of audio data from the flash storage. It's a bit wasteful, though, since you have two cores just sitting there doing nothing at any given time. That's what you want in high-efficiency mode, since the whole idea is to run less hardware in order to consume less power, but in high-performance mode it's unfortunate that it can't take advantage of the two idle high-efficiency cores.

This year, Apple introduced the A11 which takes the concept a step further. It has six cores: two high-performance and four high-efficiency. And unlike the A10, the A11 is able to use all six cores simultaneously. If the workload requires it, the A11 can run two threads on the high-performance cores while at the same time running four more threads on the high-efficiency cores.

Planning
I started thinking about how we could catch it in the act. The system probably moves threads around regularly, so timing a long-running computation probably wouldn't reveal much. Short-running computations would be hard to manage, since we'd want to ensure there were exactly six going simultaneously.

I decided to write a test that would do a lot of small computations on each thread. It would then sample the timing of a few of those small computations during the process. Hopefully they would happen quickly enough that they would be unlikely to migrate between cores during the sample.

The next question was what sort of computation to perform. I started out doing a SHA-256 of the current iteration count, but I was afraid that special cryptographic instructions might interfere with the results. I then tried a simple square root algorithm on the current iteration count. I thought this might be placing too much emphasis on floating-point performance, so I finally redid it to do an integer square root instead. Ultimately, all three gave the same basic results. I stuck with the integer square root since integer computations seem like the predominant workload in most software.

My theory was that this should show a strongly bimodal distribution of running times on the A11. Did it work? Read below to find out!

Code
Each thread runs a function which takes a number of iterations to perform, and a sampling interval. It returns the runtimes for each sample in an array, expressed in terms of the units provided by the mach_absolute_time call:

    func ThreadFunc(iterations: UInt64, interval: UInt64) -> [UInt64] {

It creates an array that will eventually hold all of the sampled running times:

        var times: [UInt64] = []

Then it enters a loop for the given number of iterations:

        for i in 1 ... iterations {

Before it does any work, it grabs the current time. It does this regardless of whether or not this is a run to sample, in an attempt to make the non-sampled iterations as similar as possible to the sampled iterations:

            let start = mach_absolute_time()

i is a UInt64 but we want to work on Int64 numbers, so convert it and stash it in a variable:

            let x = Int64(i)

I implemented the Babylonian method for computing a square root. This consists of making a guess at the square root, then iteratively refining that guess by computing the average of guess and x / guess. Iterate until the desired precision is reached. It's not a very fast method, but we don't care about speed here, other than for consistency. I implemented this algorithm to run 1024 iterations, which is way too many for any sort of reasonable result, but it provides a nice amount of work for our benchmarking purposes:

            var guess = x
            for _ in 0 ..< 1024 {
                guess = (guess + x / guess) / 2
            }

I had to make sure that the compiler would actually perform this computation and not throw away the whole thing as unnecessary. That meant I had to use the result somehow. I added a dummy check to see if the computed square root was way off from the actual one, with a print (which can't be optimized away) in that case:

            if abs(guess * guess - x) > 1000000000 {
                print("Found a really inexact square root! \(guess * guess) \(x)")
            }

None of my actual runs ever hit the print, so there was no IO to skew the timing.

With the work completed, it gets the current time again:

            let end = mach_absolute_time()

If this is a sampling iteration, add the total runtime for this iteration to the times array:

            if i % interval == 0 {
                times.append(end - start)
            }
        }

Once all of the iterations are complete, return the times:

        return times
    }
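
The Babylonian iteration can also be tried on its own, outside the timing harness. A minimal sketch, with integerSqrt as a made-up helper name and the round count exposed as a parameter:

```swift
// Integer Babylonian method: repeatedly average guess and x / guess.
// With integer division the result is not always exact for non-squares;
// some inputs (3 or 8, for example) end up bouncing between two
// neighboring integers instead of settling.
func integerSqrt(_ x: Int64, rounds: Int = 1024) -> Int64 {
    precondition(x > 0, "x must be positive so the division is defined")
    var guess = x
    for _ in 0 ..< rounds {
        guess = (guess + x / guess) / 2
    }
    return guess
}

print(integerSqrt(16))
print(integerSqrt(1_000_000))
print(integerSqrt(10))
```

Convergence takes only a handful of rounds for inputs of this size, which is why the fixed 1024 rounds in the benchmark are pure busywork, as intended.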

That's the code for a single thread. We also need code to spawn these threads and analyze the results. That code starts with some constants for the number of threads to spawn, the number of iterations to run, and the sampling interval:

    let threadCount = 6
    let iterations: UInt64 = 1000000
    let interval = iterations / 20

It makes an array in which to gather all of the sampled times:

    var times: [UInt64] = []

It will use an NSCondition object to synchronize access to times and wait for results to come in:

    let cond = NSCondition()

We'll track the number of active threads so we can know when they've all completed:

    var runningThreads = threadCount

With the initial setup complete, it starts spawning threads:

    for _ in 0 ..< threadCount {
        Thread.detachNewThread({

The first thing each thread does is call ThreadFunc to do the work and gather results:

            let oneTimes = ThreadFunc(iterations: iterations, interval: interval)

Once the results come back, it appends them to times and signals that this thread has completed:

            cond.lock()
            times.append(contentsOf: oneTimes)
            runningThreads -= 1
            cond.signal()
            cond.unlock()
        })
    }

Back in the controlling code, it waits for all of the running threads to complete:

    cond.lock()
    while runningThreads > 0 {
        cond.wait()
    }
    cond.unlock()

At this point, it has all samples in the times array. Those samples are in terms of the units returned by mach_absolute_time, which aren't all that useful on their own, although their relative values are still instructive. We'll convert them to nanoseconds:

    let nanoseconds = times.map({ machToNanoseconds($0) })
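
The machToNanoseconds helper isn't shown here. One plausible way to write it is to scale by the fraction reported by mach_timebase_info. The following is a guess at its shape rather than the original code; scaleTicks is a hypothetical name, and the 125/3 ratio in the usage line is purely illustrative:

```swift
#if canImport(Darwin)
import Darwin
#endif

// Pure arithmetic: scale a tick count by the timebase fraction numer/denom.
func scaleTicks(_ ticks: UInt64, numer: UInt32, denom: UInt32) -> Double {
    return Double(ticks) * Double(numer) / Double(denom)
}

#if canImport(Darwin)
// Hypothetical stand-in for the helper used above (Apple platforms only).
func machToNanoseconds(_ ticks: UInt64) -> Double {
    var info = mach_timebase_info_data_t()
    _ = mach_timebase_info(&info)   // fills in numer and denom
    return scaleTicks(ticks, numer: info.numer, denom: info.denom)
}
#endif

// With an illustrative 125/3 timebase, 24 ticks come out to 1000 ns.
print(scaleTicks(24, numer: 125, denom: 3))
```

Returning Double matches how the values are used in the statistics code, which divides by Double(cluster.count).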

Next, it runs a really simple clustering algorithm, which just steps through the samples and looks for gaps where the relative difference between two samples is greater than some threshold. I wasn't sure which threshold value would be appropriate, so I had it try a bunch:

    for threshold in [0.01, 0.02, 0.03, 0.04, 0.05, 0.1] {
        print("Threshold: \(threshold)")
        let clusters = cluster(nanoseconds, threshold: threshold)

This returns each cluster as an array of values within the cluster. The code then computes the mean, median, and standard deviation for each cluster and prints them out:

        for cluster in clusters {
            let mean = cluster.reduce(0, +) / Double(cluster.count)
            let median = cluster[cluster.count / 2]
            let stddev = sqrt(cluster.map({ ($0 - mean) * ($0 - mean) }).reduce(0, +) / Double(cluster.count))

            print("count: \(cluster.count) - mean: \(mean) - median: \(median) - stddev: \(stddev)")
        }
        print("----------")
    }
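
The cluster function itself isn't shown. Based only on the description above, a minimal version might sort the samples and start a new cluster whenever the relative gap to the previous sample exceeds the threshold. This is a sketch under that assumption, not the original implementation:

```swift
// Hypothetical reconstruction of the clustering step.
// Sorts the samples, then splits wherever the relative jump
// from one sample to the next is larger than the threshold.
func cluster(_ samples: [Double], threshold: Double) -> [[Double]] {
    var clusters: [[Double]] = []
    var current: [Double] = []

    for sample in samples.sorted() {
        if let last = current.last, (sample - last) / last > threshold {
            clusters.append(current)
            current = []
        }
        current.append(sample)
    }
    if !current.isEmpty {
        clusters.append(current)
    }
    return clusters
}

// Made-up sample values in nanoseconds.
let demoSamples: [Double] = [7000, 9100, 6950, 9000, 14125]
print(cluster(demoSamples, threshold: 0.05))
```

Because each returned cluster is sorted, picking the middle element gives a reasonable median, which is what the statistics code relies on.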

That's it! We're ready to see the results.

Results
I first ran it on my iPhone 6+ to generate something of a baseline. The threshold of 0.05 seemed to provide the best clustering. Here are those results:

    Threshold: 0.05
    count: 120 - mean: 10993.4027777778 - median: 10958.3333333333 - stddev: 75.1148490502343

Each sample takes almost the same amount of time. They're around 11 microseconds, with a standard deviation of only 75 nanoseconds.

Here are the results from the iPhone X:

    Threshold: 0.05
    count: 54 - mean: 6969.90740740741 - median: 6958.33333333333 - stddev: 24.6068190109599
    count: 65 - mean: 9082.69230769231 - median: 9250.0 - stddev: 278.358695652034
    count: 1 - mean: 14125.0 - median: 14125.0 - stddev: 0.0

There's one outlier, which shows up pretty consistently across multiple runs. I'm not entirely sure why it would be so consistent. Maybe it takes a moment to ramp the CPU up to full speed? Ignoring the outlier, we see the heterogeneous cores clearly. There's one narrow cluster centered around ~7 microseconds, and another narrow cluster centered around ~9 microseconds, and nothing in between.

The speed difference is smaller than I expected, but in my experiments it varied quite a bit depending on the type of work being done. This particular microbenchmark is probably bottlenecked on integer division, which is not the most representative task.

Regardless, the signal is clear, with one chunk of samples running significantly faster than the other chunk, illustrating the high-performance and high-efficiency cores working simultaneously.

Conclusion
It's been interesting to follow the development of Apple's CPUs, and the heterogeneous cores in their latest iteration are really nifty. I expected it to take some work to observe them, but it ended up being straightforward. By running a long sequence of quick computations on multiple threads and sampling a few of them, the disparate cores become obvious.

That's it for today! Friday Q&A will be back next time with more fun and games. In the meantime, if you have a topic you'd like to see covered here, please send it in!

Friday Q&A 2017-10-27: Locks, Thread Safety, and Swift: 2017 Edition

Mike Ash — Fri, 27 Oct 2017 11:28:00 GMT

Friday Q&A 2017-10-27: Locks, Thread Safety, and Swift: 2017 Edition

Back in the dark ages of Swift 1, I wrote an article about locks and thread safety in Swift. The march of time has made it fairly obsolete, and reader Seth Willits suggested I update it for the modern age, so here it is!

This article will repeat some material from the old one, with changes to bring it up to date, and some discussion of how things have changed. Reading the previous article is not necessary before you read this one.

A Quick Recap on Locks
A lock, or mutex, is a construct that ensures only one thread is active in a given region of code at any time. Locks are typically used to ensure that multiple threads accessing a mutable data structure all see a consistent view of it. There are several kinds of locks:

Blocking locks sleep a thread while it waits for another thread to release the lock. This is the usual behavior.
Spinlocks use a busy loop to constantly check to see if a lock has been released. This is more efficient if waiting is rare, but wastes CPU time if waiting is common.
Reader/writer locks allow multiple "reader" threads to enter a region simultaneously, but exclude all other threads (including readers) when a "writer" thread acquires the lock. This can be useful as many data structures are safe to read from multiple threads simultaneously, but unsafe to write while other threads are either reading or writing.
Recursive locks allow a single thread to acquire the same lock multiple times. Non-recursive locks can deadlock, crash, or otherwise misbehave when re-entered from the same thread.
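
To make the recursive case concrete, here's a minimal sketch using Foundation's NSRecursiveLock. The depth function is a made-up example whose only job is to re-acquire a lock that the current thread already holds:

```swift
import Foundation

let recursiveLock = NSRecursiveLock()

// Hypothetical function: each level of recursion takes the lock again
// on the same thread before the previous level has released it.
func depth(_ n: Int) -> Int {
    recursiveLock.lock()
    defer { recursiveLock.unlock() }
    return n == 0 ? 0 : 1 + depth(n - 1)
}

let levels = depth(3)
```

Swapping in a non-recursive lock here would be asking for exactly the misbehavior described above.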

APIs
Apple's APIs have a bunch of different mutex facilities. This is a long but not exhaustive list:

pthread_mutex_t.
pthread_rwlock_t.
DispatchQueue.
OperationQueue when configured to be serial.
NSLock.
os_unfair_lock.

In addition to this, Objective-C provides the @synchronized language construct, which at the moment is implemented on top of pthread_mutex_t. Unlike the others, @synchronized doesn't use an explicit lock object, but rather treats an arbitrary Objective-C object as if it were a lock. A @synchronized(someObject) section will block access to any other @synchronized sections that use the same object pointer. These different facilities all have different behaviors and capabilities:

pthread_mutex_t is a blocking lock that can optionally be configured as a recursive lock.
pthread_rwlock_t is a blocking reader/writer lock.
DispatchQueue can be used as a blocking lock. It can be used as a reader/writer lock by configuring it as a concurrent queue and using barrier blocks. It also supports asynchronous execution of the locked region.
OperationQueue can be used as a blocking lock. Like DispatchQueue, it supports asynchronous execution of the locked region.
NSLock is a blocking lock implemented as an Objective-C class. Its companion class NSRecursiveLock is a recursive lock, as the name indicates.
os_unfair_lock is a less sophisticated, lower-level blocking lock.

Finally, @synchronized is a blocking recursive lock.
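
As a rough sketch of the reader/writer configuration mentioned for DispatchQueue, something like this should do the job. The ProtectedCounter class and its queue label are invented for the example:

```swift
import Foundation

// Hypothetical example type; the name and queue label are assumptions.
final class ProtectedCounter {
    private let queue = DispatchQueue(label: "example.protected-counter", attributes: .concurrent)
    private var storage = 0

    // Readers use a plain sync, so several of them can run at the same time.
    var value: Int {
        return queue.sync { self.storage }
    }

    // Writers use a barrier block, which runs alone on the queue.
    func increment() {
        queue.sync(flags: .barrier) { self.storage += 1 }
    }
}

let counter = ProtectedCounter()
// Hammer the counter from many threads at once.
DispatchQueue.concurrentPerform(iterations: 1000) { _ in
    counter.increment()
}
let finalCount = counter.value
```

Every increment goes through a barrier, so none of them are lost no matter how many threads pile on.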

Spinlocks, Lack of
I mentioned spinlocks as one type of lock, but none of the APIs listed here are spinlocks. This is a big change from the previous article, and is the main reason I'm writing this update.

Spinlocks are really simple, and are efficient in the right circumstances. Unfortunately, they're a little too simple for the complexities of the modern world.

The problem is thread priorities. When there are more runnable threads than CPU cores, higher priority threads get preference. This is a useful notion, because CPU cores are always a limited resource, and you don't want some time-insensitive background network operation stealing time from your UI while the user is trying to use it.

When a high-priority thread gets stuck and has to wait for a low-priority thread to finish some work, but the high-priority thread prevents the low-priority thread from actually performing that work, it can result in long hangs or even a permanent deadlock.

The deadlock scenario looks like this, where H is a high-priority thread and L is a low-priority thread:

1. L acquires the spinlock.
2. L starts doing some work.
3. H becomes ready to run, and preempts L.
4. H attempts to acquire the spinlock, but fails, because L still holds it.
5. H begins angrily spinning on the spinlock, repeatedly trying to acquire it, and monopolizing the CPU.
6. H can't proceed until L finishes its work. L can't finish its work unless H stops angrily spinning on the spinlock.
7. Sadness.

There are ways to solve this problem. For example, H might donate its priority to L in step 4, allowing L to complete its work in a timely fashion. It's possible to make a spinlock that solves this problem, but Apple's old spinlock API, OSSpinLock, doesn't.

This was fine for a long time, because thread priorities didn't get much use on Apple's platforms, and the priority system used dynamic priorities that kept the deadlock scenario from persisting too long. More recently, quality of service classes made different priorities more common, and made the deadlock scenario more likely to persist.

OSSpinLock, which did a fine job for so long, stopped being a good idea with the release of iOS 8 and macOS 10.10. It's now been formally deprecated. The replacement is os_unfair_lock, which fills the same overall purpose as a low-level, unsophisticated, cheap lock, but is sufficiently sophisticated to avoid problems with priorities.

Value Types
Note that pthread_mutex_t, pthread_rwlock_t, and os_unfair_lock are value types, not reference types. That means that if you use = on them, you make a copy. This is important, because these types can't be copied! If you copy one of the pthread types, the copy will be unusable and may crash when you try to use it. The pthread functions that work with these types assume that the values are at the same memory addresses as where they were initialized, and putting them somewhere else afterwards is a bad idea. os_unfair_lock won't crash, but you get a completely separate lock out of it, which is never what you want.

If you use these types, you must be careful never to copy them, whether explicitly with a = operator, or implicitly by, for example, embedding them in a struct or capturing them in a closure.

Additionally, since locks are inherently mutable objects, this means you need to declare them with var instead of let.

The others are reference types, meaning they can be passed around at will, and can be declared with let.

Initialization
You must be careful with the pthread locks, because you can create a value using the empty () initializer, but that value won't be a valid lock. These locks must be separately initialized using pthread_mutex_init or pthread_rwlock_init:

    var mutex = pthread_mutex_t()
    pthread_mutex_init(&mutex, nil)

It's tempting to write an extension on these types which wraps up the initialization. However, there's no guarantee that initializers work on the variable directly, rather than on a copy. Since these types can't be safely copied, such an extension can't be safely written unless you have it return a pointer or a wrapper class.

If you use these APIs, don't forget to call the corresponding destroy function when it's time to dispose of the lock.
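
Here's one way the wrapper class approach might look. This is a sketch, not a blessed implementation: the PThreadMutex name and its methods are my own invention. The mutex lives in heap memory that the class allocates, so it keeps one address for its whole life, and deinit takes care of the destroy call:

```swift
import Foundation

// Hypothetical wrapper; the class and method names are assumptions.
final class PThreadMutex {
    // Heap storage gives the mutex a stable address that is never copied.
    private let pointer: UnsafeMutablePointer<pthread_mutex_t>

    init() {
        pointer = UnsafeMutablePointer<pthread_mutex_t>.allocate(capacity: 1)
        pointer.initialize(to: pthread_mutex_t())
        let result = pthread_mutex_init(pointer, nil)
        precondition(result == 0, "pthread_mutex_init failed")
    }

    deinit {
        // Destroy the mutex before giving its memory back.
        pthread_mutex_destroy(pointer)
        pointer.deinitialize(count: 1)
        pointer.deallocate()
    }

    func lock() {
        pthread_mutex_lock(pointer)
    }

    func unlock() {
        pthread_mutex_unlock(pointer)
    }
}

let mutex = PThreadMutex()
var total = 0
// Increment from many threads, with the mutex guarding each update.
DispatchQueue.concurrentPerform(iterations: 1000) { _ in
    mutex.lock()
    total += 1
    mutex.unlock()
}
```

Since PThreadMutex is a class, it can be declared with let and passed around freely, just like the reference-type locks.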

Use
DispatchQueue has a callback-based API which makes it natural to use it safely. Depending on whether you need the protected code to run synchronously or asynchronously, call sync or async and pass it the code to run:

    queue.sync(execute: { ... })
    queue.async(execute: { ... })

For the sync case, the API is nice enough to capture the return value from the protected code and provide it as the return value of the sync method:

    let value = queue.sync(execute: { return self.protectedProperty })

You can even throw errors inside the protected block and they'll propagate out.
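
A small sketch of what that looks like in practice. The AccountError type, the accountQueue, and the balance variable are all hypothetical, made up for this example:

```swift
import Foundation

// Hypothetical error type and protected state, for illustration only.
enum AccountError: Error {
    case insufficientFunds
}

let accountQueue = DispatchQueue(label: "example.account")
var balance = 100

func withdraw(_ amount: Int) throws -> Int {
    // sync rethrows, so an error thrown inside the closure surfaces at this call.
    return try accountQueue.sync {
        guard amount <= balance else { throw AccountError.insufficientFunds }
        balance -= amount
        return balance
    }
}

let remaining = try? withdraw(30)   // succeeds
let failed = try? withdraw(500)     // throws, so try? produces nil
```

The queue is released either way, since the closure has finished running by the time the error reaches the caller.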

OperationQueue is similar, although it doesn't have a built-in way to propagate return values or errors. You'll have to build that yourself, or use DispatchQueue instead.

The other APIs require separate locking and unlocking calls, which can be exciting when you forget one of them. The calls look like this:

    pthread_mutex_lock(&mutex)
    ...
    pthread_mutex_unlock(&mutex)

    nslock.lock()
    ...
    nslock.unlock()

    os_unfair_lock_lock(&lock)
    ...
    os_unfair_lock_unlock(&lock)

Since the APIs are virtually identical, I'll use nslock for further examples. The others are the same, but with different names.

When the protected code is simple, this works well. But what if it's more complicated? For example:

    nslock.lock()
    if earlyExitCondition {
        return nil
    }
    let value = compute()
    nslock.unlock()
    return value

Oops, sometimes you don't unlock the lock! This is a good way to make hard-to-find bugs. Maybe you're always disciplined with your return statements and never do this. What if you throw an error?

    nslock.lock()
    guard something else { throw error }
    let value = compute()
    nslock.unlock()
    return value

Same problem! Maybe you're really disciplined and would never do this either. Then you're safe, but even then the code is a bit ugly:

    nslock.lock()
    let value = compute()
    nslock.unlock()
    return value

The obvious fix for this is to use Swift's defer mechanism. The moment you lock, defer the unlock. Then no matter how you exit the code, the lock will be released:

    nslock.lock()
    defer { nslock.unlock() }
    return compute()

This works for early returns, throwing errors, or just normal code.

It's still annoying to have to write two lines, so we can wrap everything up in a callback-based function like DispatchQueue has:

    func withLocked<T>(_ lock: NSLock, _ f: () throws -> T) rethrows -> T {
        lock.lock()
        defer { lock.unlock() }
        return try f()
    }

    let value = withLocked(lock, { return self.protectedProperty })

When implementing this for value types, you'll need to be sure to take a pointer to the lock rather than the lock itself. Remember, you don't want to copy these things! The pthread version would look like this:

    func withLocked<T>(_ mutexPtr: UnsafeMutablePointer<pthread_mutex_t>, _ f: () throws -> T) rethrows -> T {
        pthread_mutex_lock(mutexPtr)
        defer { pthread_mutex_unlock(mutexPtr) }
        return try f()
    }

    let value = withLocked(&mutex, { return self.protectedProperty })

Choosing Your Lock API
DispatchQueue is an obvious favorite. It has a nice Swifty API and is pleasant to use. The Dispatch library gets a huge amount of attention from Apple, and that means that it can be counted on to perform well, work reliably, and get lots of cool new features.

DispatchQueue allows for a lot of nifty advanced uses, such as scheduling timers or event sources to fire directly on the queue you're using as a lock, ensuring that the handlers are synchronized with other things using the queue. The ability to set target queues allows expressing complex lock hierarchies. Custom concurrent queues can be easily used as reader-writer locks. You only have to change a single letter to execute protected code asynchronously on a background thread rather than synchronously. And the API is easy to use and hard to misuse. It's a win all around. There's a reason GCD quickly became one of my favorite APIs, and remains one to this day.

Like most things, it's not perfect. A dispatch queue is represented by an object in memory, so there's a bit of overhead. They're missing some niche features, like condition variables or recursiveness. Every once in a great while, it's useful to be able to make individual lock and unlock calls rather than be forced to use a callback-based API. DispatchQueue is usually the right choice, and is a great default if you don't know what to pick, but there are occasionally reasons to use others.

os_unfair_lock can be a good choice when per-lock overhead is important (because for some reason you have a huge number of them) and you don't need fancy features. It's implemented as a single 32-bit integer which you can place wherever you need it, so overhead is small.

As the name hints, one of the features that os_unfair_lock is missing is fairness. Lock fairness means that there's at least some attempt to ensure that different threads waiting on a lock all get a chance to acquire it. Without fairness, it's possible for a thread that rapidly releases and re-acquires the lock to monopolize it while other threads are waiting.

Whether or not this is a problem depends on what you're doing. There are some use cases where fairness is necessary, and some where it doesn't matter at all. The lack of fairness allows os_unfair_lock to have better performance, so it can provide an edge in cases where fairness isn't needed.

pthread_mutex is somewhere in the middle. It's considerably larger than os_unfair_lock, at 64 bytes, but you can still control where it's stored. It implements fairness, although this is a detail of Apple's implementation, not part of the API spec. It also provides various other advanced features, such as the ability to make the mutex recursive, and fancy thread priority stuff.

pthread_rwlock provides a reader/writer lock. It takes up a whopping 200 bytes and doesn't provide much in the way of interesting features, so there doesn't seem to be much reason to use it over a concurrent DispatchQueue.

NSLock is a wrapper around pthread_mutex. It's hard to come up with a use case for this, but it could be useful if you need explicit lock/unlock calls but don't want the hassle of manually initializing and destroying a pthread_mutex.

OperationQueue offers a callback-based API like DispatchQueue, with some advanced features for things like dependency management between operations, but without many of the other features offered by DispatchQueue. There is little reason to use OperationQueue as a locking API, although it can be useful for other things.

In short: DispatchQueue is probably the right choice. In certain circumstances, os_unfair_lock may be better. The others are usually not the ones to use.

Conclusion
Swift has no language facilities for thread synchronization, but the APIs make up for it. GCD remains one of Apple's crown jewels, and the Swift API for it is great. For the rare occasions where it's not suitable, there are many other options to choose from. We don't have @synchronized or atomic properties, but we have things that are better.

That wraps it up for this time. Check back again for more fun stuff. If you get bored in the meantime, buy one of my books! Friday Q&A is driven by reader ideas, so if you have a topic you'd like to see covered here, please send it in!

The Complete Friday Q&A Volumes II and III Are Out!

Mike Ash — Tue, 10 Oct 2017 16:19:00 GMT

The Complete Friday Q&A Volumes II and III Are Out!

It's finally here! I'm pleased to present The Complete Friday Q&A Volumes II and III.

These collect my blog posts from November 2010 through 2016. As with Volume I, they are available in both digital and print versions.

Click here to see my store for the digital versions.

For the print versions, or for links to the digital versions on iBooks (and eventually Kindle, but that will take some time), visit my book page.

When purchasing directly from me, you'll receive $10 off each additional volume in a multi-volume purchase. If you already own Volume I and would like to complete the set, e-mail me with your receipt to get a coupon for $10 off.

If you would like both the print and digital versions, place an order for the print versions, then e-mail me with your receipt to get a coupon for $20 off the digital copies.

It has taken me far too long, but I'm so glad to finally have these out. Many thanks to my guest authors, reviewers, and readers.

Friday Q&A 2017-10-06: Type-Safe User Defaults

Mike Ash — Fri, 06 Oct 2017 12:55:00 GMT

Friday Q&A 2017-10-06: Type-Safe User Defaults

It's fun to re-imagine traditional techniques with a Swift twist. I've implemented a type-safe layer on top of the venerable NSUserDefaults, and I'm going to discuss my little library today. Credit/blame for this idea goes to local reader José Vazquez, although he inspired it by accident while talking about something else.

User Defaults
NSUserDefaults, or just UserDefaults in Swift, is a typical dynamically-typed string-oriented Objective-C API. It stores string keys and property list values. This is perfectly fine, but I wanted to do better. I came up with this wishlist:

Keys should be declared, not written ad hoc at the point of use. You can do this with UserDefaults by declaring string constants, but it's easy to get lazy and not do it.
There should be no repetition in the common case. A key's string should automatically be made to match its identifier in the code.
No casting should be required. Keys should have a value type associated with them and the conversion handled internally.
It should interoperate smoothly with values read and written directly through UserDefaults.
Non-plist value types should be supported through Codable.
Default values should be specified as part of the key rather than registered separately.
The value should be made available as a property so that it can be the target of mutating methods and operators.

I managed to hit all of these points, although the implementation is somewhat gnarly.

Code
As usual, the code is available on GitHub if you want to play with it or see it all in one place:

https://github.com/mikeash/TSUD

Example Use
Before we get into the implementation, let's take a look at what it looks like to use this API. That will inform the implementation choices and make it clearer why things work the way they do.

To declare a key, write a struct conforming to the TSUD protocol. Inside, implement a single static property called defaultValue which contains the value to be returned if UserDefaults doesn't contain a value:

    struct fontSize: TSUD {
        static let defaultValue = 12.0
    }

To read or write the value, use the value property on the struct:

    let font = NSFont.systemFont(ofSize: fontSize.value)
    fontSize.value = 14.0

Since value is just a property, you can do disturbing and unnatural things like += to it.

    fontSize.value += 5.0

If you want to be able to detect the lack of a value and handle it specially rather than getting a default value, declare defaultValue to be optional and set it to nil:

    struct username: TSUD {
        static let defaultValue: String? = nil
    }

Then use it like any other optional:

    if let username = username.value {
        field.string = username
    } else {
        username.value = promptForUsername()
    }

By default, TSUD types correspond to a UserDefaults key matching their type name. These examples would be stored under "fontSize" and "username". If you need to override this (for example, because you want to access a key that has a space in it, or you don't like the key's capitalization in your code), implement the stringKey property:

    struct hasWidgets: TSUD {
        static let defaultValue = false
        static let stringKey = "Has Widgets"
    }

Arbitrary Codable types are supported. They are encoded as property list objects:

    struct Person: Codable {
        var name: String
        var quest: String
        var age: Int
    }

    struct testPerson: TSUD {
        static let defaultValue: Person? = nil
    }

If you prefer, you can also use methods to get and set the value:

    if hasWidgets.get() {
        hasWidgets.set(false)
    }

These methods allow you to specify the UserDefaults object to work with, in the unlikely event that you want to work with something other than UserDefaults.standard:

    let otherDefaults = UserDefaults(suiteName: "...")!
    if hasWidgets.get(otherDefaults) {
        // That other thing has widgets!
    }

If you want to access the value in another UserDefaults instance as a mutable value, there's a subscript which takes a UserDefaults instance and provides the value. Unfortunately, Swift doesn't allow static subscripts, so you have to instantiate the key type:

    fontSize()[otherDefaults] += 10.0

Implementation
We'll start with the protocol itself. It contains an associatedtype for the value type, an empty init method so the type can be instantiated, and the two properties discussed above:

    public protocol TSUD {
        associatedtype ValueType: Codable

        init()

        static var defaultValue: ValueType { get }

        static var stringKey: String { get }
    }

We'll provide a default implementation for stringKey that uses the type name:

    public extension TSUD {
        static var stringKey: String {
            let s = String(describing: Self.self)

In certain circumstances, Swift adds a little numeric tag at the end, like "hasWidgets #1". If this name contains one, we strip it off. Otherwise we return the name directly:

            if let index = s.index(of: " ") {
                return String(s[..<index])
            } else {
                return s
            }
        }
    }
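
To see the stripping logic in isolation, here's the same idea pulled out into a free function. The defaultsKey(fromTypeName:) name is hypothetical, and I've used firstIndex(of:), which is the current spelling of index(of:):

```swift
// Hypothetical free function mirroring the default stringKey logic.
func defaultsKey(fromTypeName name: String) -> String {
    // Everything from the first space onward is treated as the numeric tag.
    if let index = name.firstIndex(of: " ") {
        return String(name[..<index])
    }
    return name
}

let plainKey = defaultsKey(fromTypeName: "fontSize")
let taggedKey = defaultsKey(fromTypeName: "hasWidgets #1")
```

This does mean a type name can never produce a key containing a space, which is one reason the stringKey override exists.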

Later on, we'll need the ability to detect whether a value of ValueType is an Optional set to nil. This is not as easy as checking it with == nil, because it needs to work with arbitrary types wrapped in an Optional. The best technique I could find was to create a small protocol with an isNil property, and make Optional conform to it. Code that wants to check for nil can attempt to cast to the protocol, check isNil on success, and assume false on failure:

    private protocol OptionalP {
        var isNil: Bool { get }
    }

    extension Optional: OptionalP {
        var isNil: Bool { return self == nil }
    }
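
Here's a self-contained sketch of the cast-and-check technique in use. NilCheckable and holdsNil are stand-in names I picked so the example can run on its own; they play the same roles as OptionalP and the check inside encode:

```swift
// Stand-in names so the sketch is self-contained; they mirror OptionalP and isNil.
protocol NilCheckable {
    var isNilValue: Bool { get }
}

extension Optional: NilCheckable {
    var isNilValue: Bool {
        switch self {
        case .none: return true
        case .some: return false
        }
    }
}

// Generic code can't compare an arbitrary T against nil, so it casts instead.
func holdsNil<T>(_ value: T) -> Bool {
    if let optional = value as? NilCheckable {
        return optional.isNilValue
    }
    // Not an Optional at all, so it can't be nil.
    return false
}

let missing: String? = nil
let present: String? = "mike"

let missingIsNil = holdsNil(missing)
let presentIsNil = holdsNil(present)
let plainIsNil = holdsNil(42)
```

The cast succeeds for any Optional regardless of what it wraps, which is what lets this work with arbitrary value types.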

The main API of TSUD is implemented in an extension:

    extension TSUD {

Getting and setting the value is implemented with subscripting. This calls helper methods to encode and decode the value:

        public subscript(nsud: UserDefaults) -> ValueType {
            get {
                return decode(nsud.object(forKey: Self.stringKey)) ?? Self.defaultValue
            }
            nonmutating set {
                nsud.set(encode(newValue), forKey: Self.stringKey)
            }
        }

The get and set methods call through to the subscript. Because these are static methods and we can't have a static subscript, these instantiate self using the empty init() in the protocol:

        public static func get(_ nsud: UserDefaults = .standard) -> ValueType {
            return self.init()[nsud]
        }

        public static func set(_ value: ValueType, _ nsud: UserDefaults = .standard) {
            self.init()[nsud] = value
        }

value is a computed property that calls get and set:

        public static var value: ValueType {
            get {
                return get()
            }
            set {
                set(newValue)
            }
        }

The encode method takes care of transforming a value to a property list object suitable for UserDefaults:

        private func encode(_ value: ValueType) -> Any? {

There are some special cases to handle. First, if ValueType is an optional and value contains nil, we return nil:

            switch value {
            case let value as OptionalP where value.isNil: return nil

Date and Data are returned unchanged. For some reason, Date's implementation of Codable encodes its value as a raw number, even though property lists support Date values natively. Likewise, Data encodes its value as an array of numbers even though Data is a valid property list type. We get past that by avoiding any encoding for these types. It's possible that more property list types need this treatment, in which case it's easy to add them here:

            case is Date: return value
            case is Data: return value

All other values get encoded. We use PropertyListEncoder to encode the value:

            default:
                let data = try? PropertyListEncoder().encode([value])

If this fails, we'll return nil:

                guard let dataUnwrapped = data else { return nil }

You'll notice that we're not encoding the value directly, but rather an array containing the value. For some reason, PropertyListEncoder does not support numbers or dates as top-level objects. Wrapping them in an array convinces it to work. We'll extract it back out afterwards.

Unfortunately, there's a bit of an impedance mismatch here. PropertyListEncoder produces binary data in property list format, but we want property list objects which can be passed to UserDefaults. We'll turn that data back into objects by using PropertyListSerialization:

                let wrappedPlist = (try? PropertyListSerialization.propertyList(from: dataUnwrapped, options: [], format: nil)) as? [Any]

This is really ugly. It's unfortunate that PropertyListEncoder doesn't have an option to skip the serialization step and provide objects directly. We could make our own, but that's more ambitious than I was willing to get for this.

Once we have the property list object, we can index into the wrapper array to fetch the object we actually want:

                return wrappedPlist?[0]
            }
        }

Decoding is the same thing in reverse. The decode method takes an optional to make it easier to handle nil from UserDefaults. It returns a nil value in that case:

        private func decode(_ plist: Any?) -> ValueType? {
            guard let plist = plist else { return nil }

As with encoding, decoding a Date or Data is special-cased to pass the value through:

            switch ValueType.self {
            case is Date.Type,
                 is Data.Type:
                return plist as? ValueType

For all other values, we'll use PropertyListDecoder. As with the encoder, there's no way to have PropertyListDecoder work on property list objects directly. It only works with data, so we have to encode the object as data, then decode that:

            default:
                let data = try? PropertyListSerialization.data(fromPropertyList: plist, format: .binary, options: 0)
                guard let dataUnwrapped = data else { return nil }
                return try? PropertyListDecoder().decode(ValueType.self, from: dataUnwrapped)
            }
        }

Unlike the encoder, the decoder is perfectly happy with a top-level number or date, so the array dance is not necessary on this end.
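
To watch the whole round trip outside of the TSUD extension, here's a sketch with a pair of free functions. The plistObject(from:) and decodeValue(_:fromPlistObject:) names are hypothetical, and the sample Person values are made up:

```swift
import Foundation

struct Person: Codable, Equatable {
    var name: String
    var quest: String
    var age: Int
}

// Hypothetical helpers; the names are assumptions. Both return nil on any failure.
func plistObject<T: Encodable>(from value: T) -> Any? {
    // Wrap in an array, encode to data, parse back into objects, then unwrap.
    guard let data = try? PropertyListEncoder().encode([value]),
          let parsed = try? PropertyListSerialization.propertyList(from: data, options: [], format: nil),
          let wrapped = parsed as? [Any] else { return nil }
    return wrapped.first
}

func decodeValue<T: Decodable>(_ type: T.Type, fromPlistObject plist: Any) -> T? {
    // The decoder only accepts data, so serialize the object first.
    guard let data = try? PropertyListSerialization.data(fromPropertyList: plist, format: .binary, options: 0) else { return nil }
    return try? PropertyListDecoder().decode(type, from: data)
}

let original = Person(name: "Arthur", quest: "Grail", age: 40)
let stored = plistObject(from: original)
let restored = stored.flatMap { decodeValue(Person.self, fromPlistObject: $0) }
```

The stored value is a plain property list dictionary, so it's the kind of thing UserDefaults is happy to accept.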

With encoding and decoding implemented, we've reached the end!

Conclusion
This was a fun project for an NSCoderNight, and it's interesting to see just how far it can go. The result ends up looking pretty nice, although the problem it's solving isn't particularly pressing. This code is mostly intended as an educational experiment, but it could be used for practical purposes too.

That's it for today. Come back again for more terrifying tales of coding bravery. Friday Q&A is driven by reader ideas (some of them accidental), so as always, if you have a topic you'd like to see covered here, please send it in!

Friday Q&A 2017-09-22: Swift 4 Weak References

Mike Ash — Sat, 23 Sep 2017 00:57:00 GMT

Friday Q&A 2017-09-22: Swift 4 Weak References

Soon after Swift was initially open sourced, I wrote an article about how weak references are implemented. Time moves on and things change, and the implementation is different from what it once was. Today I'm going to talk about the current implementation and how it works compared to the old one, a topic suggested by Guillaume Lessard.

Old Implementation
For those of you who have forgotten the old implementation and don't feel like reading through the last article, let's briefly recall how it works.

In the old implementation, Swift objects have two reference counts: a strong count and a weak count. When the strong count reaches zero while the weak count is still non-zero, the object is destroyed but its memory is not deallocated. This leaves a sort of zombie object sitting in memory, which the remaining weak references point to.

When a weak reference is loaded, the runtime checks to see if the object is a zombie. If it is, it zeroes out the weak reference and decrements the weak reference count. Once the weak count reaches zero, the object's memory is deallocated. This means that zombie objects are eventually cleared out once all weak references to them are accessed.

I loved the simplicity of this implementation, but it had some flaws. One flaw was that the zombie objects could stay in memory for a long time. For classes with large instances (because they contain a lot of properties, or use something like ManagedBuffer to allocate extra memory inline), this could be a serious waste.

Another problem, which I discovered after writing the old article, was that the implementation wasn't thread-safe for concurrent reads. Oops! This was patched, but the discussion around it revealed that the implementers wanted a better implementation of weak references anyway, which would be more resilient to such things.

Object Data
There are many pieces of data which make up "an object" in Swift.

First, and most obviously, there are all of the stored properties declared in the source code. These are directly accessible by the programmer.

Second, there is the object's class. This is used for dynamic dispatch and the type(of:) built-in function. This is mostly hidden, although dynamic dispatch and type(of:) imply its existence.

Third, there are the various reference counts. These are completely hidden unless you do naughty things like read the raw memory of your object or convince the compiler to let you call CFGetRetainCount.

Fourth, you have auxiliary information stored by the Objective-C runtime, like the list of Objective-C weak references (the Objective-C implementation of weak references tracks each weak reference individually) and associated objects.

Where do you store all of this stuff?

In Objective-C, the class and stored properties (i.e. instance variables) are stored inline in the object's memory. The class takes up the first pointer-sized chunk, and the instance variables come after. Auxiliary information is stored in external tables. When you manipulate an associated object, the runtime looks it up in a big hash table which is keyed by the object's address. This is somewhat slow and requires locking so that multithreaded access doesn't fail. The reference count is sometimes stored in the object's memory and sometimes stored in an external table, depending on which OS version you're running and which CPU architecture.

In Swift's old implementation, the class, reference counts, and stored properties were all stored inline. Auxiliary information was still stored in a separate table.

Putting aside how these languages actually do it, let's ask the question: how should they do it?

Each location has tradeoffs. Data stored in the object's memory is fast to access but always takes up space. Data stored in an external table is slower to access but takes up zero space for objects which don't need it.

This is at least part of why Objective-C traditionally didn't store the reference count in the object itself. Objective-C reference counting was created when computers were much less capable than they are now, and memory was extremely limited. Most objects in a typical Objective-C program have a single owner, and thus a reference count of 1. Reserving four bytes of the object's memory to store 1 all the time would be wasteful. By using an external table, the common value of 1 could be represented by the absence of an entry, reducing memory usage.

Every object has a class, and it is constantly accessed. Every dynamic method call needs it. This should go directly in the object's memory. There's no savings from storing it externally.

Stored properties are expected to be fast. Whether an object has them is determined at compile time. Objects with no stored properties can allocate zero space for them even when stored in the object's memory, so they should go there.

Every object has reference counts. Not every object has reference counts that aren't 1, but it's still pretty common, and memory is a lot bigger these days. This should probably go in the object's memory.

Most objects don't have any weak references or associated objects. Dedicating space within the object's memory for these would be wasteful. These should be stored externally.

This is the right tradeoff, but it's annoying. Objects that do have weak references or associated objects pay for it with pretty slow access. How can we fix this?

Side Tables
Swift's new implementation of weak references brings with it the concept of side tables.

A side table is a separate chunk of memory which stores extra information about an object. It's optional, meaning that an object may have a side table, or it may not. Objects which need the functionality of a side table can incur the extra cost, and objects which don't need it don't pay for it.

Each object has a pointer to its side table, and the side table has a pointer back to the object. The side table can then store other information, like associated object data.

To avoid reserving eight bytes for the side table, Swift makes a nifty optimization. Initially, the first word of an object is the class, and the next word stores the reference counts. When an object needs a side table, that second word is repurposed to be a side table pointer instead. Since the object still needs reference counts, the reference counts are stored in the side table. The two cases are distinguished by setting a bit in this field that indicates whether it holds reference counts or a pointer to the side table.
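
Here's a small sketch of that kind of tagged word. The layout is an assumption for illustration only: I'm using the top bit as the flag and a made-up RefCountWord type, whereas the real field in the Swift runtime packs several counts and flags with its own bit assignments.

```swift
// Hypothetical layout: the top bit of the word says what the other 63 bits mean.
struct RefCountWord {
    static let sideTableFlag: UInt64 = 1 << 63
    var bits: UInt64

    // The word holds reference counts directly.
    static func makeInline(count: UInt64) -> RefCountWord {
        precondition(count & sideTableFlag == 0, "count too large to store inline")
        return RefCountWord(bits: count)
    }

    // The word holds a side table address, marked by the flag bit.
    static func makeSideTable(address: UInt64) -> RefCountWord {
        precondition(address & sideTableFlag == 0, "address collides with the flag bit")
        return RefCountWord(bits: address | sideTableFlag)
    }

    var usesSideTable: Bool {
        return bits & RefCountWord.sideTableFlag != 0
    }

    var inlineCount: UInt64? {
        return usesSideTable ? nil : bits
    }

    var sideTableAddress: UInt64? {
        return usesSideTable ? bits & ~RefCountWord.sideTableFlag : nil
    }
}

let inlineWord = RefCountWord.makeInline(count: 3)
let tableWord = RefCountWord.makeSideTable(address: 0x1000)
```

Either way the field stays one word wide, so objects which never need a side table pay nothing extra for the possibility.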

The side table allows Swift to maintain the basic form of the old weak reference system while fixing its flaws. Instead of pointing to the object, as it used to work, weak references now point directly at the side table.

Because the side table is known to be small, there's no issue of wasting a lot of memory for weak references to large objects, so that problem goes away. This also points to a simple solution for the thread safety problem: don't preemptively zero out weak references. Since the side table is known to be small, weak references to it can be left alone until those references themselves are overwritten or destroyed.
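
A toy model may help picture this. Everything here is hypothetical (ToySideTable, ToyWeakRef, and a String standing in for the object), and the real work happens inside the Swift runtime with atomic operations, but the shape is the same: the weak reference holds onto the small side table and checks for a live object lazily when it's loaded.

```swift
// Toy model only: hypothetical names, no thread safety, a String stands in for the object.
final class ToySideTable {
    var payload: String?    // nil once the object has been destroyed
    var weakCount = 0       // how many weak references point at this side table
    init(payload: String) { self.payload = payload }
}

struct ToyWeakRef {
    let table: ToySideTable

    init(_ table: ToySideTable) {
        self.table = table
        table.weakCount += 1
    }

    // Loading checks liveness on demand. Nobody had to find and zero this reference in advance.
    func load() -> String? {
        return table.payload
    }
}

let sideTable = ToySideTable(payload: "big object")
let weakRef = ToyWeakRef(sideTable)
let beforeDestroy = weakRef.load()
sideTable.payload = nil     // the object goes away; the small side table stays behind
let afterDestroy = weakRef.load()
```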

I should note that the current side table implementation only holds reference counts and a pointer to the original object. Additional uses like associated objects are currently hypothetical. Swift has no built-in associated object functionality, and the Objective-C API still uses a global table.

The technique has a lot of potential, and we'll probably see something like associated objects using it before too long. I'm hopeful that this will open the door to stored properties in extensions of class types and other nifty features.

Code
Since Swift is open source, all of the code for this stuff is accessible.

Most of the side table stuff can be found in stdlib/public/SwiftShims/RefCount.h.

The high-level weak reference API, along with juicy comments about the system, can be found in swift/stdlib/public/runtime/WeakReference.h.

Some more implementation and comments about how heap-allocated objects work can be found in stdlib/public/runtime/HeapObject.cpp.

I've linked to specific commits of these files, so that people reading from the far future can still see what I'm talking about. If you want to see the latest and greatest, be sure to switch over to the master branch, or whatever is relevant to your interests, after you click the links.

Conclusion
Weak references are an important language feature. Swift's original implementation was wonderfully clever and had some nice properties, but also had some problems. By adding an optional side table, Swift's engineers were able to solve those problems while keeping the nice, clever properties of the original. The side table implementation also opens up a lot of possibilities for great new features in the future.

That's it for today. Come back again for more crazy programming-related ghost stories. Until then, if you have a topic you'd like to see covered here, please send it in!

The Best New Features in Swift 4

Mike Ash — Fri, 15 Sep 2017 14:43:00 GMT

I'm afraid I once again don't have a Friday Q&A for you today, but I wrote up the best new features in Swift 4 for the Plausible Labs blog, which is almost as good. Check it out over there!

Corporate Training, NYC Workshop, and Book Update

Mike Ash — Fri, 08 Sep 2017 19:07:00 GMT

I'm afraid I ran out of time for Friday Q&A this week, so I'll shoot for next week. In the meantime, I present a little update about various other things in my world.

Over at Plausible Labs, we're introducing a corporate training program, where companies can bring us (me, mostly) in to teach their programmers the same sort of wild and wonderful stuff I blog about. We're concentrating on Swift, since it's the hot new thing, but we can do any topic we know about if there's demand. If you work at a company that might be interested in having us, or just know of one, please get in touch. It's all the Friday Q&A goodness you love, concentrated, personalized, and made huggable.

On a similar note, we have another Swift workshop coming up in New York City on Friday, September 29th. Come to this one and I'll teach you all about the deeper darker secrets of protocols, generics, introspection, pointers, and more. Check out the workshop event page for more info or to get tickets.

Finally, the next edition of The Complete Friday Q&A is coming very close to completion. I need to get a cover finished for it, and then there's some administrative mumbo jumbo for getting an ISBN and uploading it and such, and I need to write a few more little secret bonus pages, and then it should be ready to go. I will, of course, announce it here when it's available. Stay tuned!

Friday Q&A 2017-08-25: Swift Error Handling Implementation

Mike Ash — Fri, 25 Aug 2017 13:13:00 GMT

Swift's error handling is a unique feature of the language. It looks a lot like exceptions in other languages, but the syntax is not quite the same, and it doesn't quite work the same either. Today I'm going to take a look at how Swift errors work on the inside.

Semantics
Let's start with a quick refresher on how Swift errors work at the language level.

Any Swift function can be decorated with a throws keyword, which indicates that it can throw an error:

    func getStringMightFail() throws -> String { ...

To actually throw an error from such a function, use the throw keyword with a value that conforms to the Error protocol:

        throw MyError.brainNotFound

When calling a throws function, you must include the try keyword:

    let string = try getStringMightFail()

The try keyword doesn't do anything, but is a required marker to indicate that the function might throw an error. The call must be in a context where throwing an error is allowed, either in a throws function, or in a do block with a catch handler.

To write a catch handler, place the try call in a do block, and add a catch block:

    do {
        let string = try getStringMightFail()
        ...
    } catch {
        print("Got an error: \(error)")
    }

When an error is thrown, execution jumps to the catch block. The value that was thrown is available in error. You can get fancy with type checking and conditions and multiple catch clauses, but these are the basics. For more information about all the details, see the Error Handling section of The Swift Programming Language.
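
As a quick illustration of the fancier forms, here's a sketch with multiple catch clauses, a pattern with an associated value, and a where condition. The tooTired case and the describe function are made up for this example.

```swift
enum MyError: Error {
    case brainNotFound
    case tooTired(hours: Int)
}

func getStringMightFail(_ mode: Int) throws -> String {
    switch mode {
    case 0: return "hello"
    case 1: throw MyError.brainNotFound
    default: throw MyError.tooTired(hours: mode)
    }
}

func describe(_ mode: Int) -> String {
    do {
        let string = try getStringMightFail(mode)
        return "ok: \(string)"
    } catch MyError.brainNotFound {
        // Matches one specific error value.
        return "no brain"
    } catch MyError.tooTired(let hours) where hours > 10 {
        // Matches a pattern and an extra condition.
        return "exhausted"
    } catch {
        // Catch-all. The thrown value is available as `error`.
        return "other: \(error)"
    }
}
```

Clauses are tried in order, so the catch-all goes last.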

That's what it does. How does it work?

Implementation
To find out how it works, I wrote some dummy code with error handling that I could disassemble:

    struct MyError: Error {
        var x: Int
        var y: Int
        var z: Int
    }

    func Thrower(x: Int, y: Int, z: Int) throws -> Int {
        throw MyError(x: x, y: y, z: z)
    }

    func Catcher(f: (Int, Int, Int) throws -> Int) {
        do {
            let x = try f(1, 2, 3)
            print("Received \(x)")
        } catch {
            print("Caught \(error)")
        }
    }

Of course, now that Swift is open source, I could just go look at the compiler code and see what it does. But that's no fun, and this is easier.

It turns out that Swift 3 and Swift 4 do it differently. I'll briefly discuss Swift 3, then look a bit deeper at Swift 4, since that's up and coming.

Swift 3 works by essentially automating Objective-C's NSError convention. The compiler inserts an extra, hidden parameter which is essentially Error *, or NSError **. Throwing an error consists of writing the error object to the pointer passed in that parameter. The caller allocates some stack space and passes its address in that parameter. On return, it checks to see if that space now contains an error. If it does, it jumps to the catch block.
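
Written out by hand in Swift, the convention looks roughly like this. This is only a model of what the compiler generates, using an inout Error? in place of the hidden pointer parameter; the function names are invented for the sketch.

```swift
struct MyError: Error {
    var x: Int
}

// Hand-written stand-in for a throws function: the error travels through an extra parameter.
func throwerByHand(x: Int, errorOut: inout Error?) -> Int {
    if x < 0 {
        errorOut = MyError(x: x)
        return 0    // the return value is meaningless when an error was written
    }
    return x * 2
}

func catcherByHand(_ x: Int) -> String {
    var error: Error? = nil     // the caller's stack slot for the error
    let result = throwerByHand(x: x, errorOut: &error)
    if let error = error {      // the check that follows every call
        return "Caught \(error)"
    }
    return "Received \(result)"
}
```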

Swift 4 gets a little fancier. The basic idea is the same, but instead of a normal extra parameter, a special register is reserved for the error return. Here's what the relevant assembly code in Thrower looks like:

    call       imp___stubs__swift_allocError
    mov        qword [rdx], rbx
    mov        qword [rdx+8], r15
    mov        qword [rdx+0x10], r14
    mov        r12, rax

This calls into the Swift runtime to allocate a new error, fills it out with the relevant values, and then places the pointer into r12. It then returns to the caller. The relevant code in Catcher looks like this:

    call       r14
    mov        r15, rax
    test       r12, r12
    je         loc_100002cec

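It makes the call, then checks whether r12 contains anything. If r12 is zero, no error was thrown, and the je branches past the error handling to the normal path. Otherwise, execution ends up in the catch block. The technique on ARM64 is almost the same, with the x21 register serving as the error pointer.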

Internally, it looks a lot like returning a Result type, or otherwise returning some sort of error code. The throws function returns the thrown error to the caller in a special place. The caller checks that place for an error, and jumps to the error handling code if so. The generated code looks similar to Objective-C code using an NSError ** parameter, and in fact Swift 3's version of it is identical.
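
To see the resemblance at the source level, here's a sketch that turns a throws function into one that returns a Result. Note that the standard library's Result type arrived in Swift 5, which is newer than the compilers discussed here, and the function names are my own.

```swift
struct MyError: Error {
    var code: Int
}

func thrower(_ x: Int) throws -> Int {
    if x == 0 { throw MyError(code: 1) }
    return 100 / x
}

// Wraps the throwing call so the outcome is an ordinary value holding either case.
func asResult(_ x: Int) -> Result<Int, Error> {
    return Result(catching: { try thrower(x) })
}

func describe(_ x: Int) -> String {
    // The caller inspects the returned value and branches, much like the register check.
    switch asResult(x) {
    case .success(let value): return "Received \(value)"
    case .failure(let error): return "Caught \(error)"
    }
}
```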

Comparison With Exceptions
Swift is careful never to use the word "exception" when discussing its error handling system, but it looks a lot like exceptions in other languages. How does its implementation compare? There are a lot of languages out there with exceptions, and many of them do things differently, but the natural comparison is C++. Objective-C exceptions (which do exist, although pretty much nobody uses them) use C++'s exceptions mechanism on the modern runtime.

A full exploration of how C++ exceptions work could fill a book, so we'll have to settle for a brief description.

C++ code that calls throwing functions (which is the default for C++ functions) produces assembly exactly as if it called non-throwing functions. Which is to say, it passes in parameters and retrieves return values and gives no thought to the possibility of exceptions.

How can this possibly work? In addition to generating the no-exceptions code, the compiler also generates a table with information about how (and whether) the code handles exceptions and how to safely unwind the stack to exit out of the function in the event that an exception is thrown.

When some function throws an exception, it walks up the stack, looking up each function's information and using that to unwind the stack to the next function, until it either finds an exception handler or runs off the end. If it finds an exception handler, it transfers control to that handler which then runs the code in the catch block.

For more information about how C++ exceptions work, see C++ ABI for Itanium: Exception Handling.

This system is called "zero-cost" exception handling. The term "zero-cost" refers to what happens when no exceptions are ever thrown. Because that code is compiled exactly as it would be without exceptions, there's no runtime overhead for supporting exceptions. Calling potentially-throwing functions is just as fast as calling functions that don't throw, and adding try blocks to your code doesn't result in any additional work done at runtime.

When an exception is thrown, the concept of "zero-cost" goes out the window. Unwinding the stack using the tables is an expensive process and takes a substantial amount of time. The system is designed around the idea that exceptions are thrown rarely, and performance in the case where no exceptions are ever thrown is more important. This assumption is likely to be true in almost all code.

Compared to this, Swift's system is extremely simple. It makes no attempt to generate the same code for throws and non-throws functions. Instead, every call to a throws function is followed by a check to see if an error was returned, and a jump to the appropriate error handling code if so. These checks aren't free, although they should be pretty cheap.

The tradeoff makes a lot of sense for Swift. Swift errors look a lot like C++ exceptions, but in practice they're used differently. Nearly any C++ call can potentially throw, and even basic stuff like the new operator will throw to indicate an error. Explicitly checking for a thrown exception after every call would add a lot of extra checks. In contrast, few Swift calls are marked throws in typical codebases, so the cost of explicit checks is low.

Conclusion
Swift's error handling invites comparison with exceptions in other languages, such as C++. C++'s exception handling is extremely complicated internally, but Swift takes a different approach. Instead of unwind tables to achieve "zero-cost" in the common case, Swift returns thrown errors in a special register, and the caller checks that register to see if an error has been thrown. This adds a bit of overhead when errors aren't thrown, but avoids making things enormously complicated the way C++ does. It would take serious effort to write Swift code where the overhead from error handling makes any noticeable difference.

That's it for today! Come back again for more excitement, fun, and horror. As I have occasionally mentioned before, Friday Q&A is driven by reader suggestions. As always, if you have a topic you'd like to see covered here, send it in!

Friday Q&A 2017-08-11: Swift.Unmanaged

Mike Ash — Fri, 11 Aug 2017 13:14:00 GMT

In order to work with C APIs, we sometimes need to convert Swift object references to and from raw pointers. Swift's Unmanaged struct is the standard API for handling this. Today, I'd like to talk about what it does and how to use it.

Overview
Getting Swift references into and out of the world of C encompasses two separate tasks.

The first task is converting a reference into the raw bytes for a void *, or converting the raw bytes of a void * back to a reference. Since Swift references are implemented as pointers, this is straightforward. It's really just a matter of getting the type system to cooperate. You can do this with unsafeBitCast, although I strongly recommend against it. If these details ever changed, you'd be in trouble, whereas Unmanaged will continue to work.

The second task is making Swift's memory management work with a pointer that Swift can't see. Swift's ARC memory management requires the compiler to manage references to each object so that it can insert the appropriate retain and release calls. Once the pointer gets passed into C, it can no longer do this. Instead, you must manually tell the system how to handle memory management at each stage. Unmanaged provides the facilities to do this.

Unmanaged wraps a reference to an object. It is possible to keep Unmanaged values around long-term, but you typically use it as a brief, temporary stop on the way to or from a raw pointer.

From Swift to C
To pass a reference from Swift to C, you must create an Unmanaged value from that reference, then ask it for a raw pointer. There are two ways to create an Unmanaged value from a Swift reference. To figure out which one you need, you must figure out your memory management requirements.

Using Unmanaged.passRetained(obj) will perform an unbalanced retain on the object. This effectively confers ownership on whatever C API you're passing the pointer to. It must be balanced with a release at a later point, or the object will leak.

Using Unmanaged.passUnretained(obj) will leave the object's retain count unchanged. This does not confer any ownership on the C API you pass the pointer to. This is suitable when passing to a C API which takes ownership internally (for example, passing an object into a CFArray which will retain it) and when passing to a C API which uses the value immediately but doesn't hold onto it long-term. If you pass such a value to a C API that holds the pointer long-term and doesn't take ownership, your object may be deallocated prematurely resulting in a crash or worse.

Once you have the Unmanaged value, you can obtain a raw pointer from it using the toOpaque method. Since you already decided how memory management needs to be handled, there's only one call here, and no decisions to make at this point.

Here's an example of how you'd get a raw pointer with ownership:

    let ptr = Unmanaged.passRetained(obj).toOpaque()
    FunctionCall(ptr)

And here's an example without ownership:

    let ptr = Unmanaged.passUnretained(obj).toOpaque()
    FunctionCall(ptr)

In both cases, ptr is an UnsafeMutableRawPointer which is Swift's equivalent to C's void *, and can be passed into C APIs.

From C to Swift
To retrieve a reference from C into Swift code, you create an Unmanaged value from the raw pointer, then take a reference from it. There are two ways to take a reference: one which consumes an unbalanced retain, and one which performs no memory management.

To create the Unmanaged value, use fromOpaque. Unlike passRetained and passUnretained, you must specify the generic type for Unmanaged when using fromOpaque so that the compiler knows which class you're expecting to receive.

Once you have the Unmanaged value, you can get a reference out of it. Using takeRetainedValue will perform an unbalanced release, which will balance an unbalanced retain previously performed with passRetained, or by C code which confers ownership on your code. Using takeUnretainedValue will obtain the object reference without performing any retain or release.

Here's an example of getting a reference from a pointer while performing an unbalanced release:

    let obj = Unmanaged<MyClass>.fromOpaque(ptr).takeRetainedValue()
    obj.method()

And without a retain or release:

    let obj = Unmanaged<MyClass>.fromOpaque(ptr).takeUnretainedValue()
    obj.method()
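
Putting both directions together, here's a self-contained round trip with no C API involved. The Tracker class is just a hypothetical helper so the example can observe when the object is destroyed.

```swift
// Hypothetical helper that records deallocations.
final class Tracker {
    var deinitCount = 0
}

final class MyClass {
    let tracker: Tracker
    init(tracker: Tracker) { self.tracker = tracker }
    deinit { tracker.deinitCount += 1 }
    func method() -> String { return "called" }
}

func roundTrip(_ tracker: Tracker) -> String {
    // Swift to C: an unbalanced retain, so ownership travels with the raw pointer.
    let ptr: UnsafeMutableRawPointer =
        Unmanaged.passRetained(MyClass(tracker: tracker)).toOpaque()

    // ... a C API would hold onto ptr here ...

    // C to Swift: consume that retain so ARC owns the object again.
    let obj = Unmanaged<MyClass>.fromOpaque(ptr).takeRetainedValue()
    return obj.method()
}

let tracker = Tracker()
let result = roundTrip(tracker)
```

Because the retain and release are balanced, the object is destroyed exactly once, when roundTrip returns.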

Patterns
It's possible to use Unmanaged in a one-sided way, with C APIs which take a pointer and do stuff with it, or which return a pointer. But the most common way to use Unmanaged is with C APIs which pass around a context pointer. In this situation, you control both sides, and you need to figure out your memory management to avoid crashing or leaking. How you do that will depend on what you're doing. There are a few fundamental patterns to use.

Synchronous Callback
Some APIs call you back immediately with your context pointer, completing all of their work before they return. In this case, you can pass the object unretained, and take it unretained. For example, here's some code that calls CFArrayApplyFunction and calls a method on self for each element in the array:

    let context = Unmanaged.passUnretained(self).toOpaque()
    let range = CFRangeMake(0, CFArrayGetCount(array))
    CFArrayApplyFunction(array, range, { element, context in
        let innerSelf = Unmanaged<MyClass>.fromOpaque(context!).takeUnretainedValue()
        innerSelf.method(element)
    }, context)

Because the function runs immediately, and we know that self will stay alive for the duration, we don't need to do any memory management on it.
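
If you want to experiment with this pattern without CoreFoundation, a C-style API can be faked in pure Swift with a @convention(c) function type. The applyToEach function and Summer class below are hypothetical stand-ins, not real APIs.

```swift
// Stand-in for a C API like CFArrayApplyFunction: it calls back immediately, once per element.
typealias Applier = @convention(c) (Int, UnsafeMutableRawPointer?) -> Void

func applyToEach(_ values: [Int], _ function: Applier, _ context: UnsafeMutableRawPointer?) {
    for value in values {
        function(value, context)
    }
}

final class Summer {
    var total = 0

    func add(_ value: Int) {
        total += value
    }

    func sum(_ values: [Int]) -> Int {
        total = 0
        // Unretained going in: self is guaranteed alive for the whole call.
        let context = Unmanaged.passUnretained(self).toOpaque()
        applyToEach(values, { value, context in
            // Unretained coming out: nothing to balance.
            let innerSelf = Unmanaged<Summer>.fromOpaque(context!).takeUnretainedValue()
            innerSelf.add(value)
        }, context)
        return total
    }
}

let summed = Summer().sum([1, 2, 3])
```

A @convention(c) closure can't capture anything, which is exactly why self has to travel through the context pointer.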

Asynchronous One-Shot Callback
Some APIs perform a single callback later on, for example to inform you that some task has completed. In this case, you want to retain going in, and release in the callback. Here's an example using CFHost to perform asynchronous DNS resolution. As a bonus, this code also contains a one-sided use of Unmanaged to retrieve the return value of CFHostCreateWithName.

It starts by creating the host. CFHostCreateWithName returns an Unmanaged, presumably because it hasn't had any curated bridging done to it. Since it returns a value that we own, we use takeRetainedValue() on it to get the underlying value out:

    let host = CFHostCreateWithName(nil, "mikeash.com" as CFString).takeRetainedValue()

We need a context to pass to the host object. We'll ignore all fields of this context except info, which is the raw pointer where we'll put the pointer to self:

    var context = CFHostClientContext()
    context.info = Unmanaged.passRetained(self).toOpaque()

We'll set the callback using CFHostSetClient:

    CFHostSetClient(host, { host, typeInfo, error, info in

Here, we retrieve self passed through info by using Unmanaged again. Since we passed it retained, we use takeRetainedValue here to balance it out. We can then use the resulting reference.

        let innerSelf = Unmanaged<MyClass>.fromOpaque(info!).takeRetainedValue()
        innerSelf.resolved(host)
    }, &context)

This code starts the asynchronous resolution process:

    CFHostScheduleWithRunLoop(host, CFRunLoopGetCurrent(), CFRunLoopMode.commonModes.rawValue)
    CFHostStartInfoResolution(host, .addresses, nil)

Asynchronous Multi-Shot Callback
Finally, some APIs take a callback which they invoke many times later on. To handle this, you need to retain going in to ensure that the object stays alive. You must not release in the callback, since the first callback would destroy the object and leave subsequent callbacks with a pointer to junk. Instead, you must release the value later on, when all of the callbacks are done. Typically the API will provide a separate destruction callback to handle this.

Here's an example using CFRunLoopTimer to perform a task once per second. It starts by creating a context and filling out the info field like above:

    var context = CFRunLoopTimerContext()
    context.info = Unmanaged.passRetained(self).toOpaque()

This example also fills out the context's release field. In the release callback, it uses fromOpaque to get an Unmanaged for the info pointer, and then calls release to balance the retain:

    context.release = { Unmanaged<MyClass>.fromOpaque($0!).release() }

It then creates the timer and passes a callback:

    let timer = CFRunLoopTimerCreate(nil, 0, 1, 0, 0, { timer, info in

It retrieves self using Unmanaged. It uses takeUnretainedValue here, since we don't want to modify the object's retain count:

        let innerSelf = Unmanaged<MyClass>.fromOpaque(info!).takeUnretainedValue()
        innerSelf.doStuff()
    }, &context)

Finally, it adds the timer to the runloop so the timer will fire:

    CFRunLoopAddTimer(CFRunLoopGetCurrent(), timer, CFRunLoopMode.commonModes)
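
The same retain, use, and release lifecycle can be simulated without a run loop. In this sketch, ToyTimer is a made-up stand-in for an API like CFRunLoopTimer that fires several times and then calls a release hook, and Log is a hypothetical helper for observing what happened.

```swift
typealias ContextCallback = @convention(c) (UnsafeMutableRawPointer?) -> Void

// Hypothetical stand-in for a multi-shot API with a destruction callback.
struct ToyTimer {
    var info: UnsafeMutableRawPointer?
    var fire: ContextCallback
    var release: ContextCallback

    func run(times: Int) {
        for _ in 0..<times {
            fire(info)
        }
        release(info)   // runs once, after the last callback
    }
}

final class Log {
    var ticks = 0
    var deinitialized = false
}

final class Worker {
    let log: Log
    init(log: Log) { self.log = log }
    deinit { log.deinitialized = true }
    func doStuff() { log.ticks += 1 }
}

func makeTimer(log: Log) -> ToyTimer {
    let worker = Worker(log: log)
    return ToyTimer(
        // Retain going in, so the worker outlives this function.
        info: Unmanaged.passRetained(worker).toOpaque(),
        // Each firing borrows the object without touching its retain count.
        fire: { info in
            Unmanaged<Worker>.fromOpaque(info!).takeUnretainedValue().doStuff()
        },
        // The release hook balances the original retain.
        release: { info in
            Unmanaged<Worker>.fromOpaque(info!).release()
        }
    )
}

let log = Log()
let toyTimer = makeTimer(log: log)
let aliveBeforeRun = !log.deinitialized
toyTimer.run(times: 3)
```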

Miscellaneous Functionality
Most of what you need from Unmanaged can be done using fromOpaque, passRetained, passUnretained, takeRetainedValue, takeUnretainedValue, and toOpaque. However, Unmanaged also exposes methods for performing memory management directly on objects without the window dressing of passing or taking values. We saw that briefly in the previous example with the call to release. Unmanaged provides three plain memory management calls:

retain
release
autorelease

These all do exactly what their names indicate. It's rare to need these, aside from release to balance retains as shown above, but they're there if you do.

Conclusion
One of Swift's great strengths is the smoothness with which it interoperates with C code. The Unmanaged API is a key component of that. Unfortunately, it requires the programmer to make memory management decisions as pointers move in and out of Swift, but that's an inherent part of the job it does. Once you have the memory management figured out, it's mostly straightforward to use.

That wraps things up for now. Come back soon for more programming-related mystery goo. Until then, Friday Q&A is driven by reader ideas, so if you have a topic you'd like to see covered, please send it in!

Friday Q&A 2017-07-28: A Binary Coder for Swift

Mike Ash — Fri, 28 Jul 2017 12:44:00 GMT

In my last article I discussed the basics of Swift's new Codable protocol, briefly discussed how to implement your own encoder and decoder, and promised another article about a custom binary coder I've been working on. Today, I'm going to present that binary coder.

Source Code
As usual, the source code is available on GitHub:

https://github.com/mikeash/BinaryCoder/tree/887cecd70c070d86f338065f59ed027c13952c83

Concept and Approach
This coder serializes fields by writing them out sequentially as raw bytes, with no metadata. For example:

    struct S {
        var a: Int16
        var b: Int32
        var c: Int64
    }

The result of encoding an instance of S is fourteen bytes long, with two bytes for a, four bytes for b, and eight bytes for c. The result is almost the same as writing out the raw underlying memory of S, except there's no padding, the numbers are byte-swapped to be endian agnostic, and it's able to intelligently chase down references and do custom encoding when needed.
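
To make the layout concrete, here's the same fourteen bytes produced by hand, without the coder. The appendBigEndian and encodeByHand helpers are written just for this illustration and are not part of BinaryCoder.

```swift
struct S {
    var a: Int16
    var b: Int32
    var c: Int64
}

// Appends the big-endian bytes of any fixed-width integer.
func appendBigEndian<T: FixedWidthInteger>(_ value: T, to bytes: inout [UInt8]) {
    var big = value.bigEndian
    withUnsafeBytes(of: &big) {
        bytes.append(contentsOf: $0)
    }
}

// Writes the fields one after another with no padding and no metadata.
func encodeByHand(_ s: S) -> [UInt8] {
    var bytes: [UInt8] = []
    appendBigEndian(s.a, to: &bytes)
    appendBigEndian(s.b, to: &bytes)
    appendBigEndian(s.c, to: &bytes)
    return bytes
}

let encoded = encodeByHand(S(a: 1, b: 2, c: 3))
let inMemorySize = MemoryLayout<S>.size
```

On typical 64-bit platforms the in-memory size is larger than fourteen because of the padding after a, which is the difference the encoded form avoids.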

This type of straightforward binary encoding is a little hobby of mine, and I've previously experimented with other approaches to it in Swift, none of which were satisfactory. When the Swift 4 beta became available with Codable, I looked to see if it would work for this, and it did!

My use of Codable is somewhat abusive. I want to take advantage of the compiler-generated Encodable and Decodable implementations, but those use keyed coding, whereas the straight-line no-metadata binary format is pretty much the polar opposite of keyed coding. The solution is simple: ignore the keys, and rely on the encoding and decoding order to be consistent. This is ugly, and a bad idea in general, but it does work, and even got a tweet from a member of the Swift core team indicating it might be OK. This approach is obviously not resilient to changes in your field layout or field types, but as long as you're aware of this and understand it, that's acceptable.

It does mean that arbitrary implementations of Codable can't be trusted to work with this coder. We know that the compiler-generated implementations work, with limitations, but there may be implementations in the standard library (for example, the implementation for Array) which rely on semantics that this coder doesn't support. In order to ensure that types don't participate in binary coding without some vetting, I created my own protocols for binary coding:

    public protocol BinaryEncodable: Encodable {
        func binaryEncode(to encoder: BinaryEncoder) throws
    }

    public protocol BinaryDecodable: Decodable {
        init(fromBinary decoder: BinaryDecoder) throws
    }

    public typealias BinaryCodable = BinaryEncodable & BinaryDecodable

I wrote extensions to simplify the common case where you just want to use the compiler's implementation of Codable:

    public extension BinaryEncodable {
        func binaryEncode(to encoder: BinaryEncoder) throws {
            try self.encode(to: encoder)
        }
    }

    public extension BinaryDecodable {
        init(fromBinary decoder: BinaryDecoder) throws {
            try self.init(from: decoder)
        }
    }

This way, your own types can just conform to BinaryCodable, and they'll get a default implementation of everything they need, as long as they meet the requirements. It's required that all fields must be Codable, but we can't require all fields to be BinaryCodable. That type checking has to be done at runtime, which is unfortunate, but acceptable.
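
The general shape of this, a refined protocol with a default implementation plus a runtime conformance check, can be shown in miniature. The protocols and types below are hypothetical stand-ins for Encodable and BinaryEncodable, not the real ones.

```swift
// Hypothetical protocols standing in for Encodable and BinaryEncodable.
protocol PlainEncodable {
    func encodedText() -> String
}

protocol VettedEncodable: PlainEncodable {
    func vettedText() -> String
}

// Default implementation: vetted types get the plain behavior for free.
extension VettedEncodable {
    func vettedText() -> String {
        return encodedText()
    }
}

struct Vetted: VettedEncodable {
    func encodedText() -> String { return "vetted" }
}

struct Unvetted: PlainEncodable {
    func encodedText() -> String { return "unvetted" }
}

enum CheckError: Error {
    case notVetted(String)
}

// The parameter type only promises PlainEncodable, so the stricter check happens at runtime.
func encodeIfVetted(_ value: PlainEncodable) throws -> String {
    guard let vetted = value as? VettedEncodable else {
        throw CheckError.notVetted("\(type(of: value))")
    }
    return vetted.vettedText()
}
```

Opting in takes nothing more than declaring the conformance, and anything that hasn't opted in is rejected with an error rather than silently encoded.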

The encoder and decoder implementations are straightforward: they encode/decode everything in order, ignoring the keys. The encoder produces bytes corresponding to the values that are encoded, and the decoder produces values from the bytes it has stored.

BinaryEncoder Basics
The encoder is a public class:

    public class BinaryEncoder {

It has one field, which is the data it has encoded so far:

    fileprivate var data: [UInt8] = []

This data starts out empty, and bytes are appended to it as values are encoded.

A convenience method wraps up the process of creating an encoder instance, encoding an object into it, and returning the instance's data:

    static func encode(_ value: BinaryEncodable) throws -> [UInt8] {
        let encoder = BinaryEncoder()
        try value.binaryEncode(to: encoder)
        return encoder.data
    }

The encoding process can throw runtime errors, so the encoder needs an error type:

    enum Error: Swift.Error {
        case typeNotConformingToBinaryEncodable(Encodable.Type)
        case typeNotConformingToEncodable(Any.Type)
    }

Let's move on to the low-level encoding methods. We'll start with a generic method which will encode the raw bytes of a value:

    func appendBytes<T>(of: T) {
        var target = of
        withUnsafeBytes(of: &target) {
            data.append(contentsOf: $0)
        }
    }

This will form the basis for other encoding methods.

Let's take a quick look at the methods for encoding Float and Double next. CoreFoundation has helper functions which take care of any byte swapping that's needed for them, so these methods call those functions and then call appendBytes with the result:

    func encode(_ value: Float) {
        appendBytes(of: CFConvertFloatHostToSwapped(value))
    }

    func encode(_ value: Double) {
        appendBytes(of: CFConvertDoubleHostToSwapped(value))
    }

While we're at it, here's the method for encoding a Bool. It translates the Bool to a UInt8 containing a 0 or 1 and encodes that:

    func encode(_ value: Bool) throws {
        try encode(value ? 1 as UInt8 : 0 as UInt8)
    }

BinaryEncoder has one more encode method, which takes care of encoding all other Encodable types:

    func encode(_ encodable: Encodable) throws {

This has special cases for various types, so it switches on the parameter:

        switch encodable {

Int and UInt need special handling, because their sizes aren't consistent. Depending on the target platform, they may be 32 bits or 64 bits. To solve this, we convert them to Int64 or UInt64 and then encode that value:

        case let v as Int:
            try encode(Int64(v))
        case let v as UInt:
            try encode(UInt64(v))

All other integer types are handled with the FixedWidthInteger protocol, which exposes enough functionality to do the necessary byte swapping for encoding values. Because FixedWidthInteger uses Self for some return types, I wasn't able to do the work directly here. Instead, I extended FixedWidthInteger with a binaryEncode method that handles the work:

        case let v as FixedWidthInteger:
            v.binaryEncode(to: self)

Float, Double, and Bool call the type-specific methods above:

        case let v as Float:
            encode(v)
        case let v as Double:
            encode(v)
        case let v as Bool:
            try encode(v)

Anything that's BinaryEncodable is encoded by calling its binaryEncode method and passing self:

        case let binary as BinaryEncodable:
            try binary.binaryEncode(to: self)

There's one more case to handle. Any value that gets this far is not a type that we know how to encode natively, nor is it BinaryEncodable. In this case, we throw an error to inform the caller that this value doesn't conform to the protocol:

        default:
            throw Error.typeNotConformingToBinaryEncodable(type(of: encodable))
        }
    }

Finally, let's look at the FixedWidthInteger extension. All this has to do is call self.bigEndian to get a portable representation of the integer type, and then call appendBytes on the encoder to encode that representation:

    private extension FixedWidthInteger {
        func binaryEncode(to encoder: BinaryEncoder) {
            encoder.appendBytes(of: self.bigEndian)
        }
    }
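
To see what bigEndian buys us, here's a minimal standalone sketch. The bigEndianBytes(of:) helper is hypothetical and not part of BinaryCoder; it stands in for appendBytes and returns the bytes instead of appending them to an encoder:

```swift
// Hypothetical helper for illustration only (not part of BinaryCoder).
// Returns the bytes of any fixed-width integer in big-endian order.
func bigEndianBytes<T: FixedWidthInteger>(of value: T) -> [UInt8] {
    var big = value.bigEndian
    return withUnsafeBytes(of: &big) { Array($0) }
}

let unsignedBytes = bigEndianBytes(of: UInt32(0x01020304))
let signedBytes = bigEndianBytes(of: Int16(-2))
print(unsignedBytes) // [1, 2, 3, 4]
print(signedBytes)   // [255, 254]
```

The output is the same on little-endian and big-endian hosts, which is the whole point of converting before writing.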

We now have all the important parts of binary encoding, but we still don't have an Encoder implementation. To accomplish that, we'll create implementations of the container protocols which call back to the BinaryEncoder to do the work.

BinaryEncoder Encoder Implementation
Let's start by looking at the implementations of the containers. We'll start with the KeyedEncodingContainerProtocol implementation:

    private struct KeyedContainer<Key: CodingKey>: KeyedEncodingContainerProtocol {

The implementation needs a reference to the binary encoder that it's working in:

        var encoder: BinaryEncoder

KeyedEncodingContainerProtocol requires a codingPath property which returns an array of CodingKey values indicating the current path into the encoder. Since this encoder doesn't really support keys in the first place, we always return an empty array:

        public var codingPath: [CodingKey] { return [] }

Code which uses this encoder must be written so that it doesn't depend on this value being meaningful.

The protocol then has a ton of methods for encoding all of the various types that it supports:

    public mutating func encode(_ value: Bool, forKey key: Self.Key) throws
    public mutating func encode(_ value: Int, forKey key: Self.Key) throws
    public mutating func encode(_ value: Int8, forKey key: Self.Key) throws
    public mutating func encode(_ value: Int16, forKey key: Self.Key) throws
    public mutating func encode(_ value: Int32, forKey key: Self.Key) throws
    public mutating func encode(_ value: Int64, forKey key: Self.Key) throws
    public mutating func encode(_ value: UInt, forKey key: Self.Key) throws
    public mutating func encode(_ value: UInt8, forKey key: Self.Key) throws
    public mutating func encode(_ value: UInt16, forKey key: Self.Key) throws
    public mutating func encode(_ value: UInt32, forKey key: Self.Key) throws
    public mutating func encode(_ value: UInt64, forKey key: Self.Key) throws
    public mutating func encode(_ value: Float, forKey key: Self.Key) throws
    public mutating func encode(_ value: Double, forKey key: Self.Key) throws
    public mutating func encode(_ value: String, forKey key: Self.Key) throws
    public mutating func encode<T>(_ value: T, forKey key: Self.Key) throws where T : Encodable

We'll have to implement all of those one by one. Let's start with the last one, which handles generic Encodable values. It just needs to call through to BinaryEncoder's encode method:

        func encode<T>(_ value: T, forKey key: Key) throws where T : Encodable {
            try encoder.encode(value)
        }

We can use a similar technique to implement the other methods, and... what's this? All of the compiler errors about protocol conformance have gone away?

It turns out that this one implementation of encode satisfies all of the encode methods in the protocol, because all of the other types are Encodable. A suitable generic method will fulfill any matching protocol requirements. It's obvious in retrospect, but I didn't realize it until I was halfway done with this code and saw that errors didn't appear when I deleted type-specific methods.
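
Here's a small sketch of that behavior outside of Codable. Sink and Collector are hypothetical names made up for this illustration; the protocol imitates the shape of the container protocols, with a couple of typed requirements plus a generic one:

```swift
// Hypothetical protocol for illustration; it mimics the shape of the
// container protocols with typed requirements plus a generic one.
protocol Sink {
    mutating func write(_ value: Int) throws
    mutating func write(_ value: String) throws
    mutating func write<T>(_ value: T) throws where T: CustomStringConvertible
}

// One generic method is the only implementation we provide.
struct Collector: Sink {
    var lines: [String] = []

    mutating func write<T>(_ value: T) throws where T: CustomStringConvertible {
        lines.append(value.description)
    }
}

// Call through the protocol so the typed requirements are exercised.
func fill<S: Sink>(_ sink: inout S) throws {
    try sink.write(1)
    try sink.write("two")
}

var collector = Collector()
try fill(&collector)
print(collector.lines) // ["1", "two"]
```

Since Int and String are both CustomStringConvertible, the single generic method stands in for all three requirements, just as the generic encode does for the Encodable types.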

Now we can see why I implemented BinaryEncoder's encode method with a big switch statement instead of using separate implementations for all of the various supported types. Overloaded methods are resolved at compile time based on the static type that's available at the call site. The above call to encoder.encode(value) will always call func encode(_ encodable: Encodable) even if the actual value passed in is, say, a Double or a Bool. In order to allow for this simple wrapper, the implementation in BinaryEncoder has to work with a single entry point, which means it needs to be a big switch statement.
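
The difference between static and dynamic dispatch here is easy to demonstrate in isolation. This is a sketch with a hypothetical Describer type, not code from BinaryCoder:

```swift
struct Describer {
    func describe(_ value: Double) -> String { return "Double overload" }
    func describe(_ value: Encodable) -> String { return "Encodable overload" }

    // Inside a generic method the static type is only "some T: Encodable",
    // so the compiler can only pick the Encodable overload.
    func describeGeneric<T: Encodable>(_ value: T) -> String {
        return describe(value)
    }

    // A switch on the value recovers the dynamic type at runtime.
    func describeDynamic(_ value: Encodable) -> String {
        switch value {
        case is Double: return "Double at runtime"
        default: return "something else"
        }
    }
}

let describer = Describer()
let direct = describer.describe(1.5)          // "Double overload"
let generic = describer.describeGeneric(1.5)  // "Encodable overload"
let dynamic = describer.describeDynamic(1.5)  // "Double at runtime"
```

The generic wrapper loses the Double overload even though the value really is a Double, while the switch still finds it.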

KeyedEncodingContainerProtocol requires a few other methods. There's one for encoding nil, which we implement to do nothing:

        func encodeNil(forKey key: Key) throws {}

Then there are four methods for returning nested containers or superclass encoders. We don't do anything clever here, so this just delegates back to the encoder:

        func nestedContainer<NestedKey>(keyedBy keyType: NestedKey.Type, forKey key: Key) -> KeyedEncodingContainer<NestedKey> where NestedKey : CodingKey {
            return encoder.container(keyedBy: keyType)
        }

        func nestedUnkeyedContainer(forKey key: Key) -> UnkeyedEncodingContainer {
            return encoder.unkeyedContainer()
        }

        func superEncoder() -> Encoder {
            return encoder
        }

        func superEncoder(forKey key: Key) -> Encoder {
            return encoder
        }
    }

We also need implementations of UnkeyedEncodingContainer and SingleValueEncodingContainer. It turns out that those protocols are similar enough that we can use a single implementation for both. The actual implementation is almost the same as it was for KeyedEncodingContainerProtocol, with the addition of a dummy count property:

    private struct UnkeyedContanier: UnkeyedEncodingContainer, SingleValueEncodingContainer {
        var encoder: BinaryEncoder

        var codingPath: [CodingKey] { return [] }

        var count: Int { return 0 }

        func nestedContainer<NestedKey>(keyedBy keyType: NestedKey.Type) -> KeyedEncodingContainer<NestedKey> where NestedKey : CodingKey {
            return encoder.container(keyedBy: keyType)
        }

        func nestedUnkeyedContainer() -> UnkeyedEncodingContainer {
            return self
        }

        func superEncoder() -> Encoder {
            return encoder
        }

        func encodeNil() throws {}

        func encode<T>(_ value: T) throws where T : Encodable {
            try encoder.encode(value)
        }
    }

Using these containers, we'll make BinaryEncoder conform to Encoder.

Encoder requires a codingPath property like the containers do:

    public var codingPath: [CodingKey] { return [] }

It also requires a userInfo property. We don't support that either, so it returns an empty dictionary:

    public var userInfo: [CodingUserInfoKey : Any] { return [:] }

Then there are three methods which return containers:

    public func container<Key>(keyedBy type: Key.Type) -> KeyedEncodingContainer<Key> where Key : CodingKey {
        return KeyedEncodingContainer(KeyedContainer<Key>(encoder: self))
    }

    public func unkeyedContainer() -> UnkeyedEncodingContainer {
        return UnkeyedContanier(encoder: self)
    }

    public func singleValueContainer() -> SingleValueEncodingContainer {
        return UnkeyedContanier(encoder: self)
    }

That's the end of BinaryEncoder.

BinaryDecoder Basics
The decoder is a public class too:

    public class BinaryDecoder {

Like the encoder, it has some data:

    fileprivate let data: [UInt8]

Unlike the encoder, the decoder's data is loaded into the object when it's created. The caller provides the data that the decoder will decode from:

    public init(data: [UInt8]) {
        self.data = data
    }

The decoder also needs to keep track of where it is inside the data it's decoding. It does that with a cursor property, which starts out at the beginning of the data:

    fileprivate var cursor = 0

A convenience method wraps up the process of creating a decoder and decoding a value:

    static func decode<T: BinaryDecodable>(_ type: T.Type, data: [UInt8]) throws -> T {
        return try BinaryDecoder(data: data).decode(T.self)
    }

The decoder has its own errors it can throw during the decoding process. Decoding can fail in many more ways than encoding, so BinaryDecoder's Error type has a lot more cases:

    enum Error: Swift.Error {
        case prematureEndOfData
        case typeNotConformingToBinaryDecodable(Decodable.Type)
        case typeNotConformingToDecodable(Any.Type)
        case intOutOfRange(Int64)
        case uintOutOfRange(UInt64)
        case boolOutOfRange(UInt8)
        case invalidUTF8([UInt8])
    }

Now we can get on to actual decoding. The lowest level method reads a certain number of bytes out of data into a pointer, advancing cursor, or throwing prematureEndOfData if data doesn't have enough bytes in it:

    func read(_ byteCount: Int, into: UnsafeMutableRawPointer) throws {
        if cursor + byteCount > data.count {
            throw Error.prematureEndOfData
        }

        data.withUnsafeBytes({
            let from = $0.baseAddress! + cursor
            memcpy(into, from, byteCount)
        })

        cursor += byteCount
    }

There's also a small generic wrapper which takes an inout T and reads into that value, using MemoryLayout to figure out how many bytes to read:

    func read<T>(into: inout T) throws {
        try read(MemoryLayout<T>.size, into: &into)
    }
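
The same cursor-and-bounds-check idea can be sketched without unsafe pointers. This is not BinaryCoder's code: ByteReader is a hypothetical value type that assembles big-endian integers with shifts instead of memcpy, which makes it easy to run on its own:

```swift
// Illustrative sketch, not BinaryCoder's code: a value-type reader that
// assembles big-endian integers with shifts instead of memcpy.
struct ByteReader {
    enum Error: Swift.Error { case prematureEndOfData }

    let data: [UInt8]
    var cursor = 0

    init(data: [UInt8]) { self.data = data }

    mutating func read<T: FixedWidthInteger>(_ type: T.Type) throws -> T {
        let size = MemoryLayout<T>.size
        // Check bounds before touching the data; cursor is untouched on failure.
        guard cursor + size <= data.count else { throw Error.prematureEndOfData }

        var value: T = 0
        for byte in data[cursor ..< cursor + size] {
            value = (value << 8) | T(truncatingIfNeeded: byte)
        }
        cursor += size
        return value
    }
}

var reader = ByteReader(data: [0x01, 0x02, 0xFF])
let first = try reader.read(UInt16.self)    // 0x0102
let second = try? reader.read(UInt16.self)  // nil: only one byte is left
```

A failed read leaves the cursor where it was, so the caller can tell exactly how much data was consumed.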

Like BinaryEncoder, BinaryDecoder has methods for decoding floating-point types. For these, it creates an empty CFSwappedFloat32 or CFSwappedFloat64 value, reads into it, and then calls the appropriate CF function to convert it to the floating-point type in question:

    func decode(_ type: Float.Type) throws -> Float {
        var swapped = CFSwappedFloat32()
        try read(into: &swapped)
        return CFConvertFloatSwappedToHost(swapped)
    }

    func decode(_ type: Double.Type) throws -> Double {
        var swapped = CFSwappedFloat64()
        try read(into: &swapped)
        return CFConvertDoubleSwappedToHost(swapped)
    }
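
If CoreFoundation isn't available, one alternative is to go through the floating-point bit pattern. The sketch below is a different technique from the CF functions used above, with hypothetical helper names, and it assumes that big-endian IEEE 754 bits are the representation you want on the wire:

```swift
// Sketch of an alternative to the CF functions: go through the IEEE 754
// bit pattern, which is a plain UInt64 that can be byte-swapped.
func bigEndianBytes(of value: Double) -> [UInt8] {
    var bits = value.bitPattern.bigEndian
    return withUnsafeBytes(of: &bits) { Array($0) }
}

// Rebuilds a Double from exactly eight big-endian bytes.
func double(fromBigEndian bytes: [UInt8]) -> Double? {
    guard bytes.count == MemoryLayout<UInt64>.size else { return nil }
    var bits: UInt64 = 0
    for byte in bytes {
        bits = (bits << 8) | UInt64(byte)
    }
    return Double(bitPattern: bits)
}

let encodedOne = bigEndianBytes(of: 1.0)
let decodedOne = double(fromBigEndian: encodedOne)
print(encodedOne) // [63, 240, 0, 0, 0, 0, 0, 0]
```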

The method for decoding Bool decodes a UInt8 and then returns false if it's 0, true if it's 1, and otherwise throws an error:

    func decode(_ type: Bool.Type) throws -> Bool {
        switch try decode(UInt8.self) {
        case 0: return false
        case 1: return true
        case let x: throw Error.boolOutOfRange(x)
        }
    }

The general decode method for Decodable uses a big switch statement to decode various specific types:

    func decode<T: Decodable>(_ type: T.Type) throws -> T {
        switch type {

For Int and UInt, it decodes an Int64 or UInt64, then converts to an Int or UInt, or throws an error:

        case is Int.Type:
            let v = try decode(Int64.self)
            if let v = Int(exactly: v) {
                return v as! T
            } else {
                throw Error.intOutOfRange(v)
            }
        case is UInt.Type:
            let v = try decode(UInt64.self)
            if let v = UInt(exactly: v) {
                return v as! T
            } else {
                throw Error.uintOutOfRange(v)
            }

The compiler doesn't realize that T's type must match the values being produced, so the as! T convinces it to compile this code.
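
The init(exactly:) conversion is what makes the range check work. In this small sketch, Int32 stands in for Int on a hypothetical 32-bit target, and narrow is a made-up helper name:

```swift
// Int32 stands in for Int on a hypothetical 32-bit target.
func narrow(_ wide: Int64) -> Int32? {
    // Returns nil instead of trapping when the value doesn't fit.
    return Int32(exactly: wide)
}

let fits = narrow(42)                 // Optional(42)
let tooBig = narrow(Int64(1) << 40)   // nil
```

The nil case is where the decoder throws intOutOfRange.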

Other integers are handled through FixedWidthInteger using an extension method:

        case let intT as FixedWidthInteger.Type:
            return try intT.from(binaryDecoder: self) as! T

Float, Double, and Bool all call their type-specific decoding methods:

        case is Float.Type:
            return try decode(Float.self) as! T
        case is Double.Type:
            return try decode(Double.self) as! T
        case is Bool.Type:
            return try decode(Bool.self) as! T

BinaryDecodable types use the initializer defined in that protocol, passing self:

        case let binaryT as BinaryDecodable.Type:
            return try binaryT.init(fromBinary: self) as! T

If none of the cases are hit, then throw an error:

        default:
            throw Error.typeNotConformingToBinaryDecodable(type)
        }
    }

The FixedWidthInteger method uses Self.init() to make a value, reads bytes into it, and then uses the bigEndian: initializer to perform byte swapping:

    private extension FixedWidthInteger {
        static func from(binaryDecoder: BinaryDecoder) throws -> Self {
            var v = Self.init()
            try binaryDecoder.read(into: &v)
            return self.init(bigEndian: v)
        }
    }

That takes care of the foundation. Now to implement Decoder.

BinaryDecoder Decoder Implementation
As before, we implement the three container protocols. We'll start with the keyed container:

    private struct KeyedContainer<Key: CodingKey>: KeyedDecodingContainerProtocol {

It delegates everything to the decoder, so it needs a reference to that:

        var decoder: BinaryDecoder

The protocol requires codingPath:

        var codingPath: [CodingKey] { return [] }

It also requires allKeys, which returns all keys that the container knows about. Since we don't really support keys in the first place, this returns an empty array:

        var allKeys: [Key] { return [] }

There's also a method to see if the container contains a given key. We'll just blindly say "yes" to all such questions:

        func contains(_ key: Key) -> Bool {
            return true
        }

As before, KeyedDecodingContainerProtocol has a ton of different decode methods which can all be satisfied with a single generic method for Decodable:

        func decode<T>(_ type: T.Type, forKey key: Key) throws -> T where T : Decodable {
            return try decoder.decode(T.self)
        }

There's also a decodeNil, which does nothing and always returns true:

        func decodeNil(forKey key: Key) throws -> Bool {
            return true
        }

Nested containers and superclass decodes delegate back to the decoder:

        func nestedContainer<NestedKey>(keyedBy type: NestedKey.Type, forKey key: Key) throws -> KeyedDecodingContainer<NestedKey> where NestedKey : CodingKey {
            return try decoder.container(keyedBy: type)
        }

        func nestedUnkeyedContainer(forKey key: Key) throws -> UnkeyedDecodingContainer {
            return try decoder.unkeyedContainer()
        }

        func superDecoder() throws -> Decoder {
            return decoder
        }

        func superDecoder(forKey key: Key) throws -> Decoder {
            return decoder
        }
    }

As before, one type can implement both of the other container protocols:

    private struct UnkeyedContainer: UnkeyedDecodingContainer, SingleValueDecodingContainer {
        var decoder: BinaryDecoder

        var codingPath: [CodingKey] { return [] }

        var count: Int? { return nil }

        var currentIndex: Int { return 0 }

        var isAtEnd: Bool { return false }

        func decode<T>(_ type: T.Type) throws -> T where T : Decodable {
            return try decoder.decode(type)
        }

        func decodeNil() -> Bool {
            return true
        }

        func nestedContainer<NestedKey>(keyedBy type: NestedKey.Type) throws -> KeyedDecodingContainer<NestedKey> where NestedKey : CodingKey {
            return try decoder.container(keyedBy: type)
        }

        func nestedUnkeyedContainer() throws -> UnkeyedDecodingContainer {
            return self
        }

        func superDecoder() throws -> Decoder {
            return decoder
        }
    }

Now BinaryDecoder itself can provide dummy implementations of the properties required by Decoder and implement methods to return instances of the containers:

    public var codingPath: [CodingKey] { return [] }

    public var userInfo: [CodingUserInfoKey : Any] { return [:] }

    public func container<Key>(keyedBy type: Key.Type) throws -> KeyedDecodingContainer<Key> where Key : CodingKey {
        return KeyedDecodingContainer(KeyedContainer<Key>(decoder: self))
    }

    public func unkeyedContainer() throws -> UnkeyedDecodingContainer {
        return UnkeyedContainer(decoder: self)
    }

    public func singleValueContainer() throws -> SingleValueDecodingContainer {
        return UnkeyedContainer(decoder: self)
    }

That is the end of BinaryDecoder.

Array and String Extensions
In order to make the coders more useful, I implemented BinaryCodable for Array and String. In theory I could call through to their Codable implementation, but I can't count on that implementation to work with the limitations of the binary coders, and I wouldn't have control over the serialized representation. Instead, I manually implemented it.

The plan is to have Array encode its count, and then encode its elements. To decode, it can decode the count, then decode that many elements. String will convert itself to UTF-8 in the form of Array<UInt8> and then use Array's implementation to do the real work.
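
To make the layout concrete, here's a standalone sketch of the same count-then-contents scheme applied to a string. The two free functions are hypothetical, and the sketch assumes the count is written as an 8-byte big-endian Int64, which is how the encoder handles Int:

```swift
import Foundation

// Hypothetical free functions; BinaryCoder does this through its coders.
func lengthPrefixedBytes(of string: String) -> [UInt8] {
    let utf8 = Array(string.utf8)
    var count = Int64(utf8.count).bigEndian
    let countBytes: [UInt8] = withUnsafeBytes(of: &count) { Array($0) }
    return countBytes + utf8
}

func string(fromLengthPrefixed bytes: [UInt8]) -> String? {
    let prefixSize = MemoryLayout<Int64>.size
    guard bytes.count >= prefixSize else { return nil }

    var count: Int64 = 0
    for byte in bytes[0 ..< prefixSize] {
        count = (count << 8) | Int64(byte)
    }
    // Reject negative or oversized counts before slicing.
    guard count >= 0, let length = Int(exactly: count),
          length <= bytes.count - prefixSize else { return nil }

    let payload = bytes[prefixSize ..< prefixSize + length]
    // Returns nil if the payload isn't valid UTF-8.
    return String(bytes: payload, encoding: .utf8)
}

let encoded = lengthPrefixedBytes(of: "Hi")
let decoded = string(fromLengthPrefixed: encoded)
print(encoded) // [0, 0, 0, 0, 0, 0, 0, 2, 72, 105]
```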

Someday, when Swift gets conditional conformances, we'll be able to write extension Array: BinaryCodable where Element: BinaryCodable to indicate that Array is only codable when its contents are. For now, Swift can't express that notion. Instead, we have to say that Array is always BinaryCodable, and then do runtime type checks to ensure the content is suitable.

Encoding is a matter of checking the type of Element, encoding self.count, then encoding all of the elements:

    extension Array: BinaryCodable {
        public func binaryEncode(to encoder: BinaryEncoder) throws {
            guard Element.self is Encodable.Type else {
                throw BinaryEncoder.Error.typeNotConformingToEncodable(Element.self)
            }

            try encoder.encode(self.count)
            for element in self {
                try (element as! Encodable).encode(to: encoder)
            }
        }

Decoding is the opposite. Check the type, decode the count, then decode that many elements:

        public init(fromBinary decoder: BinaryDecoder) throws {
            guard let binaryElement = Element.self as? Decodable.Type else {
                throw BinaryDecoder.Error.typeNotConformingToDecodable(Element.self)
            }

            let count = try decoder.decode(Int.self)
            self.init()
            self.reserveCapacity(count)
            for _ in 0 ..< count {
                let decoded = try binaryElement.init(from: decoder)
                self.append(decoded as! Element)
            }
        }
    }

String can then encode itself by creating an Array from its utf8 property and encoding that:

    extension String: BinaryCodable {
        public func binaryEncode(to encoder: BinaryEncoder) throws {
            try Array(self.utf8).binaryEncode(to: encoder)
        }

Decoding decodes the UTF-8 Array and then creates a String from it. This will fail if the decoded Array isn't valid UTF-8, so there's a little extra code here to check for that and throw an error:

        public init(fromBinary decoder: BinaryDecoder) throws {
            let utf8: [UInt8] = try Array(fromBinary: decoder)
            if let str = String(bytes: utf8, encoding: .utf8) {
                self = str
            } else {
                throw BinaryDecoder.Error.invalidUTF8(utf8)
            }
        }
    }

Example Use
That takes care of binary encoding and decoding. Use is simple. Declare conformance to BinaryCodable, then use BinaryEncoder and BinaryDecoder on your types:

    struct Company: BinaryCodable {
        var name: String
        var employees: [Employee]
    }

    struct Employee: BinaryCodable {
        var name: String
        var jobTitle: String
        var age: Int
    }

    let company = Company(name: "Joe's Discount Airbags", employees: [
        Employee(name: "Joe Johnson", jobTitle: "CEO", age: 27),
        Employee(name: "Stan Lee", jobTitle: "Janitor", age: 87),
        Employee(name: "Dracula", jobTitle: "Dracula", age: 41),
        Employee(name: "Steve Jobs", jobTitle: "Visionary", age: 56),
    ])
    let data = try BinaryEncoder.encode(company)
    let roundtrippedCompany = try BinaryDecoder.decode(Company.self, data: data)
    // roundtrippedCompany contains the same data as company

Conclusion
Swift's new Codable protocols are a welcome addition to the language and eliminate a lot of boilerplate code. They're flexible enough to make it straightforward to use/abuse them for things well beyond JSON and property list parsing. Unsophisticated binary formats such as this are not often called for, but they have their uses, and it's interesting to see how Codable can be used for something so different from the built-in facilities. The Encoder and Decoder protocols are large, but judicious use of generics can cut down a lot of the repetitive code, and implementation is relatively simple in the end.

BinaryCoder was written for exploratory and educational purposes, and it's probably not what you want to use in your own programs. However, there are cases where it could be suitable, as long as you understand the tradeoffs involved.

That's it for today! Come back again for more exciting byte-related adventures. As always, Friday Q&A is driven by reader ideas, so if you have a topic you'd like to see covered, please send it in!

Friday Q&A 2017-07-14: Swift.Codable

Mike Ash — Fri, 14 Jul 2017 13:57:00 GMT

One of the interesting additions to Swift 4 is the Codable protocol and the machinery around it. This is a subject near and dear to my heart, and I want to discuss what it is and how it works today.

Serialization
Serializing values to data that can be stored on disk or transmitted over a network is a common need. It's especially common in this age of always-connected mobile apps.

Until now, the options for serialization in Apple's ecosystem have been limited:

NSCoding provides intelligent serialization of complex object graphs and works with your own types, but works with a poorly documented serialization format not suitable for cross-platform work, and requires writing code to manually encode and decode your types.
NSPropertyListSerialization and NSJSONSerialization can convert between standard Cocoa types like NSDictionary/NSString and property lists or JSON. JSON in particular is used all over the place for server communication. Since these APIs provide low-level values, you have to write a bunch of code to extract meaning from those values. That code is often ad-hoc and handles bad data poorly.
NSXMLParser and NSXMLDocument are the choice of masochists or people stuck working with systems that use XML. Converting between the basic parsed data and more meaningful model objects is once again up to the programmer.
Finally, there's always the option to build your own from scratch. This is fun, but a lot of work, and error-prone.

These approaches tend to result in a lot of boilerplate code, where you declare a property called foo of type String which is encoded by storing the String stored in foo under the key "foo" and is decoded by retrieving the value for the key "foo", attempting to cast it to a String, storing it into foo on success, or throwing an error on failure. Then you declare a property called bar of type String which....

Naturally, programmers dislike these repetitive tasks. Repetition is what computers are for. We want to be able to just write this:

    struct Whatever {
        var foo: String
        var bar: String
    }

And have it be serializable. It ought to be possible: all the necessary information is already present.

Reflection is a common way to accomplish this. A lot of Objective-C programmers have written code to automatically read and write Objective-C objects to and from JSON objects. The Objective-C runtime provides all of the information you need to do this automatically. For Swift, we can use the Objective-C runtime, or make do with Swift's Mirror and use wacky workarounds to compensate for its inability to mutate properties.

Outside of Apple's ecosystem, this is a common approach in many languages. This has led to various hilarious security bugs over the years.

Reflection is not a particularly good solution to this problem. It's easy to get it wrong and create security bugs. It's less able to use static typing, so more errors happen at runtime rather than compile time. And it tends to be pretty slow, since the code has to be completely general and does lots of string lookups with type metadata.

Swift has taken the approach of compile-time code generation rather than runtime reflection. This means that some of the knowledge has to be built in to the compiler, but the result is fast and takes advantage of static typing, while still remaining easy to use.

Overview
There are a few fundamental protocols that Swift's new encoding system is built around.

The Encodable protocol is used for types which can be encoded. If you conform to this protocol and all stored properties in your type are themselves Encodable, then the compiler will generate an implementation for you. If you don't meet the requirements, or you need special handling, you can implement it yourself.

The Decodable protocol is the companion to the Encodable protocol and denotes types which can be decoded. Like Encodable, the compiler will generate an implementation for you if your stored properties are all Decodable.

Because Encodable and Decodable usually go together, there's another protocol called Codable which is just the two protocols glued together:

    typealias Codable = Decodable & Encodable

Encodable and Decodable are really simple. Each one contains just one requirement:

    protocol Encodable {
        func encode(to encoder: Encoder) throws
    }

    protocol Decodable {
        init(from decoder: Decoder) throws
    }
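
In practice, basic use looks something like the following sketch. It assumes Foundation's JSONEncoder and JSONDecoder as the coders and reuses the Whatever struct from earlier, with Equatable added only so the result can be compared:

```swift
import Foundation

// Declaring conformance is all it takes; the compiler generates the
// encode(to:) and init(from:) implementations.
struct Whatever: Codable, Equatable {
    var foo: String
    var bar: String
}

let original = Whatever(foo: "hello", bar: "world")
let jsonData = try JSONEncoder().encode(original)
let decoded = try JSONDecoder().decode(Whatever.self, from: jsonData)
print(decoded == original) // true
```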

The Encoder and Decoder protocols specify how objects can actually encode and decode themselves. You don't have to worry about these for basic use, since the default implementation of Codable handles all the details for you, but you need to use them if you write your own Codable implementation. These are complex and we'll look at them later.

Finally, there's a CodingKey protocol which is used to denote keys used for encoding and decoding. This adds an extra layer of static type checking to the process compared to using plain strings everywhere. It provides a String, and optionally an Int for positional keys:

    protocol CodingKey {
        var stringValue: String { get }
        init?(stringValue: String)

        var intValue: Int? { get }
        init?(intValue: Int)
    }
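
Enums are the usual way to conform. Here's a quick sketch of what the compiler-provided conformance gives you, using two hypothetical key types:

```swift
// A plain enum gets string values that match its case names.
enum PlainKeys: CodingKey {
    case name
    case age
}

// A String-backed enum uses its raw values instead.
enum RenamedKeys: String, CodingKey {
    case name = "person_name"
    case age
}

print(PlainKeys.name.stringValue)                        // name
print(RenamedKeys.name.stringValue)                      // person_name
print(RenamedKeys(stringValue: "person_name") == .name)  // true
print(RenamedKeys(stringValue: "name") == nil)           // true
print(PlainKeys.age.intValue == nil)                     // true
```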

Encoders and Decoders
The basic concept of Encoder and Decoder is similar to NSCoder. Objects receive a coder and then call its methods to encode or decode themselves.

The API of NSCoder is straightforward. NSCoder has a bunch of methods like encodeObject:forKey: and encodeInteger:forKey: which objects call to perform their coding. Objects can also use unkeyed methods like encodeObject: and encodeInteger: to do things positionally instead of by key.

Swift's API is more indirect. Encoder doesn't have any methods of its own for encoding values. Instead, it provides containers, and those containers then have methods for encoding values. There's one container for keyed encoding, one for unkeyed encoding, and one for encoding a single value.

This helps make things more explicit and fits better with portable serialization formats. NSCoder only has to work with Apple's encoding format so it just needs to put the same thing out that it got in. Encoder has to work with things like JSON. If an object encodes values with keys, that should produce a JSON dictionary. If it uses unkeyed encoding then that should produce a JSON array. What if the object is empty and encodes no values? With the NSCoder approach, it would have no idea what to output. With Encoder, the object will still request a keyed or unkeyed container and the encoder can figure it out from that.
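
Here's a sketch of how the choice of container shapes the output, assuming Foundation's JSONEncoder. The three types and the json helper are hypothetical names for this illustration:

```swift
import Foundation

struct PairKeyed: Encodable {
    var x: Int
    var y: Int
    // The compiler-generated encode(to:) uses a keyed container.
}

struct PairUnkeyed: Encodable {
    var x: Int
    var y: Int

    func encode(to encoder: Encoder) throws {
        var container = encoder.unkeyedContainer()
        try container.encode(x)
        try container.encode(y)
    }
}

struct EmptyUnkeyed: Encodable {
    func encode(to encoder: Encoder) throws {
        // Requesting the container is enough to tell the encoder the shape.
        _ = encoder.unkeyedContainer()
    }
}

func json<T: Encodable>(_ value: T) throws -> String {
    let data = try JSONEncoder().encode(value)
    return String(decoding: data, as: UTF8.self)
}

let unkeyed = try json(PairUnkeyed(x: 3, y: 4))  // [3,4]
let empty = try json(EmptyUnkeyed())             // []
let keyed = try json(PairKeyed(x: 3, y: 4))      // a JSON object with "x" and "y"
```

The empty type never encodes a value, yet the encoder still knows to emit an array because of the container it was asked for.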

Decoder works the same way. You don't decode values from it directly, but rather ask for a container, and then decode values from the container. Like Encoder, Decoder provides keyed, unkeyed, and single value containers.

Because of this container design, the Encoder and Decoder protocols themselves are small. They contain a bit of bookkeeping info, and methods for obtaining containers:

    protocol Encoder {
        var codingPath: [CodingKey] { get }
        var userInfo: [CodingUserInfoKey : Any] { get }

        func container<Key>(keyedBy type: Key.Type)
            -> KeyedEncodingContainer<Key> where Key : CodingKey
        func unkeyedContainer() -> UnkeyedEncodingContainer
        func singleValueContainer() -> SingleValueEncodingContainer
    }

    protocol Decoder {
        var codingPath: [CodingKey] { get }
        var userInfo: [CodingUserInfoKey : Any] { get }

        func container<Key>(keyedBy type: Key.Type) throws
            -> KeyedDecodingContainer<Key> where Key : CodingKey
        func unkeyedContainer() throws -> UnkeyedDecodingContainer
        func singleValueContainer() throws -> SingleValueDecodingContainer
    }

The complexity is in the container types. You can get pretty far by recursively walking through properties of Codable types, but at some point you need to get down to some raw encodable types which can be directly encoded and decoded. For Codable, those types include the various integer types, Float, Double, Bool, and String. That makes for a whole bunch of really similar encode/decode methods. Unkeyed containers also directly support encoding sequences of the raw encodable types.

Beyond those basic methods, there are a bunch of methods that support exotic use cases. KeyedDecodingContainer has methods called decodeIfPresent which return an optional and return nil for missing keys instead of throwing. The encoding containers have methods for weak encoding, which encodes an object only if something else encodes it too (useful for parent references in a complex graph). There are methods for getting nested containers, which allows you to encode hierarchies. Finally, there are methods for getting a "super" encoder or decoder, which is intended to allow subclasses and superclasses to coexist peacefully when encoding and decoding. The subclass can encode itself directly, and then ask the superclass to encode itself with a "super" encoder, which ensures keys don't conflict.
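
As a sketch of the first of those, here's decodeIfPresent next to plain decode, assuming Foundation's JSONDecoder. Profile is a hypothetical type:

```swift
import Foundation

struct Profile: Decodable {
    var name: String
    var nickname: String?

    private enum CodingKeys: String, CodingKey {
        case name
        case nickname
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // decode throws if the key is missing...
        name = try container.decode(String.self, forKey: .name)
        // ...while decodeIfPresent returns nil instead.
        nickname = try container.decodeIfPresent(String.self, forKey: .nickname)
    }
}

let json = Data("{\"name\": \"Lancelot\"}".utf8)
let profile = try JSONDecoder().decode(Profile.self, from: json)
let missingName = try? JSONDecoder().decode(Profile.self, from: Data("{}".utf8))
```

The first decode succeeds with a nil nickname, while the second fails because name is required.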

Implementing Codable
Implementing Codable is easy: declare conformance and let the compiler generate it for you.

It's useful to know just what it's doing, though. Let's take a look at what it ends up generating and how you would do it yourself. We'll start with an example Codable type:

    struct Person: Codable {
        var name: String
        var age: Int
        var quest: String
    }

The compiler generates a CodingKeys type nested inside Person. If we did it ourselves, that nested type would look like this:

    private enum CodingKeys: CodingKey {
        case name
        case age
        case quest
    }

The case names match Person's property names. Compiler magic gives each CodingKeys case a string value which matches its case name, which means that the property names are also the keys used for encoding them.

If we need different names, we can easily accomplish this by providing our own CodingKeys with custom raw values. For example, we might write this:

    private enum CodingKeys: String, CodingKey {
        case name = "person_name"
        case age
        case quest
    }

This will cause the name property to be encoded and decoded under person_name. And this is all we have to do. The compiler happily accepts our custom CodingKeys type while still providing a default implementation for the rest of Codable, and that default implementation uses our custom type. You can mix and match customizations with the compiler-provided code.
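
A quick way to check this is to encode a value and look at the keys that come out. This sketch assumes Foundation's JSONEncoder and uses JSONSerialization only to inspect the result; the sample values are made up:

```swift
import Foundation

struct Person: Codable {
    var name: String
    var age: Int
    var quest: String

    // Only the keys are customized; encode(to:) and init(from:) are
    // still generated by the compiler.
    private enum CodingKeys: String, CodingKey {
        case name = "person_name"
        case age
        case quest
    }
}

let person = Person(name: "Arthur", age: 40, quest: "To seek the Holy Grail")
let data = try JSONEncoder().encode(person)
let object = try JSONSerialization.jsonObject(with: data) as? [String: Any]
let keys = (object ?? [:]).keys.sorted()
print(keys) // ["age", "person_name", "quest"]
```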

The compiler also generates an implementation for encode(to:) and init(from:). The implementation of encode(to:) gets a keyed container and then encodes each property in turn:

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)

        try container.encode(name, forKey: .name)
        try container.encode(age, forKey: .age)
        try container.encode(quest, forKey: .quest)
    }

The compiler generates an implementation of init(from:) which mirrors this:

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)

        name = try container.decode(String.self, forKey: .name)
        age = try container.decode(Int.self, forKey: .age)
        quest = try container.decode(String.self, forKey: .quest)
    }

That's all there is to it. Just like with CodingKeys, if you need custom behavior here you can implement your own version of one of these methods while letting the compiler generate the rest. Unfortunately, there's no way to specify custom behavior for an individual property, so you have to write out the whole thing even if you want the default behavior for the rest. This is not particularly terrible, though.

If you were to do it all by hand, the full implementation of Codable for Person would look like this:

    extension Person {
        private enum CodingKeys: CodingKey {
            case name
            case age
            case quest
        }

        func encode(to encoder: Encoder) throws {
            var container = encoder.container(keyedBy: CodingKeys.self)

            try container.encode(name, forKey: .name)
            try container.encode(age, forKey: .age)
            try container.encode(quest, forKey: .quest)
        }

        init(from decoder: Decoder) throws {
            let container = try decoder.container(keyedBy: CodingKeys.self)

            name = try container.decode(String.self, forKey: .name)
            age = try container.decode(Int.self, forKey: .age)
            quest = try container.decode(String.self, forKey: .quest)
        }
    }

Implementing Encoder and Decoder
You may never need to implement your own Encoder or Decoder. Swift provides implementations for JSON and property lists, which take care of the common use cases.

You can implement your own in order to support a custom format. The size of the container protocols means this will take some effort. Fortunately, it's mostly a matter of size, not complexity.

To implement a custom Encoder, you'll need something that implements the Encoder protocol plus implementations of the container protocols. Implementing the three container protocols involves a lot of repetitive code to implement encoding or decoding methods for all of the various directly encodable types.

How they work is up to you. The Encoder will probably need to store the data being encoded, and the containers will inform the Encoder of the various things they're encoding.

Implementing a custom Decoder is similar. You'll need to implement that protocol plus the container protocols. The decoder will hold the serialized data and the containers will communicate with it to provide the requested values.

I've been experimenting with a custom binary encoder and decoder as a way to learn the protocols, and I hope to present that in a future article as an example of how to do it.

Conclusion
Swift 4's Codable API looks great and ought to simplify a lot of common code. For typical JSON tasks, it's sufficient to declare conformance to Codable in your model types and let the compiler do the rest. When needed, you can implement parts of the protocol yourself in order to handle things differently, and you can implement it all if needed.

The companion Encoder and Decoder protocols are more complex, but justifiably so. Supporting a custom format by implementing your own Encoder and Decoder takes some work, but is mostly a matter of filling in a lot of similar blanks.

That's it for today! Come back again for more exciting serialization-related material, and perhaps even things not related to serialization. Until then, Friday Q&A is driven by reader ideas, so if you have a topic you'd like to see covered here, please send it in!

Friday Q&A 2017-06-30: Dissecting objc_msgSend on ARM64

Mike Ash — Sat, 01 Jul 2017 04:23:00 GMT

We're back! During the week of WWDC, I spoke at CocoaConf Next Door, and one of my talks involved a dissection of objc_msgSend's ARM64 implementation. I thought that turning it into an article would make for a nice return to blogging for Friday Q&A.

Overview
Every Objective-C object has a class, and every Objective-C class has a list of methods. Each method has a selector, a function pointer to the implementation, and some metadata. The job of objc_msgSend is to take the object and selector that are passed in, look up the corresponding method's function pointer, and then jump to that function pointer.

Looking up a method can be extremely complicated. If a method isn't found on a class, then it needs to continue searching in the superclasses. If no method is found at all, then it needs to call into the runtime's message forwarding code. If this is the very first message being sent to a particular class, then it has to call that class's +initialize method.

Looking up a method also needs to be extremely fast in the common case, since it's done for every method call. This, of course, is in conflict with the complicated lookup process.

Objective-C's solution to this conflict is the method cache. Each class has a cache which stores methods as pairs of selectors and function pointers, known in Objective-C as IMPs. They're organized as a hash table so lookups are fast. When looking up a method, the runtime first consults the cache. If the method isn't in the cache, it follows the slow, complicated procedure, and then places the result into the cache so that the next time can be fast.

objc_msgSend is written in assembly. There are two reasons for this: one is that it's not possible to write a function which preserves unknown arguments and jumps to an arbitrary function pointer in C. The language just doesn't have the necessary features to express such a thing. The other reason is that it's extremely important for objc_msgSend to be fast, so every last instruction of it is written by hand so it can go as fast as possible.

Naturally, you don't want to write the whole complicated message lookup procedure in assembly language. It's not necessary, either, because things are going to be slow no matter what the moment you start going through it. The message send code can be divided into two parts: there's the fast path in objc_msgSend itself, which is written in assembly, and the slow path implemented in C. The assembly part looks up the method in the cache and jumps to it if it's found. If the method is not in the cache, then it calls into the C code to handle things.

Therefore, when looking at objc_msgSend itself, it does the following:

Get the class of the object passed in.
Get the method cache of that class.
Use the selector passed in to look up the method in the cache.
If it's not in the cache, call into the C code.
Jump to the IMP for the method.

How does it do all of that? Let's see!

Instruction by Instruction
objc_msgSend has a few different paths it can take depending on circumstances. It has special code for handling things like messages to nil, tagged pointers, and hash table collisions. I'll start by looking at the most common, straight-line case where a message is sent to a non-nil, non-tagged pointer and the method is found in the cache without any need to scan. I'll note the various branching-off points as we go through them, and then once we're done with the common path I'll circle back and look at all of the others.

I'll list each instruction or group of instructions, followed by a description of what it does and why. To find the instruction any given piece of text is discussing, look at the listing immediately above it.

Each instruction is preceded by its offset from the beginning of the function. This serves as a counter, and lets you identify jump targets.

ARM64 has 31 integer registers which are 64 bits wide. They're referred to with the notation x0 through x30. It's also possible to access the lower 32 bits of each register as if it were a separate register, using w0 through w30. Registers x0 through x7 are used to pass the first eight parameters to a function. That means that objc_msgSend receives the self parameter in x0 and the selector _cmd parameter in x1.

Let's begin!

    0x0000 cmp     x0, #0x0
    0x0004 b.le    0x6c

This performs a signed comparison of self with 0 and jumps elsewhere if the value is less than or equal to zero. A value of zero is nil, so this handles the special case of messages to nil. This also handles tagged pointers. Tagged pointers on ARM64 are indicated by setting the high bit of the pointer. (This is an interesting contrast with x86-64, where it's the low bit.) If the high bit is set, then the value is negative when interpreted as a signed integer. For the common case of self being a normal pointer, the branch is not taken.
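The signed-comparison trick is easy to sketch in C. This is only an illustration, not the runtime's code: the receiver_kind enum and classify_receiver function are hypothetical names, and it assumes a 64-bit two's complement target where the high bit marks a tagged pointer, as described above.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum receiver_kind { RECEIVER_NIL, RECEIVER_TAGGED, RECEIVER_NORMAL };

/* Mirrors `cmp x0, #0x0` followed by `b.le`: a single signed comparison
   separates normal pointers from both nil and tagged pointers. */
enum receiver_kind classify_receiver(uint64_t self_bits) {
    int64_t as_signed = (int64_t)self_bits;   /* high bit set => negative */
    if (as_signed > 0) return RECEIVER_NORMAL; /* branch not taken */
    if (as_signed == 0) return RECEIVER_NIL;   /* later separated by b.eq */
    return RECEIVER_TAGGED;
}
```

The point of the design is that the common case costs one comparison and one untaken branch. Telling nil and tagged pointers apart is deferred to the uncommon path.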

    0x0008 ldr    x13, [x0]

This loads self's isa by loading the 64-bit quantity pointed to by x0, which contains self. The x13 register now contains the isa.

    0x000c and    x16, x13, #0xffffffff8

ARM64 can use non-pointer isas. Traditionally the isa points to the object's class, but non-pointer isa takes advantage of spare bits by cramming some other information into the isa as well. This instruction performs a logical AND to mask off all the extra bits, and leaves the actual class pointer in x16.
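In C terms, the masking step looks roughly like this. The constant is the immediate from the and instruction above. The names ISA_CLASS_MASK and class_bits_from_isa are mine, chosen for illustration.

```c
#include <stdint.h>

/* The immediate from the `and` instruction: bits 3 through 35 are set. */
#define ISA_CLASS_MASK 0xffffffff8ULL

/* Strip the extra non-pointer isa bits, leaving only the class pointer. */
uint64_t class_bits_from_isa(uint64_t isa) {
    return isa & ISA_CLASS_MASK;
}
```

For a made-up isa value of 0x01a0000100008001, this yields 0x0000000100008000: the high bits and the low three bits are discarded.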

    0x0010 ldp    x10, x11, [x16, #0x10]

This is my favorite instruction in objc_msgSend. It loads the class's cache information into x10 and x11. The ldp instruction loads two registers' worth of data from memory into the registers named in the first two arguments. The third argument describes where to load the data, in this case at offset 16 from x16, which is the area of the class which holds the cache information. The cache itself looks like this:

    typedef uint32_t mask_t;

    struct cache_t {
        struct bucket_t *_buckets;
        mask_t _mask;
        mask_t _occupied;
    }

Following the ldp instruction, x10 contains the value of _buckets, and x11 contains _occupied in its high 32 bits, and _mask in its low 32 bits.

_occupied specifies how many entries the hash table contains, and plays no role in objc_msgSend. _mask is important: it describes the size of the hash table as a convenient AND-able mask. Its value is always a power of two minus 1, or in binary terms something that looks like 000000001111111 with a variable number of 1s at the end. This value is needed to figure out the lookup index for a selector, and to wrap around the end when searching the table.
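Here's a small C sketch of how the second loaded register splits into its two fields, plus a check of the "power of two minus 1" property. The function names are hypothetical, and the split assumes the little-endian layout described above.

```c
#include <stdint.h>

/* Split the second register loaded by `ldp` (x11 above) into its fields. */
uint32_t mask_from_x11(uint64_t x11) {
    return (uint32_t)x11;             /* low 32 bits, what w11 refers to */
}

uint32_t occupied_from_x11(uint64_t x11) {
    return (uint32_t)(x11 >> 32);     /* high 32 bits */
}

/* A valid mask is a power of two minus 1, so mask + 1 has one bit set. */
int mask_is_valid(uint32_t mask) {
    uint64_t size = (uint64_t)mask + 1;
    return (size & (size - 1)) == 0;
}
```

Because w11 is just the low half of x11, the assembly never needs a separate instruction to extract _mask.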

    0x0014 and    w12, w1, w11

This instruction computes the starting hash table index for the selector passed in as _cmd. x1 contains _cmd, so w1 contains the bottom 32 bits of _cmd. w11 contains _mask as mentioned above. This instruction ANDs the two together and places the result into w12. The result is the equivalent of computing _cmd % table_size but without the expensive modulo operation.

    0x0018 add    x12, x10, x12, lsl #4

The index is not enough. To start loading data from the table, we need the actual address to load from. This instruction computes that address by adding the table index to the table pointer. It shifts the table index left by 4 bits first, which multiplies it by 16, because each table bucket is 16 bytes. x12 now contains the address of the first bucket to search.
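The last two instructions translate to C fairly directly. This is a sketch with hypothetical function names. Addresses are modeled as plain 64-bit integers to keep the arithmetic visible.

```c
#include <stdint.h>

/* `and w12, w1, w11`: keep only the low bits of the selector pointer.
   For a power-of-two table size this equals cmd % table_size. */
uint32_t start_index(uint64_t cmd, uint32_t mask) {
    return (uint32_t)cmd & mask;
}

/* `add x12, x10, x12, lsl #4`: each bucket is 16 bytes, so the index
   is shifted left by 4 before being added to the table pointer. */
uint64_t bucket_address(uint64_t buckets, uint32_t index) {
    return buckets + ((uint64_t)index << 4);
}
```

With a mask of 7 (an eight-entry table), a selector whose low bits are 0xd5 starts at index 5, and a table at 0x1000 puts that bucket at 0x1050.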

    0x001c ldp    x9, x17, [x12]

Our friend ldp makes another appearance. This time it's loading from the pointer in x12, which points to the bucket to search. Each bucket contains a selector and an IMP. x9 now contains the selector for the current bucket, and x17 contains the IMP.

    0x0020 cmp    x9, x1
    0x0024 b.ne   0x2c

These instructions compare the bucket's selector in x9 with _cmd in x1. If they're not equal then this bucket does not contain an entry for the selector we're looking for, and in that case the second instruction jumps to offset 0x2c, which handles non-matching buckets. If the selectors do match, then we've found the entry we're looking for, and execution continues with the next instruction.

    0x0028 br    x17

This performs an unconditional jump to x17, which contains the IMP loaded from the current bucket. From here, execution will continue in the actual implementation of the target method, and this is the end of objc_msgSend's fast path. All of the argument registers have been left undisturbed, so the target method will receive all passed in arguments just as if it had been called directly.

When everything is cached and all the stars align, this path can execute in less than 3 nanoseconds on modern hardware.

That's the fast path. How about the rest of the code? Let's continue with the code for a non-matching bucket.

    0x002c cbz    x9, __objc_msgSend_uncached

x9 contains the selector loaded from the bucket. This instruction compares it with zero and jumps to __objc_msgSend_uncached if it's zero. A zero selector indicates an empty bucket, and an empty bucket means that the search has failed. The target method isn't in the cache, and it's time to fall back to the C code that performs a more comprehensive lookup. __objc_msgSend_uncached handles that. Otherwise, the bucket doesn't match but isn't empty, and the search continues.

    0x0030 cmp    x12, x10
    0x0034 b.eq   0x40

These instructions compare the current bucket address in x12 with the beginning of the hash table in x10. If they match, execution jumps to code that wraps the search back to the end of the hash table. We haven't seen it yet, but the hash table search being performed here actually runs backwards. The search examines decreasing indexes until it hits the beginning of the table, then it starts over at the end. I'm not sure why it works this way rather than the more common approach of increasing addresses that wrap to the beginning, but it's a safe bet that it's because it ends up being faster this way.

Offset 0x40 handles the wraparound case. Otherwise, execution proceeds to the next instruction.

    0x0038 ldp    x9, x17, [x12, #-0x10]!

Another ldp, once again loading a cache bucket. This time, it loads from an offset of -0x10 relative to the address of the current cache bucket, which is the bucket just before it. The exclamation point at the end of the address reference is an interesting feature. This indicates a register write-back, which means that the register is updated with the newly computed value. In this case, it's effectively doing x12 -= 16 in addition to loading the new bucket, which makes x12 point to that new bucket.

    0x003c b      0x20

Now that the new bucket is loaded, execution can resume with the code that checks to see if the current bucket is a match. This loops back up to the instruction labeled 0x0020 above, and runs through all of that code again with the new values. If it continues to find non-matching buckets, this code will keep running until it finds a match, an empty bucket, or hits the beginning of the table.

    0x0040 add    x12, x12, w11, uxtw #4

This is the target for when the search wraps. x12 contains a pointer to the current bucket, which in this case is also the first bucket. w11 contains the table mask, which is the size of the table minus one. This adds the two together, while also shifting w11 left by 4 bits, multiplying it by 16. The result is that x12 now points to the last bucket in the table, and the search can resume from there.

    0x0044 ldp    x9, x17, [x12]

The now-familiar ldp loads the new bucket into x9 and x17.

    0x0048 cmp    x9, x1
    0x004c b.ne   0x54
    0x0050 br     x17

This code checks to see if the bucket matches and jumps to the bucket's IMP. It's a duplicate of the code at 0x0020 above.

    0x0054 cbz    x9, __objc_msgSend_uncached

Just like before, if the bucket is empty then it's a cache miss and execution proceeds into the comprehensive lookup code implemented in C.

    0x0058 cmp    x12, x10
    0x005c b.eq   0x68

This checks for wraparound again, and jumps to 0x68 if we've hit the beginning of the table a second time. In this case, it jumps into the comprehensive lookup code implemented in C:

    0x0068 b      __objc_msgSend_uncached

This is something that should never actually happen. The table grows as entries are added to it, and it's never 100% full. Hash tables become inefficient when they're too full because collisions become too common.

Why is this here? A comment in the source code explains:

Clone scanning loop to miss instead of hang when cache is corrupt. The slow path may detect any corruption and halt later.

I doubt that this is common, but evidently the folks at Apple have seen memory corruption which caused the cache to be filled with bad entries, and jumping into the C code improves the diagnostics.

The existence of this check should have minimal impact on code that doesn't suffer from this corruption. Without it, the original loop could be reused, which would save a bit of instruction cache space, but the effect is minimal. This wraparound handler is not the common case anyway. It will only be invoked for selectors that hash to a slot near the beginning of the hash table, and then only if there's a collision and all the prior entries are occupied.

    0x0060 ldp    x9, x17, [x12, #-0x10]!
    0x0064 b      0x48

The remainder of this loop is the same as before. Load the next bucket into x9 and x17, update the bucket pointer in x12, and go back to the top of the loop.
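Taken together, the scan described so far can be sketched in C. This is my own approximation of the control flow, not the runtime's source. The bucket struct, imp_t typedef, and cache_lookup function are hypothetical names, and returning NULL stands in for the branch to __objc_msgSend_uncached. Real C code can't do the final jump while preserving the arguments, so this only returns the IMP.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*imp_t)(void);                  /* stand-in for IMP */
struct bucket { uintptr_t sel; imp_t imp; };  /* selector and IMP pair */

/* Returns the cached IMP, or NULL where the assembly would branch to
   __objc_msgSend_uncached. */
imp_t cache_lookup(const struct bucket *buckets, uint32_t mask, uintptr_t cmd) {
    const struct bucket *b = &buckets[(uint32_t)cmd & mask];
    int wrapped = 0;
    for (;;) {
        if (b->sel == cmd) return b->imp;  /* hit: the assembly does br x17 */
        if (b->sel == 0) return NULL;      /* empty bucket: cache miss */
        if (b == buckets) {                /* reached the start of the table */
            if (wrapped) return NULL;      /* second time: corrupt cache, miss */
            wrapped = 1;
            b = buckets + mask;            /* wrap around to the last bucket */
        } else {
            b--;                           /* step backwards one bucket */
        }
    }
}
```

The wrapped flag plays the role of the duplicated loop in the assembly: the first time the scan reaches the start of the table it wraps, and the second time it gives up rather than spinning forever.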

That's the end of the main body of objc_msgSend. What remains are special cases for nil and tagged pointers.

Tagged Pointer Handler
You'll recall that the very first instructions checked for those and jumped to offset 0x6c to handle them. Let's continue from there:

    0x006c b.eq    0xa4

We've arrived here because self is less than or equal to zero. Less than zero indicates a tagged pointer, and zero is nil. The two cases are handled completely differently, so the first thing the code does here is check to see whether self is nil or not. If self is equal to zero then this instruction branches to 0xa4, which is where the nil handler lives. Otherwise, it's a tagged pointer, and execution continues with the next instruction.

Before we move on, let's briefly discuss how tagged pointers work. Tagged pointers support multiple classes. The top four bits of the tagged pointer (on ARM64) indicate which class the "object" is. They are essentially the tagged pointer's isa. Of course, four bits isn't nearly enough to hold a class pointer. Instead, there's a special table which stores the available tagged pointer classes. The class of a tagged pointer "object" is found by looking up the index in that table which corresponds to the top four bits.

This isn't the whole story. Tagged pointers (at least on ARM64) also support extended classes. When the top four bits are all set to 1 the next eight bits are used to index into an extended tagged pointer class table. This allows the runtime to support more tagged pointer classes, at the cost of having less storage for them.

Let's continue.

    0x0070 mov    x10, #-0x1000000000000000

This sets x10 to an integer value with the top four bits set and all other bits set to zero. The next instructions compare self against this value to check whether all four tag bits are set.

    0x0074 cmp    x0, x10
    0x0078 b.hs   0x90

This checks for an extended tagged pointer. If self is greater than or equal to the value in x10 when the two are compared as unsigned integers, then that means the top four bits are all set. In that case, branch to 0x90 which will handle extended classes. Otherwise, use the primary tagged pointer table.

    0x007c adrp   x10, _objc_debug_taggedpointer_classes@PAGE
    0x0080 add    x10, x10, _objc_debug_taggedpointer_classes@PAGEOFF

This little song and dance loads the address of _objc_debug_taggedpointer_classes, which is the primary tagged pointer table. ARM64 requires two instructions to load the address of a symbol. This is a standard technique on RISC-like architectures. Pointers on ARM64 are 64 bits wide, and instructions are only 32 bits wide. It's not possible to fit an entire pointer into one instruction.

x86 doesn't suffer from this problem, since it has variable-length instructions. It can just use a 10-byte instruction, where two bytes identify the instruction itself and the target register, and eight bytes hold the pointer value.

On a machine with fixed-length instructions, you load the value in pieces. In this case, only two pieces are needed. The adrp instruction loads the top part of the value, and the add then adds in the bottom part.
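To make the two pieces concrete, here's a simplified C sketch of the split. It assumes 4 KB pages, and it glosses over the fact that adrp computes the page address relative to the program counter rather than embedding it outright. The function names are hypothetical.

```c
#include <stdint.h>

/* The piece adrp produces: the address of the 4 KB page holding the symbol. */
uint64_t page_of(uint64_t address) {
    return address & ~0xfffULL;
}

/* The piece the add supplies via @PAGEOFF: the low 12 bits within the page. */
uint64_t page_offset(uint64_t address) {
    return address & 0xfffULL;
}
```

Adding the two results back together always reproduces the original address, which is exactly what the adrp and add pair accomplishes.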

    0x0084 lsr    x11, x0, #60

The tagged class index is in the top four bits of x0. To use it as an index, it has to be shifted right by 60 bits so it becomes an integer in the range 0-15. This instruction performs that shift and places the index into x11.

    0x0088 ldr    x16, [x10, x11, lsl #3]

This uses the index in x11 to load the entry from the table that x10 points to. The x16 register now contains the class of this tagged pointer.

    0x008c b      0x10

With the class in x16, we can now branch back to the main code. The code starting with offset 0x10 assumes that the class pointer is loaded into x16 and performs dispatch from there. The tagged pointer handler can therefore just branch back to that code rather than duplicating logic here.

    0x0090 adrp   x10, _objc_debug_taggedpointer_ext_classes@PAGE
    0x0094 add    x10, x10, _objc_debug_taggedpointer_ext_classes@PAGEOFF

The extended tagged class handler looks similar. These two instructions load the pointer to the extended table.

    0x0098 ubfx   x11, x0, #52, #8

This instruction loads the extended class index. It extracts 8 bits starting from bit 52 in self into x11.

    0x009c ldr    x16, [x10, x11, lsl #3]

Just like before, that index is used to look up the class in the table and load it into x16.

    0x00a0 b      0x10

With the class in x16, it can branch back into the main code.
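The bit extraction in the tagged pointer handler is compact enough to sketch in C. These function names are hypothetical. The table lookups themselves (the two ldr instructions) would just be an array index with 8-byte entries, so they're left out.

```c
#include <stdint.h>

/* `cmp x0, x10` then `b.hs`: unsigned compare against 0xF000000000000000.
   True only when the top four bits are all set. */
int is_extended_tag(uint64_t self_bits) {
    return self_bits >= 0xF000000000000000ULL;
}

/* `lsr x11, x0, #60`: the top four bits as an index in the range 0-15. */
unsigned basic_tag_index(uint64_t self_bits) {
    return (unsigned)(self_bits >> 60);
}

/* `ubfx x11, x0, #52, #8`: eight bits starting at bit 52, range 0-255. */
unsigned extended_tag_index(uint64_t self_bits) {
    return (unsigned)((self_bits >> 52) & 0xff);
}
```

For example, a value of 0xB000000000000123 uses the primary table at index 11, while 0xF2A0000000000000 is extended and uses index 0x2a in the extended table.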

That's nearly everything. All that remains is the nil handler.

nil Handler
Finally we get to the nil handler. Here it is, in its entirety.

    0x00a4 mov    x1, #0x0
    0x00a8 movi   d0, #0000000000000000
    0x00ac movi   d1, #0000000000000000
    0x00b0 movi   d2, #0000000000000000
    0x00b4 movi   d3, #0000000000000000
    0x00b8 ret

The nil handler is completely different from the rest of the code. There's no class lookup or method dispatch. All it does for nil is return 0 to the caller.

This task is a bit complicated by the fact that objc_msgSend doesn't know what kind of return value the caller expects. Is this method returning one integer, or two, or a floating-point value, or nothing at all?

Fortunately, all of the registers used for return values can be safely overwritten even if they're not being used for this particular call's return value. Integer return values are stored in x0 and x1 and floating point return values are stored in vector registers v0 through v3. Multiple registers are used for returning smaller structs.

This code clears x1 and v0 through v3. The d0 through d3 registers refer to the bottom half of the corresponding v registers, and storing into them clears the top half, so the effect of the four movi instructions is to clear those four registers. After doing this, it returns control to the caller.

You might wonder why this code doesn't clear x0. The answer to that is simple: x0 holds self which in this case is nil, so it's already zero! You can save an instruction by not clearing x0 since it already holds the value we want.

What about larger struct returns that don't fit into registers? This requires a little cooperation from the caller. Large struct returns are performed by having the caller allocate enough memory for the return value, and then passing the address of that memory in x8. The function then writes to that memory to return a value. objc_msgSend can't clear this memory, because it doesn't know how big the return value is. To solve this, the compiler generates code which fills the memory with zeroes before calling objc_msgSend.
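Here's a rough C model of that cooperation. Everything in it is a stand-in: struct big is a hypothetical return type too large for registers, send_returning_big plays the role of a message send with an indirect struct return, and call_site shows what the compiler-generated caller code effectively does.

```c
#include <string.h>

struct big { double values[8]; };   /* hypothetical: too big for registers */

/* Stand-in for a message send with an indirect struct return. On nil it
   returns without touching *out, just as the nil handler does. */
void send_returning_big(const void *receiver, struct big *out) {
    if (receiver == NULL) return;
    for (int i = 0; i < 8; i++) out->values[i] = i + 1.0;
}

/* What the caller does: zero the return buffer first, then send. */
struct big call_site(const void *receiver) {
    struct big result;
    memset(&result, 0, sizeof result);
    send_returning_big(receiver, &result);
    return result;
}
```

With a nil receiver the caller sees all zeroes, because the buffer was cleared before the call and nothing wrote to it afterwards.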

That's the end of the nil handler, and of objc_msgSend as a whole.

Conclusion
It's always interesting to dive into framework internals. objc_msgSend in particular is a work of art, and delightful to read through.

That's it for today. Come back next time for more squishy goodness. Friday Q&A is driven by reader input, so if you have something you'd like to see discussed here, send it in!

More Advanced Swift Workshop, and Blog and Book Updates

Mike Ash — Tue, 13 Jun 2017 16:11:00 GMT

I'm hoping to resume a regular posting schedule soon, and I wanted to give everybody some updates.

First, I'm holding two more Advanced Swift Workshops next month, one in DC on July 13th and one in New York City on July 24th. Click here for the one in DC, and here for the one in New York City. As with the previous ones, we'll be covering various advanced topics on Swift programming, with yours truly presenting and a small group with lots of opportunity for discussion and experimentation.

Second, The Complete Friday Q&A Volume II is nearly ready. Not quite there yet, but the text and layout are done and it's down to some final tweaking and doing the work of actually getting it out there. Stay tuned.

Last, I hope to resume regular posts in the next couple of weeks. There were some interesting things from WWDC that I want to write about, plus my usual routine of crazy topics. In particular, I did a thorough analysis of the latest implementation of objc_msgSend for ARM64 for a talk that I'd like to write up, and I want to write something about Swift's new Codable stuff. As always, I'm driven by reader suggestions, so if you have something from WWDC you'd like to see, or something unrelated you think would be cool, send it in!

Advanced Swift Workshop in New York City

Mike Ash — Mon, 27 Mar 2017 15:29:00 GMT

I will be holding another one-day workshop on advanced Swift programming in New York City on May 4th. This will be much the same as my previous one in Washington in December, in a new location and with various tweaks and improvements. If you enjoy my articles and want to sharpen your Swift skills, check it out.

I'll discuss the ins and outs of ARC and memory management, reference cycles, enums, generics, designing code to take advantage of enums and generics, pointer APIs, and interfacing with C APIs. Attendees will receive a bunch of Xcode playgrounds that illustrate everything we discuss, as well as the presentation slides.

The format will be part lecture, part exercises using the playgrounds, with plenty of opportunity for discussions and personalized help.

For more information, or to buy tickets, visit the event page. If you know anyone who might like to come, please pass the word!

Another book update, since my last workshop post included one: I've completed my final read-through and am in the middle of fixing the problems that uncovered. Then it'll be ready to go! I don't have a date for it, but it's getting close.

Advanced Swift Workshop in Washington, DC

Mike Ash — Sat, 12 Nov 2016 21:27:00 GMT

I will be holding a one-day workshop on advanced Swift programming in the Washington, DC area on December 12th. If you enjoy my articles and want to sharpen your Swift skills, check it out.

I'm going to be discussing the ins and outs of ARC and memory management, reference cycles, enums, generics, designing code to take advantage of enums and generics, pointer APIs, and interfacing with C APIs. I have been building out a set of nifty Xcode playgrounds to illustrate everything, and attendees will receive a copy of them, as well as the presentation slides.

The format will be part lecture, part exercises using the playgrounds, with plenty of opportunity for discussions and personalized help.

If you think you might like to come, take a look at the event on Eventbrite. And if you know others who might like to come, please tell them about it!

In unrelated news, since I'm sure some of you are wondering, Volume II of my book is coming along slowly but surely, and I hope to get it out the door and get back to writing articles before too much longer. Stay tuned!

Good News, Bad News, and Ugly News

Mike Ash — Wed, 01 Jun 2016 11:48:00 GMT

The good news is that I'm officially restarting work on The Complete Friday Q&A: Volume II. I got partway into it a while ago and ran out of steam. The restarted edition includes all posts made since then, making it pretty massive. I can't commit to a specific timeframe, but I hope that it will be a few months at most before I have it out. There may be opportunities for reader involvement in checking and polishing it, so watch this space.

The bad news is that I can't keep up with new blog posts at the same time. It just hasn't worked out. That's part of why I've been quiet lately, and so I'm suspending regular posts for the duration. I may make occasional irregular posts in the meantime (I'm sure WWDC will have something worth discussing) and I'll resume a more regular schedule once the book is done.

The ugly news is... there is no ugly news, that was just a pointless movie reference. Don't panic!