Continuing the theme from the previous article, I'm going to keep exploring the implementation details of Swift's memory layout. Today, I'm going to look at the memory layout of generics, optionals, and protocol objects.
Reminder: Subject to Change
Just as before, these are all internal details from a pre-release version of the language on a specific CPU/OS combination, and they may change at any time. Don't write code that relies on them. Specifically, these dumps are from x86-64 code running on 10.9. This is a little-endian architecture (as is everything Apple ships these days), so the bytes of each number appear in reverse order in the dumps.
Generics
Generics are surprisingly simple. Everything just gets laid out and shifted around as necessary. Let's start with a simple generic struct:
struct WrapperStruct<T> {
let value: T
}
Let's dump two versions of it:
WrapperStruct(value: 42)
WrapperStruct(value: (42, 43))
The result is:
2a00000000000000
2a00000000000000 2b00000000000000
Everything is laid out contiguously with no padding and no waste. Nice.
How about multiple generic values?
struct WrapperStruct2<T, U> {
let value1: T
let value2: U
}
WrapperStruct2(value1: 42, value2: 43)
WrapperStruct2(value1: (42, 43), value2: 44)
WrapperStruct2(value1: 42, value2: (43, 44))
No surprises in the memory dump:
2a00000000000000 2b00000000000000
2a00000000000000 2b00000000000000 2c00000000000000
2a00000000000000 2b00000000000000 2c00000000000000
In particular, note how the last two are identical. Although the compile-time types are different, that isn't evident in the memory dump. The type difference is all in how the values are accessed.
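To reproduce this kind of dump, here is a minimal sketch of a byte-dumping helper. It is not the dumpmem function used for the dumps in this article; hexDump is a hypothetical name, and the expected strings assume a 64-bit little-endian machine and a current Swift toolchain.

```swift
// Hypothetical helper (not the article's dumpmem): returns the raw bytes of a
// value as a lowercase hex string, in memory order.
func hexDump<T>(_ value: T) -> String {
    return withUnsafeBytes(of: value) { rawBytes in
        rawBytes.map { byte -> String in
            let hex = String(byte, radix: 16)
            // Pad single-digit bytes so every byte prints as two characters.
            return hex.count == 1 ? "0" + hex : hex
        }.joined()
    }
}

struct WrapperStruct2<T, U> {
    let value1: T
    let value2: U
}

// Different compile-time types, same three Ints in memory.
let nestedFirst = hexDump(WrapperStruct2(value1: (42, 43), value2: 44))
let nestedLast = hexDump(WrapperStruct2(value1: 42, value2: (43, 44)))

print(hexDump(42))      // assumes 64-bit little-endian: 2a00000000000000
print(nestedFirst == nestedLast)
```

The helper only reads the bytes of the value itself, so for a class instance it would show the reference, not the heap object.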
How about classes? You might expect this to be more complicated since classes have dynamic dispatch and are allocated on the heap. It's not:
class WrapperClass<T> {
let value: T
init(_ value: T) {
self.value = value
}
}
dumpmem(WrapperClass(42))
dumpmem(WrapperClass((42, 43)))
The result:
c8033074807f0000 0800000001000000 2a00000000000000
28093074807f0000 0800000001000000 2a00000000000000 2b00000000000000
We see the 16-byte object header discussed in the previous post, followed by the data. Note that the isa pointer of the two objects is not identical, even though both are an instance of WrapperClass. Evidently, a specialization of a generic class is a separate class at runtime.
It's also interesting to note that instances of generic classes can have different sizes. This is a natural consequence of the straightforward implementation of generics, but it's striking coming from Objective-C, where instances of any given class are always the same size, absent certain runtime shenanigans.
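The "separate class at runtime" observation can also be checked without a memory dump. The following is a small sketch, assuming a current Swift toolchain, that compares the runtime type identity of two specializations using ObjectIdentifier:

```swift
class WrapperClass<T> {
    let value: T
    init(_ value: T) {
        self.value = value
    }
}

let intWrapper = WrapperClass(42)
let pairWrapper = WrapperClass((42, 43))

// Each specialization has its own metatype, so the identifiers differ.
let sameSpecialization =
    ObjectIdentifier(type(of: intWrapper)) == ObjectIdentifier(WrapperClass<Int>.self)
let differentSpecializations =
    ObjectIdentifier(type(of: intWrapper)) != ObjectIdentifier(type(of: pairWrapper))

print(sameSpecialization, differentSpecializations)
```

This only shows that the two types are distinct; it says nothing about how the metadata is laid out.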
Optionals
Let's start off with an easy one: optional object pointers.
A quick refresher: in Swift, unlike Objective-C, a straight object reference cannot be nil:
let ptr: NSObject
This variable is guaranteed (as far as a compiler can guarantee things) to always point to an object. If we want to allow nil, we have to explicitly declare an optional by specifying the type as NSObject? or NSObject!.
Plain object pointers just contain the address of the object they point to, just like Objective-C. When Objective-C APIs are bridged to Swift, the object pointers get translated to optionals, so we'd expect that, for object pointers, Swift optionals are represented just like Objective-C: a plain address when pointing to an object, and all zeroes for nil. Let's see:
let obj: NSObject? = NSObject()
let nilobj: NSObject? = nil
let explicitobj: NSObject! = NSObject()
let explicitnilobj: NSObject! = nil
These variables contain:
806040c1957f0000
0000000000000000
70e440c1957f0000
0000000000000000
It's just as we expected. They either contain an object address or all zeroes. Further, ordinary optionals (?) and implicitly unwrapped optionals (!) contain the same sort of stuff, which makes sense, as the difference between them is just syntactic sugar.
Let's move on to integers. Unlike pointers, there's no underlying machine value that doesn't correspond to a valid integer, so something else must be at work. Let's see what we get:
let x = 42
let y: Int? = 42
let z: Int? = nil
This produces:
2a00000000000000
2a00000000000000 00
0000000000000000 01
A plain Int stores the raw value in eight bytes of memory, as expected. The optional type adds an extra byte at the end to signal whether or not a value is present. If that trailing byte is zero, a value is present. If that trailing byte is 1, the optional contains nil. Thus, optional Ints take up nine bytes rather than eight.
How about structs? If you guessed they'd be the same, you'd be right:
let rect: NSRect? = NSMakeRect(1, 2, 3, 4)
let nilrect: NSRect? = nil
000000000000f03f 0000000000000040 0000000000000840 0000000000001040 00
0000000000000000 0000000000000000 0000000000000000 0000000000000000 01
We can see that optionals for value types are implemented by adding an extra byte to the end, where zero means the optional contains a value, and one means the optional is nil. Reference types are implemented by storing the address when the optional contains a value, and storing zero when the optional contains nil.
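These two representations can be cross-checked with MemoryLayout. The sketch below assumes a current Swift toolchain; Ref and lastByte(of:) are hypothetical names introduced for illustration, and the tag byte values are the ones observed in the dumps above, not a documented guarantee.

```swift
// Hypothetical reference type standing in for NSObject, to avoid importing Foundation.
final class Ref {}

// Optional references need no extra storage: nil is the null pointer.
let referenceSize = MemoryLayout<Ref>.size
let optionalReferenceSize = MemoryLayout<Ref?>.size

// Optional Ints carry one extra trailing byte.
let intSize = MemoryLayout<Int>.size
let optionalIntSize = MemoryLayout<Int?>.size

// Hypothetical helper: reads the final byte of a value's storage.
func lastByte<T>(of value: T) -> UInt8 {
    return withUnsafeBytes(of: value) { $0[$0.count - 1] }
}

let present: Int? = 42
let absent: Int? = nil

print(referenceSize, optionalReferenceSize)   // equal
print(intSize, optionalIntSize)               // second is one larger
print(lastByte(of: present), lastByte(of: absent))
```

Note that size is not the same as stride: an array of optional Ints still spaces its elements at an aligned interval, so the extra byte costs more than one byte per element there.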
Protocols
Protocols are pretty straightforward for the most part. Any conforming class or struct implements the methods or properties specified by the protocol, and you can call those. However, there's one interesting aspect of protocols, namely that it's possible to declare a variable that's a protocol type. This is nothing special in Objective-C, since it's just another kind of static type on top of an object pointer. Swift is different, because protocols can apply to structs as well as classes. A struct is a value type, but a class is a reference type, so how can you combine them?
Let's start with a simple example protocol:
protocol P {
func p()
}
Let's make NSObject conform to P in an extension:
extension NSObject: P {
func p() {}
}
Then let's stuff an NSObject instance into a variable of type P:
let pobj: P = NSObject()
Here's what pobj contains:
4000201b9c7f0000 2200000000000080 0000050701000000 40e3401a9c7f0000 a8ebb90601000000
The first chunk is the object pointer. 0x00007f9c1b200040 (remember, it's backwards) points to an NSObject instance in memory. The next two chunks are just garbage, used for padding here. The last 16 bytes contain two pointers to metadata tables. The first one, at offset 24, contains a pointer to the "direct type metadata" for the underlying type. This in turn contains pointers to structures containing things like the type name and fields. The last one, at offset 32, is a "protocol witness table" for the underlying type and the protocol, which contains pointers to the type's implementations of the protocol methods. This is how the compiler is able to invoke methods, such as p(), on a value of protocol type without knowing the underlying type at compile time.
Let's check out a simple struct. In this case, we'll start with Int, which is ultimately a struct in Swift:
extension Int: P {
func p() {}
}
let pint: P = 42
The dump is much the same:
2a00000000000000 0a00000000000080 c0014a0201000000 98912d0201000000 d00b050201000000
It contains the underlying value, some garbage-filled padding, then the two metadata pointers at the end.
Let's try a bigger struct. It starts to get interesting:
struct S: P {
let x: Int
let y: Int
func p() {}
}
let s: P = S(x: 42, y: 43)
This produces:
2a00000000000000 2b00000000000000 e001540201000000 2815050201000000 d80b050201000000
It looks like that padding is available for use if the underlying value needs the storage. Is the same true of the third chunk?
struct T: P {
let x: Int
let y: Int
let z: Int
func p() {}
}
let t: P = T(x: 42, y: 43, z: 44)
2a00000000000000 2b00000000000000 2c00000000000000 38c61d0401000000 e0bb1d0401000000
Indeed so. What happens if the struct requires more storage than that?
struct U: P {
let a: Int
let b: Int
let c: Int
let d: Int
func p() {}
}
let u: P = U(a: 42, b: 43, c: 44, d: 45)
Dumping the memory of u produces:
801f600401000000 1500000000000080 10046c0401000000 58c71d0401000000 e8bb1d0401000000
There's nothing recognizable anymore. None of the struct elements are present. The first chunk is a pointer to malloc memory, and dumping that produces:
2a00000000000000 2b00000000000000 2c00000000000000 2d00000000000000
There's the struct storage. It seems that protocol values provide 24 bytes of storage. If the underlying value fits within 24 bytes, it's stored inline, otherwise it's automatically spilled to the heap.
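The fixed container size can be confirmed with MemoryLayout. This is a sketch assuming a current Swift toolchain (the any P spelling needs Swift 5.6 or later); Small and Large are hypothetical types, and the sizes are expressed in machine words so the checks do not hard-code 64-bit values.

```swift
protocol P {
    func p()
}

// Hypothetical conforming types: one that fits inline, one that does not.
struct Small: P {
    let x: Int
    func p() {}
}

struct Large: P {
    let a: Int
    let b: Int
    let c: Int
    let d: Int
    func p() {}
}

let word = MemoryLayout<Int>.size

// The protocol value has the same size no matter what it holds.
let containerSize = MemoryLayout<any P>.size

// Subtract the type metadata pointer and the witness table pointer.
let inlineCapacity = containerSize - 2 * word

let smallFitsInline = MemoryLayout<Small>.size <= inlineCapacity
let largeFitsInline = MemoryLayout<Large>.size <= inlineCapacity

print(containerSize, inlineCapacity, smallFitsInline, largeFitsInline)
```

On a 64-bit machine this works out to the 40 bytes and 24 bytes seen in the dumps.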
Conclusion
Swift memory layout for generics is completely straightforward. The values are laid out just as they are with non-generic types, even if it means that class instances change size. Optionals of reference types represent nil as all zeroes, just like we're used to in Objective-C. Optionals of value types append a byte to the end of the value to indicate whether or not a value is present. Protocol types take up 40 bytes of storage, with the last two pointers containing references to type metadata tables, and the rest available for storing the underlying value. If more than 24 bytes of storage is needed, the value is automatically placed on the heap, and a pointer to the allocation is stored instead.
That's it for today. Come back next time for more exciting Swift adventures!
Comments:
let justnil: NSObject? = nil
let somenil: NSObject?? = .Some(nil)
let somenil2: NSObject??? = .Some(nil)
let somesomenil: NSObject??? = .Some(.Some(nil))
let somenil3: NSObject???? = .Some(nil)
let somesomenil2: NSObject???? = .Some(.Some(nil))
let somesomesomenil: NSObject???? = .Some(.Some(.Some(nil)))
let somesomesomesome: NSObject???? = NSObject()
The results of dumping them, in order:
0000000000000000
0000000000000000
0200000000000000
0000000000000000
0400000000000000
0200000000000000
0000000000000000
e001d032b37f0000
That last one is, of course, the address of the NSObject instance.
For enums:
enum Enum { case a; case b; case c; }
let e: Enum? = nil
This produces 03, so apparently it's just picking the number past the end of the last enum element.
Marc P: As far as I can tell, marking a class with @objc just makes it subclass NSObject instead of SwiftObject, and the rest stays the same.
https://developer.apple.com/library/ios/documentation/CoreFoundation/Conceptual/CFMemoryMgmt/Concepts/ByteOrdering.html
http://people.cs.umass.edu/~verts/cs32/endian.html
I see that little endian byte ordering is used in OSX on Intel chips. I am not sure about iOS on ARM chips.
Could you please share the code of dumpmem() function? It will be nice to know that as well.
One place where it's a bit up in the air is ARM, since it can run either way. I'd guess that Apple runs ARM in little-endian mode just to keep commonality with Intel CPUs. Little-endian ARM seems to be the common use for ARM, so it may have just been a matter of doing what's common.
Historically, little-endian made life a bit easier for the hardware guys. Arithmetic operations need to start with the least significant byte, and little-endian means that the data is stored in memory in the same order it's used. This is no longer relevant at all, but historical reasons can persist forever.
It's all ancient history now that we no longer have 8-bit buses, but as Mike says historical reasons can persist forever.
[1] The 6501 was even pin compatible and had to be pulled after a lawsuit, leaving the 6502 which differs only in pinout.
[2] Accessing the memory at an address offset by the value of an index register, A = (M + X).
(P) $R0 = {
payload_data_0 = 0x0000000100500140 -> 0x00007fff7dfaf810 (void *)0x00007fff7dfaf838: OBJC_METACLASS_$_NSObject
payload_data_1 = 0x0000000000000000
payload_data_2 = 0x0000000000000000
instance_type = 0x0000000102800020
protocol_witness_0 = 0x0000000100005228 foo`protocol witness table for ObjectiveC.NSObject : foo.P
}
Interesting to look at how protocol witness functions are used to do dynamic dispatch when protocols are used as types – maybe for a future blog post? :)
Luke Howard: I did not know that. Nice info. Glad it agrees with what I found. As for the witness dispatch, it looks like it's just a vtable filled with all the protocol methods.
protocol<A,B,C,...> construct. Note that Any is defined as protocol<>. It seems the protocol object contains one witness table for each protocol listed in this combined protocol type:
(Any) $R0 = {
payload_data_0 = 0x0000000000000003
payload_data_1 = 0x2818c53e00007fff
payload_data_2 = 0x00000001001e0cc0 direct type metadata for Swift.VaListBuilder + 16
instance_type = 0x00000001001cef88 libswiftCore.dylib`direct type metadata for Swift.Int + 8
}
(protocol<Printable, Reflectable>) $R1 = {
payload_data_0 = 0x0000000000000003
payload_data_1 = 0x0000000000000000
payload_data_2 = 0x0000000000000000
instance_type = 0x00000001001cef88 libswiftCore.dylib`direct type metadata for Swift.Int + 8
protocol_witness_0 = 0x00000001001c70e8 libswiftCore.dylib`protocol witness table for Swift.Int : Swift.Printable
protocol_witness_1 = 0x00000001001c7df0 libswiftCore.dylib`protocol witness table for Swift.Int : Swift.Reflectable
}