mikeash.com pyblog/friday-qa-2016-04-15-performance-comparisons-of-common-operations-2016-edition.html comments
http://www.mikeash.com/?page=pyblog/friday-qa-2016-04-15-performance-comparisons-of-common-operations-2016-edition.html#comments
mikeash.com Recent Comments
Fri, 29 Mar 2024 12:03:27 GMT

192.168.1.1 - 2017-06-22 15:31:15
Thanks for the informative post. I added the page to my bookmarks and I'll come back here later.

Alex - 2017-01-12 05:47:56
I'm actually surprised how slow NSView creation is. It's 30x slower than a plain memory allocation, and less than 2x faster than disk I/O. I wonder what it's doing in there.

I look forward to the Swift additions to this.

mikeash - 2017-01-08 03:44:37
John Wallace: The message cache is not an MRU cache, it's a persistent cache of all messages ever sent. This cache is cleared on certain occasions, such as runtime manipulation of classes that would invalidate it, or loading new binary images, but in most programs the cache persists for a long time. Hitting the cache is the common case, by far. Objective-C would be intolerably slow if it were not.

John Wallace - 2016-12-08 20:44:02
Is the Objective-C message sent in a tight loop?
The reason I ask is that there is an MRU cache on Obj-C messages that significantly speeds up repeated calls to a method. Based on your numbers, I'm assuming your test code is hitting that cache. Missing that cache, which is the most common real-world usage pattern, would be much slower because of how it walks the method tables to find a method. If you ever update your tests, it would be interesting to add that test case.

africa - 2016-11-30 23:39:30
On a more general note, since iPhone CPUs are now closing in on 2GHz and multi-core, the real performance differences we’ll see will be about I/O. Traditionally, people think of I/O as disk, and maybe GPU, but people need to remember that main system RAM is also I/O. The current problem with computing today is that the majority of the time, the CPU is sitting around idle waiting on memory or something else.

TZ - 2016-06-22 12:12:47
Hi Mike, I tried running your benchmarks on my machine but I can't build them in release mode: clang crashes with a segfault. Have you perchance experienced a similar problem, and might you know how to fix it? Thanks

Eric Wing - 2016-04-28 05:21:05
Re: Floating-point division vs. integer division

I'm not a CPU expert, so I would like to learn more from those who do know, but there are a few factors.
First, I have been told that while the algorithms for division in both float and integer are complex, because a floating-point value is split into sign/mantissa/exponent, these operations can actually be split to be done in parallel (in the underlying circuitry). Integer division cannot be split this way, so it is a sequential algorithm, and it also works on a larger number of bits since the value is not split among sign/mantissa/exponent.

Second, integer division is not a common operation, whereas float division is usually more useful. So there may be fewer integer divider units on a processor, whereas you may get several floating-point dividers (different ports per core), and this is not counting that each of these is usually SIMD/vectorized, so you are expected to do (4 | 8 | 16 | etc) in the same operation. I suspect the compiler will not try to vectorize for SIMD at this optimization level, so we can throw out that difference. But particularly with out-of-order/reorder execution CPUs like Intel and, I think, the latest Apple chips, because there are multiple floating-point divider ports, your pipeline is less likely to stall waiting for a free unit.

On a more general note, since iPhone CPUs are now closing in on 2GHz and multi-core, the real performance differences we’ll see will be about I/O. Traditionally, people think of I/O as disk, and maybe GPU, but people need to remember that main system RAM is also I/O. The current problem with computing today is that the majority of the time, the CPU is sitting around idle waiting on memory or something else.

In real high-performance situations, cache hits/misses usually make the biggest differences in performance. Assuming a well written/optimized program that understands things like this, I suspect this is where Mac/desktop will show its huge performance wins, as they can sport bigger caches and faster buses. But the kind of benchmark done here won’t make those things show up.
This is also the type of thing that compiler optimization flags can’t magically fix.

Still, the conclusion is correct that the iPhone CPU has considerably closed the gap and looks more similar than dissimilar to its desktop counterpart.

Jens Ayton - 2016-04-23 08:21:43
I'm intrigued by the integer division being 2.6 times slower on this Mac than your old one.

MANIAK_dobrii - 2016-04-22 13:08:04
As always, great article.

I wonder why "Floating-point division with integer conversion" (double/int) is faster than "Integer division" (int/int). Can this somehow be related to the ARM64 instruction set?

Robin Kunde - 2016-04-22 02:23:53
Thanks for putting this together!

The transformation of memcpy into a series of mov instructions despite -O0 happens through a feature in clang/llvm called intrinsic functions. Basically, the compiler can provide its own implementation for certain basic functions, and this happens separately from and transparently to the optimizer. You can disable this behavior with -fno-builtin (or set "Recognize Built-in functions" to No in Xcode build settings).
In my test, it changed the speed of the 16-byte memcpy from 0.5ns to 2.7ns.

Fernando - 2016-04-16 09:40:39
Typo: "zero-zecond"

Charles Parnot - 2016-04-15 20:42:25
Nice work, thanks a lot for these insights!

The NSView results really make it clear why NSCell should be on its way out, and is now deprecated for NSTableView.

Matt - 2016-04-15 17:45:38
Hey Mike, longtime fan/reader here.

Quick question: could you also put up the performance of accessing an instance variable directly? There are currently other sources out there that compare local variable access vs objc_msgSend, but they're kind of old and I'm curious to see what you end up with.

I'm also aware it's possible that I'm misunderstanding something and that's something you can't measure.