I was working on a mini project at Square involving performance unit testing for iOS: essentially, looking into how we could introduce performance unit testing, what our options were, and how it would scale on our CI (continuous integration). Focusing on the one tool Apple provides as part of its unit testing suite, the magical measureBlock method, the questions were: how does it work? And is this going to work for us and our CI process?
For those who don’t know measureBlock, a little background: when you write a unit test in XCTest, a feature allows you to measure how long a block of code takes to execute. It looks like this in Objective-C:
Or in Swift:
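A minimal sketch of such a test, assuming a hypothetical sortedCopy(of:) workload and test class name that exist only for illustration. Current Swift spells the method measure; measureBlock was the Swift 2 and Objective-C era name.

```swift
import XCTest

/// Hypothetical workload, here only to give the test something to time.
func sortedCopy(of values: [Int]) -> [Int] {
    return values.sorted()
}

final class SortingPerformanceTests: XCTestCase {
    func testSortingPerformance() {
        // Build the input outside the measured block so setup is not timed.
        let input = Array((0..<10_000).reversed())

        // XCTest runs this closure repeatedly and records each duration.
        measure {
            _ = sortedCopy(of: input)
        }
    }
}
```

Anything placed inside the closure counts toward the measurement, so setup work is best kept outside it.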
Xcode runs this a bunch of times and establishes a baseline. If, on a given subsequent test, the standard deviation is too far off from that baseline, then the test fails. Additionally, Xcode provides you with a nifty popover showing you the durations of various runs and lets you pick your own baseline and settings:
How Does It Work?
Unfortunately, Google results were pretty shallow, until I stumbled upon some old slides from a 2014 WWDC session.
The slides explain that measureBlock runs your block 10 times and calculates the average time it takes to run. This average is then used as a baseline. The very first time you run your test, it will fail, because no baseline exists yet; it gets calculated on that first run. You may modify that baseline manually. On subsequent test runs, measureBlock still runs your block 10 times, but this time it compares the standard deviation of the run times to the baseline. If it is more than 10% off, either up or down, your test fails. All of these settings can be changed manually, too.
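The flow described above can be sketched in plain Swift. This is not Xcode's implementation: the sketch treats "more than 10% off" as the new average drifting from the baseline by more than 10% in either direction, and the names timeRuns, average, and passes are hypothetical helpers introduced for illustration.

```swift
import Foundation

/// Times `block` once per run and returns each duration in seconds.
func timeRuns(count: Int = 10, _ block: () -> Void) -> [Double] {
    var durations: [Double] = []
    for _ in 0..<count {
        let start = Date()
        block()
        durations.append(Date().timeIntervalSince(start))
    }
    return durations
}

/// Arithmetic mean of the recorded durations.
func average(_ durations: [Double]) -> Double {
    return durations.reduce(0, +) / Double(durations.count)
}

/// Assumed pass/fail rule: fail when there is no baseline yet, or when the
/// average is more than `tolerance` away from the baseline, up or down.
func passes(durations: [Double], baseline: Double?, tolerance: Double = 0.10) -> Bool {
    guard let baseline = baseline, baseline > 0, !durations.isEmpty else { return false }
    let drift = abs(average(durations) - baseline) / baseline
    return drift <= tolerance
}

// First run: no baseline exists, so the check fails and the
// average of these 10 runs would become the baseline.
let firstRun = timeRuns { _ = (0..<10_000).reduce(0, +) }
let recordedBaseline = average(firstRun)
print(passes(durations: firstRun, baseline: nil)) // false: no baseline yet
```

Later runs would call passes with recordedBaseline instead of nil, which is roughly the role the stored baseline plays between test runs.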
Baseline vs. Average
Xcode shows a little popover that displays both a baseline and an average. The difference between the two is: the average is the time it took to run your block of code the last time you ran your test. The baseline is a fixed setting of your choosing (automatically set by Xcode if you don’t do it). The standard deviation is compared to the baseline; the average displayed in the popover does not have any effect on your test.
Why Use Standard Deviation
Here is a graph that shows the run times for 10 runs of a given code block:
The average time is 1 second. (from Apple WWDC slides)
Now, here’s a second graph where the average time is also 1 second:
Clearly, the average doesn’t tell the whole story. This is why measureBlock compares the standard deviation to the baseline: the standard deviation tells us about the spread of the measurements.
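To make the point concrete, here is a small sketch with two made-up sets of 10 run times. The numbers are illustrative, not taken from the WWDC slides, and the helper names are hypothetical. Both sets average 1 second, yet their standard deviations differ widely.

```swift
/// Arithmetic mean of a set of run times, in seconds.
func mean(_ values: [Double]) -> Double {
    return values.reduce(0, +) / Double(values.count)
}

/// Population standard deviation: the square root of the mean squared
/// distance from the average.
func standardDeviation(_ values: [Double]) -> Double {
    let m = mean(values)
    let squaredDistances = values.map { ($0 - m) * ($0 - m) }
    return mean(squaredDistances).squareRoot()
}

// Illustrative run times in seconds; both sets sum to 10.
let steady: [Double]  = [0.98, 1.02, 1.00, 0.99, 1.01, 1.00, 1.02, 0.98, 1.00, 1.00]
let erratic: [Double] = [0.5, 1.5, 0.4, 1.6, 0.7, 1.3, 1.0, 1.0, 0.2, 1.8]

print("steady:  mean \(mean(steady)), deviation \(standardDeviation(steady))")
print("erratic: mean \(mean(erratic)), deviation \(standardDeviation(erratic))")
```

The averages match, but the second set's deviation is dozens of times larger, which is the information an average alone hides.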
Where Is The Baseline Stored?
So now the big question for those who work at big companies running CI on multiple machines: where does the baseline get stored? I figured this out simply by using git, adding a performance unit test, and looking at the file diff.
Xcode stores your baselines within the project file package, under project.xcodeproj/xcshareddata/xcbaselines/.... This folder will contain one .plist per host machine + run target combination, listing all performance test settings for that combination, and an Info.plist with a list of all host machines. Baselines are specific to both the host machine running the tests and the targeted device (e.g. iPhone 7 simulator). Xcode generates a UUID to identify the combo (machine+target) and ties all the performance settings to it. The combo is defined by the specs of the machine, so if you run your performance test on a different machine that has the exact same specs, that same baseline will be pulled (see screenshot below for what specs are used to define the combo).
The Info.plist that indexes all host machine and target combinations looks like this:
And this is an example of a .plist for a given host machine’s performance test settings:
So while these get checked into your code repository, every machine will have to have its own settings. This is reasonable, given that performance will vary from one host machine to another, and from one simulator to another. However, it may get tricky if you have hundreds of virtualized machines at a large company.
- What version of Xcode is this?
- How did you figure out the plists?
I made changes to the baselines using Xcode and used git to detect file changes. Then I opened the plists, tried to modify them by hand, and looked at the results. I also tried to run my test on another machine to see which baselines get pulled.
- Can I generate plists with a script?
Yes (as long as Apple doesn’t change things). You can generate a random UUID to name a combo and plug in all the specs you want. Just make sure not to forget any fields. The Info.plist file needs to contain a reference to the combo (machine+target) along with its specs. You also need a file named after the UUID, with a .plist extension, that contains all the test names and associated baselines.
- Can I remove fields from the plist to make it more general and reuse a “combo” for multiple types of machines or targets?
No. If a combo is not a perfect match with a machine’s specs, then Xcode will generate a new UUID and fill it with the machine’s exact specs. This will require you to add new baselines tied to this UUID.
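Returning to the scripting question, here is a rough sketch of what such a generator could look like. Several things are assumptions: the function names are hypothetical, the top-level key runDestinationsByUUID is recalled from Xcode-generated files and should be checked against a plist from your own Xcode version, and the specs and baselines dictionaries are expected to be copied from files Xcode itself produced rather than invented.

```swift
import Foundation

/// Serializes a dictionary as an XML property list at `url`.
func writePlist(_ object: [String: Any], to url: URL) throws {
    let data = try PropertyListSerialization.data(
        fromPropertyList: object, format: .xml, options: 0)
    try data.write(to: url)
}

/// Adds a new machine+target combo to a baselines folder and returns its UUID.
/// `specs` and `baselines` should mirror dictionaries Xcode generated itself.
func addCombo(to folder: URL,
              specs: [String: Any],
              baselines: [String: Any]) throws -> String {
    let indexKey = "runDestinationsByUUID" // assumed key name; verify locally
    let uuid = UUID().uuidString
    let infoURL = folder.appendingPathComponent("Info.plist")

    // Load the existing index, if any, so other combos are preserved.
    var info: [String: Any] = [:]
    if let data = try? Data(contentsOf: infoURL),
       let plist = try? PropertyListSerialization.propertyList(
           from: data, options: [], format: nil),
       let existing = plist as? [String: Any] {
        info = existing
    }

    // Reference the combo in Info.plist, keyed by the new UUID.
    var combos = info[indexKey] as? [String: Any] ?? [:]
    combos[uuid] = specs
    info[indexKey] = combos
    try writePlist(info, to: infoURL)

    // Write the per-combo file named after the UUID.
    try writePlist(baselines, to: folder.appendingPathComponent("\(uuid).plist"))
    return uuid
}
```

Because Xcode falls back to a fresh UUID whenever the specs are not an exact match, a script like this is only useful when the specs dictionary is copied field for field from a real machine.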
Key determinations about measureBlock
- measureBlock runs your code block 10 times.
- Compares standard deviation to a baseline.
- Baseline is computed by Xcode but can be set by hand.
- Baselines for your tests are stored in the .xcodeproj file but are both host machine and target device specific.
Essentially, Apple has provided iOS developers with a great, simple performance unit testing tool. However, without a ton of extra tooling and scripting, it may not scale for companies that run automated tests on hundreds of machines of varying specs.