You can only change what you can measure

Photo by Laineys Repertoire

With the release of Ember 2.14 and the 2.15 beta, the Ember.js team unveiled a new approach to application benchmarking called Ember Macro Benchmark. Ember Macro Benchmark (EMB) is designed to mitigate the major pitfalls of microbenchmarking. EMB benchmarks the application layer from outside the system while minimizing the variable performance impact of network requests by stubbing them.

In this post, we’ll talk to the Ember.js team about how they used Ember Macro Benchmark visualizations to measure the performance impact of proposed changes, and how those visualizations helped them to determine which changes to accept or reject.

I spoke to Kris Selden (creator of EMB) about why he created Ember Macro Benchmark.

He and LinkedIn needed a solution that facilitated accurate measurements that were easily reproducible, that could be peer reviewed, and that had low variance so you can detect small changes. He needed a solution that used the network, but was local and controlled, and could still be throttled.

In this post we’ll discuss the solution to those needs, Ember Macro Benchmark, and show you how to use it for your own applications, so that you can rigorously measure your application’s performance through even the most minute changes and change your codebase with confidence that you are improving performance for your users.

A brief introduction to macro benchmarking

Here’s a definition of the term “benchmark” that we’ll be using.

In computing, a benchmark is the act of running a computer program, a set of programs, or other operations, in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it — Benchmark (computing) — Wikipedia

It is typical, when thinking of benchmarking, to think of microbenchmarking. Microbenchmarking is writing a small program that executes and measures only the portion of code you are concerned about, typically only a single function. This can be useful in many applications. However, there are some problems with this approach.

The most immediate problem is that, by their very nature, microbenchmarks encourage focus on an exceedingly small part of the system. Often this will lead to sacrificing overall system performance for wins at the micro scale.

In his wonderful blog post on the subject, microbenchmarks fairy tale, Vyacheslav Egorov explains:

As compilers get smarter it becomes harder to write a microbenchmark that would give you an answer to even the simplest question. As Robert Sheckley wrote in his short story Ask a Foolish Question : In order to ask a question you must already know most of the answer

Don’t get me wrong, I think microbenchmarking can be immensely valuable. In fact I think the speed of JavaScript itself is owed in no small part to microbenchmarking tools and suites like Sunspider, Octane, and Kraken. But successful microbenchmarking requires the understanding that microbenchmarks themselves are imperfect. For a much more comprehensive description of the perils of microbenchmarking please take a moment to read Adventures in Microbenchmarking.

Ember Macro Benchmark

Despite the name, there is nothing about Ember Macro Benchmark that limits its use to Ember applications. In a future article I will show how to use it with non-Ember applications.

Ember Macro Benchmark leverages two other libraries (also by Kris): Chrome Tracing, and HAR Remix.

HAR Remix

What is a HAR file?

The HTTP Archive format or HAR, is a JSON-formatted archive file format for logging of a web browser’s interaction with a site. The common extension for these files is .har — .har — Wikipedia

The HAR file contains detailed information about every network request that can be used to recreate that request. This includes storing headers, response-timings, and the content.
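To make that structure concrete, here is a minimal sketch that summarizes the entries of a parsed HAR object. The field names (`log.entries`, `request`, `response.content`, `time`) come from the HAR format itself; the `summarizeHar` helper and the sample data are made up for illustration.

```javascript
// Summarize the entries of a parsed HAR object.
// Field names follow the HAR format: log.entries[].request / response / time.
function summarizeHar(har) {
  return har.log.entries.map((entry) => ({
    method: entry.request.method,
    url: entry.request.url,
    status: entry.response.status,
    mimeType: entry.response.content.mimeType,
    bodySize: entry.response.content.size, // length of the returned content in bytes
    totalMs: entry.time, // total elapsed time of the request in milliseconds
  }));
}

// A tiny hand-written HAR fragment (illustrative data, not a real recording).
const sampleHar = {
  log: {
    version: '1.2',
    entries: [
      {
        time: 12.5,
        request: {
          method: 'GET',
          url: 'https://example.com/assets/vendor.js',
          headers: [],
        },
        response: {
          status: 200,
          headers: [{ name: 'Content-Type', value: 'application/javascript' }],
          content: {
            size: 17,
            mimeType: 'application/javascript',
            text: 'console.log("hi")',
          },
        },
      },
    ],
  },
};

const summary = summarizeHar(sampleHar);
console.log(summary);
```

In practice you would load a real recording with `JSON.parse(fs.readFileSync('recording.har', 'utf8'))` instead of the inline sample; the file name here is only a placeholder.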

The goal of the HAR Remix library is to provide a way to serve entries from this .har file with loose matching and the ability to modify requests. We’ll eventually create a .har file that is the recording of responses of the site we’re going to macro-benchmark. When benchmarking, we’ll play back that recording. This allows us to finely control the network layer or to remove it almost entirely from the list of variables we need be concerned about.
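The idea of "loose matching" can be sketched as follows. This is not HAR Remix's actual API or matching rule; it is a hypothetical illustration that matches requests on method and path only, ignoring the host and query string, so that cache-busting parameters do not cause a miss.

```javascript
// Hypothetical loose-matching key: HTTP method plus pathname,
// ignoring host and query string.
function looseKey(method, url) {
  // The base URL is only used when `url` is relative.
  const { pathname } = new URL(url, 'http://localhost');
  return `${method.toUpperCase()} ${pathname}`;
}

// Index recorded HAR entries by their loose key.
function buildIndex(entries) {
  const index = new Map();
  for (const entry of entries) {
    index.set(looseKey(entry.request.method, entry.request.url), entry.response);
  }
  return index;
}

// Look up a recorded response for an incoming request.
// Returns undefined when nothing was recorded for that key.
function findResponse(index, method, url) {
  return index.get(looseKey(method, url));
}

// Illustrative recording with a single entry.
const recorded = [
  {
    request: { method: 'GET', url: 'https://example.com/api/addons?page=1' },
    response: { status: 200, content: { text: '[]' } },
  },
];

const index = buildIndex(recorded);
const hit = findResponse(index, 'get', '/api/addons?page=2&_=1500000000');
const miss = findResponse(index, 'GET', '/api/users');
```

Here `hit` resolves to the recorded response even though the query string differs, while `miss` is `undefined`. A real server would have to decide what to do on a miss, for example by returning a 404.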

Chrome Tracing

Chrome Tracing is a JavaScript variant of Telemetry, a Chromium project written mostly in Python. Both are ways to automatically run Chrome benchmarking tools and play back recordings. In Chrome Tracing’s case, the recording is the .har file described above.

Measuring Initial Render with Ember Macro Benchmark

Alright, so let’s see what that looks like all put together straight out of the EMB repository.

Let’s follow the instructions there, but I’ll add a few extra notes.

Reporting

Ember Macro Benchmark provides out-of-the-box reporting tools. The primary graphs generated by this tooling show how the measurements are distributed across multiple result sets. The final output is a Wilcoxon rank-sum test, whose p-value tells you how likely a difference at least this large would be if your change had no effect. That output also includes further details that will help you interpret the results.
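To give a feel for what a rank-sum test measures, here is a minimal JavaScript sketch of the statistic. It is not EMB's reporting code, which is written in R; the function names and the sample timings are invented for illustration, and the z score uses a plain normal approximation without tie or continuity corrections.

```javascript
// Assign 1-based ranks to the pooled values, averaging ranks across ties.
function rankPooled(values) {
  const order = values
    .map((value, i) => ({ value, i }))
    .sort((a, b) => a.value - b.value);
  const ranks = new Array(values.length);
  let start = 0;
  while (start < order.length) {
    let end = start;
    while (end + 1 < order.length && order[end + 1].value === order[start].value) {
      end++;
    }
    const averageRank = (start + end) / 2 + 1; // mean of ranks start+1 .. end+1
    for (let k = start; k <= end; k++) {
      ranks[order[k].i] = averageRank;
    }
    start = end + 1;
  }
  return ranks;
}

// Rank-sum statistic for two independent samples, plus a z score
// from the normal approximation.
function rankSum(control, experiment) {
  const n1 = control.length;
  const n2 = experiment.length;
  const ranks = rankPooled(control.concat(experiment));
  const w = ranks.slice(0, n1).reduce((sum, r) => sum + r, 0); // rank sum of control
  const u = w - (n1 * (n1 + 1)) / 2; // Mann-Whitney U for control
  const mean = (n1 * n2) / 2; // expected U if both samples share a distribution
  const sd = Math.sqrt((n1 * n2 * (n1 + n2 + 1)) / 12);
  return { w, u, z: (u - mean) / sd };
}

// Illustrative timings in milliseconds (made-up numbers, not real EMB output).
const result = rankSum([410, 402, 398, 415], [395, 390, 401, 388]);
console.log(result);
```

The further `z` is from zero, the less plausible it is that both sets of timings come from the same distribution. With only a handful of samples the normal approximation is rough, which is one reason to collect many runs.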

Ember Macro Benchmark uses the statistical computing language R. First, you’ll need to install R with the following command:

Once R is installed you’ll need to install some of the packages used in the reporting R scripts included in EMB. You can do so by running the R REPL and executing the following commands:

Then from within your Ember Macro Benchmark clone you need to `yarn install`.

This will install Chrome Tracing, HAR Remix, and other dependencies used by Ember Macro Benchmark.

Configuring the server

This part requires a bit of a peek into the README to figure out the different options. We’ll be using the default one provided by the repository as it already includes its .har file and associated config to test emberaddons.com.

When I clone EMB I create a branch with my config.json modifications. Doing so makes pulling in upstream changes to EMB itself much easier. I tend to keep .har files and associated build assets in a separate repository. Keeping these assets separated is especially helpful if you need to add custom instrumentation. We’ll talk about instrumentation later.

Once you have everything configured you can run `yarn run serve` to start the HAR Remix server. Then, in a separate terminal window, `yarn run benchmark` kicks off the Chrome Tracing automated benchmarking.

It is extremely important to quiet your system. Otherwise the results will be very noisy. CPU usage and IO unrelated to your benchmark can drastically affect measurements. Remember, this is a macro benchmark and as such is affected by the same things that any application is subject to. Minimizing external network requests, along with anything else that might engage your CPU, is important for increasing the fidelity of your results.

Take these steps to ensure consistent results:

  1. Restart your computer
  2. Close all applications that aren’t essential for Ember Macro Benchmark
  3. Turn off your network connection
  4. If on a laptop ensure you are plugged into power

Alright, now run `yarn run benchmark`. Be sure not to interact with your system while the test runs.

This will generate a results.json file in your results folder which is used by the reporting tools to generate your reports.

Upon completion run:

yarn run plot

This will generate a report for you in results/results.pdf with a graph that looks like this:

These graphs are pretty cool. Remember you can only change what you can measure, and if it isn’t measured it’s just moving code around.

Instrumenting an App for Ember Macro Benchmark

Alright, you’ve seen the default Ember Macro Benchmark configuration in action, but how does it work under the hood?

Chrome Tracing requires the page to call `performance.mark` in specific places to generate its measurements.

One of the “markers” Chrome Tracing uses is `domLoading`, which you don’t need to instrument yourself, as the browser records it at a known time. There are only three that we need to implement ourselves to generate the EMB report.

First is `beforeVendor`, which, as you might imagine, goes before the `vendor.js` script inside your Ember application.

It looks like this:
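A minimal sketch of that mark, assuming it lives in an inline script in `index.html` placed immediately before the script tag that loads `vendor.js` (the exact placement in your app is an assumption):

```javascript
// Contents of an inline <script> placed right before the <script> tag
// that loads vendor.js (placement is an assumption of this sketch).
performance.mark('beforeVendor');

// The mark can be read back later through the User Timing API.
const [beforeVendor] = performance.getEntriesByName('beforeVendor');
console.log(beforeVendor.name, beforeVendor.startTime);
```

Only the `performance.mark` call is needed for instrumentation; the read-back is there to show what the mark looks like.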

The second and third are implemented in your Router’s `didTransition` and `willTransition` hooks.

You can add these marks with the following code:
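A minimal sketch of the two hooks, written as a plain object so it can be read on its own. The mark names are assumed to mirror the hook names, and `tracingHooks` is a hypothetical name; in an Ember app you would place these methods in the Router definition in `app/router.js`.

```javascript
// Hook bodies meant to live on the Router in app/router.js, for example by
// passing this object to Router.extend(...). The mark names are assumed to
// mirror the hook names.
const tracingHooks = {
  willTransition() {
    // Preserve the framework's own behaviour when a parent implementation exists.
    if (typeof this._super === 'function') {
      this._super(...arguments);
    }
    performance.mark('willTransition');
  },

  didTransition() {
    if (typeof this._super === 'function') {
      this._super(...arguments);
    }
    performance.mark('didTransition');
  },
};
```

The `typeof this._super` guard is only there so the object also works outside Ember; inside a Router definition you would normally call `this._super(...arguments)` directly.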

In addition to the performance markers, Chrome Tracing requires that the browser navigate to the URL “about:blank” after a run. Since it would be awkward if our apps redirected users to a blank page during normal use, Chrome Tracing adds a query parameter of “?perf.tracing” to any benchmarked URL, which lets the app tell a benchmark run apart from a regular visit.

To do that we need to add the following function to the leaf-most node of our application in an `afterModel` hook:
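A minimal sketch of such a function follows. `redirectAfterPaint` is a hypothetical name, and waiting two animation frames is only an approximation of "after paint"; the original may schedule the redirect differently, for example after Ember's render queue has flushed.

```javascript
// Hypothetical helper; call it from the afterModel hook of the leaf-most route,
// e.g. afterModel() { redirectAfterPaint(window); }
// `win` is injectable so the logic can be exercised outside a browser.
function redirectAfterPaint(win) {
  // Only redirect for benchmark runs, which carry the perf.tracing query parameter.
  if (!win.location.search.includes('perf.tracing')) {
    return false;
  }
  // Wait two animation frames so the browser has had a chance to paint
  // before navigating away.
  win.requestAnimationFrame(() => {
    win.requestAnimationFrame(() => {
      win.location.href = 'about:blank';
    });
  });
  return true;
}
```

For regular visitors the function returns `false` and schedules nothing, so the redirect never affects normal use.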

This redirects when the application has finally painted.

Phew, okay. That was a lot. Now what?

Conclusion

There is currently a Quest Issue open on Ember Macro Benchmark’s repo to make setting up your very own macro benchmarking suite easier. If you’d like to talk more about it be sure to comment and discuss there.
