Understanding API Latencies

Today’s Internet is powered by APIs. Tomorrow’s will be even more so. Without a pretty UI or a captivating experience, you’re judged simply on performance and availability. As an API provider, it is more critical than ever to understand how your system is performing.

With the emergence of micro services, we have an API layered cake. And often that layered cake looks like one from a Dr. Seuss story. That complex systems fail in complex ways is a deep and painful truth that developers are facing now in even the most ordinary of applications. So, as we build these decoupled, often asynchronous, systems that compose a single user transaction from often tens of underlying networked subtransactions we’re left with a puzzle. How is the performance changing as volume increases usage and, often more importantly, how is it changing as we rapidly deploy micro updates to our micro services?

Developers have long known that they must be aware of their code performance and, at least in my experience, developers tend to be fairly good about minding their performance P’s and Q’s. However, in complex systems, the deployment environment and other production environmental conditions have tremendous influence on the actual performance delivered. The cry, “but it worked in dev” has moved from the functionality to the performance realm of software. I tell you now that I can sympathize.

It has always been a challenge to take a bug in functionality observed in production and build a repeatable test case in development to diagnose, address, and test for future regression. This challenge has been met by the best developers out there. The emergent conditions in complex, decoupled production system are nigh impossible to replicate in a development environment. This leaves developers fantastically frustrated and requires a different tack: production instrumentation.

As I see it, there are two approaches to production instrumentation that are critically important (there would be one approach if storage and retrieval were free and observation had no effect — alas we live in the real world and must compromise). You can either sacrifice coverage for depth or sacrifice depth for coverage. What am I talking about?

I’d love to be able to pick apart a single request coming into my service in excruciating detail. Watch it arrive, calculate the cycles spent on CPU, the time off, which instruction and stack took me off CPU, the activity that requested information from another microservice, the perceived latency between systems, all of the same things on the remote micro service, the disk accesses and latency on delivery for my query against Cassandra, and the details of the read-repair it induced. This list might seem long, but I could go on for pages. The amount of low-level work that is performed to serve even the simplest of requests is staggering… and every single step is subject to bugs, poor interactions, performance regressions and other generally bad behavior. The Google Dapper paper and the OpenZipkin project take a stab at delivering on this type of visibility, and now companies like Lightstep are attempting to deliver on this commercially. I’m excited! This type of tooling is one of two critical approaches to production system visibility.


The idea of storing this information on every single request that arrives is absurd today, but even when it is no longer absurd tomorrow, broad and insightful reporting on it will remain a challenge. Hence the need for the second approach.

You guessed it, Circonus falls squarely into the second approach: coverage over depth. You may choose not to agree with my terminology, but hopefully the point will come across. In this approach, instead of looking at individual transactions into the system (acknowledging that we cannot feasibly record and report all of them), we look at the individual components of the system and measure everything. That API we’re serving? Measure the latency of every single request on every exposed endpoint. The micro service you talked to? Measure the latency there. The network protocol over which you communicated? Measure the size of every single package sent in each direction. That Cassandra cluster? Measure the client-facing latency, but also measure the I/O latency of every single disk operation on each spindle (or EBS volume, or ephemeral SSD) on each node. It sounds like a lot of data, sure. We live in the future, and analytics systems are capable of handling a billion measurements per second these days, all the while remaining economical.


The above graph shows the full distribution of every IO operation on one of our core database nodes. The histogram in the breakout box shows three distinct modes (two tightly coupled in the left peak and one smaller mode further out in the latency spectrum. We can also see a radical divergence in behavior immediately following Feb 14th at 9am. As we’re looking at one week of data, each time slice vertically is 1h30m. The slice highlighted by the vertical grey hairline is displayed in the upper-left breakout box; it represents nearly 12 million data points alone. The full graph represents about 1.2 billion measurements, and fetching that from the Circonus time series database took 48ms. When you start using the right tools, your eyes will open.