Monitoring API Latencies After New Releases: 4 Common Mistakes to Avoid

In our era of rapid release cycles, engineers make frequent API updates in an effort to constantly improve user experiences. But while updates are designed with user benefits in mind, they can also have the opposite effect — potentially creating new performance issues.

Ensuring that your APIs meet performance requirements and SLOs as you release new updates depends on monitoring API latencies correctly. In the most basic sense, an API is a programming interface through which user requests are issued against a remote service, and API latency is the amount of time required to service each request.

Monitoring API latencies is complex because thousands, or even millions, of users are making many different requests. Engineers must collect and analyze all of this data continuously, and particularly after each new release, to ensure that updates have not degraded performance. Histograms are the best tool for monitoring API latencies and are commonly used by engineers. Unfortunately, they’re often used incorrectly, which leads to incorrect analysis. In this article, I share four common mistakes engineers make when using histograms to monitor API latencies from release to release.

Mistake #1: Not asking the right questions

An engineer has several considerations when deciding how to track API latencies. How do you define what’s considered a different or similar request? Should different requests take different amounts of time? Was the request response successful, or was there an error?

You may have latency numbers for a million of these API requests, along with other dimensions or subgroups around them. If you’re not a statistician, and most engineers are not, it’s challenging to know what questions to even ask to begin gleaning insights from all of the latency information you’re collecting.

Too often, when comparing API latencies to detect changes between releases, engineers examine a single summary number, such as the difference in average latency or in the 99th percentile. However, this does not provide the type of insight you need. In fact, averages often lead to incorrect conclusions, because a few outliers skew the result. When comparing histograms of API latency data from one release to another, more sophisticated questions to ask include:

Are the shapes different?
Are the concentrations of data different?
Do the concentrations of data move?
How many concentrations of data does the histogram have?
Where are the concentrations of data?

New concentrations of data signal that your new release may have created new workloads because users started making a different request. Alternatively, an implementation change in your release may cause certain requests to be serviced differently or more slowly, producing a new “bump,” or concentration of data, in your histogram. Whether this workload change was caused by the end user or by some lower-level system, it’s a workload change you now need to account for. In the accompanying histogram, the Y-axis represents the number of samples and the X-axis represents the sample value (microseconds). The highest point of the “bump” represents the largest concentration of data.
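As a rough illustration of asking "how many concentrations are there, and where?", the sketch below counts local peaks in two histograms that share the same bins. The function name find_modes, the 5% minimum share, and the bin counts are all invented for this example; real latency data is noisier and would likely need smoothing before a peak count means much.

```python
from typing import List


def find_modes(counts: List[int], min_share: float = 0.05) -> List[int]:
    """Return indices of bins that are local peaks holding at least
    min_share of all samples (a deliberately simple heuristic)."""
    total = sum(counts)
    if total == 0:
        return []
    modes = []
    for i, count in enumerate(counts):
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i < len(counts) - 1 else 0
        # A peak rises above its left neighbor and is not below its right one.
        if count > left and count >= right and count / total >= min_share:
            modes.append(i)
    return modes


# Made-up counts per bin; both releases use the same bin boundaries.
before = [2, 40, 120, 60, 10, 3, 1, 0]
after = [2, 35, 110, 55, 12, 30, 45, 8]

print(find_modes(before))  # [2]
print(find_modes(after))   # [2, 6]
```

The second peak at bin 6 in the newer release is the kind of new "bump" worth investigating, even though an average over the same data would shift only modestly.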

Mistake #2: Counting errors in overall latency

New API releases often introduce new errors, and errors often return very quickly. A common mistake we see is that engineers include errors in their latency statistics. Because failed requests tend to return fast, they produce an artificially low latency, leading you to believe you’re meeting service level requirements when in fact you’re not. After all, your SLO is not to serve 99% of all requests in under one second, but rather 99% of all successful requests in under one second.

To prevent this error, you should create several histograms — one that includes everything, one that stores only successful requests, and one that stores only failed requests (or errors). You may also want histograms that store successful requests for a specific endpoint. This way, you can ensure accuracy of your API latency analysis, then slice and dice your data as needed.
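A minimal sketch of this separation is shown here, assuming fixed 100 ms bins and treating any HTTP status of 400 or above as an error. The names record and fraction_under, the bin width, the status rule, and the traffic numbers are all illustrative assumptions, not part of any particular monitoring library.

```python
from collections import Counter, defaultdict

BIN_WIDTH_MS = 100  # fixed-width bins keep the sketch short


def record(histograms, latency_ms, status_code):
    """Count one request in the combined histogram and in its outcome histogram."""
    outcome = "success" if status_code < 400 else "error"  # assumed error rule
    lower_edge = int(latency_ms // BIN_WIDTH_MS) * BIN_WIDTH_MS
    histograms["all"][lower_edge] += 1
    histograms[outcome][lower_edge] += 1


def fraction_under(histogram, threshold_ms):
    """Share of samples in bins that lie entirely below threshold_ms."""
    total = sum(histogram.values())
    if total == 0:
        return 0.0
    under = sum(n for lower, n in histogram.items()
                if lower + BIN_WIDTH_MS <= threshold_ms)
    return under / total


histograms = defaultdict(Counter)
for _ in range(90):
    record(histograms, 300, 200)   # fast successes
for _ in range(10):
    record(histograms, 1500, 200)  # slow successes
for _ in range(100):
    record(histograms, 5, 500)     # errors that fail almost instantly

print(fraction_under(histograms["all"], 1000))      # 0.95
print(fraction_under(histograms["success"], 1000))  # 0.9
```

With these made-up numbers, the combined histogram suggests 95% of requests finish in under one second, while the success-only histogram shows that only 90% of successful requests do.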

Mistake #3: Incorrectly setting bin boundaries

Histograms divide all sample data into a series of intervals called bins. Unfortunately, poor histogram binning is one of the most common mistakes in measuring API latencies and often causes grossly inaccurate analysis.

When analyzing latencies for new API releases, you want to be able to compare current data to historical latency data from previous releases. To do this, all of your histograms across your organization need common bin boundaries. This way, you can aggregate histograms together and easily identify changes.

However, it’s extremely common for organizations to use different bin boundaries across their histograms — and even change them between releases. Aggregating histograms with different binning degrades the quality of your histograms and therefore the analysis of your data.
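The following sketch shows why common boundaries matter, assuming each histogram is stored as a list of bin edges plus a list of per-bin counts. The function name merge_histograms and the sample numbers are hypothetical. When the edges match, aggregation is plain addition; when they differ, there is no exact way to combine the counts, so the sketch refuses rather than guessing.

```python
from typing import List, Sequence


def merge_histograms(edges_a: Sequence[float], counts_a: Sequence[int],
                     edges_b: Sequence[float], counts_b: Sequence[int]) -> List[int]:
    """Add two histograms bin by bin; only valid when boundaries are identical."""
    if list(edges_a) != list(edges_b):
        raise ValueError("bin boundaries differ; counts cannot be added exactly")
    return [a + b for a, b in zip(counts_a, counts_b)]


edges = [0, 1, 10, 100, 1000]  # milliseconds; five edges define four bins
release_1 = [5, 80, 12, 3]
release_2 = [4, 60, 30, 6]

print(merge_histograms(edges, release_1, edges, release_2))  # [9, 140, 42, 9]
```

Any workaround for mismatched edges, such as splitting a bin's count proportionally across the other histogram's bins, introduces an assumption about how samples are distributed inside the bin, which is where the loss of quality comes from.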

At Circonus, we use log-linear histograms, which have logarithmically increasing bin sizes. What matters is that we use the same bin boundary scheme across all of our histograms.
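To make the idea of a log-linear scheme concrete, here is one possible reading of it: bins are evenly spaced within each power of ten, and the bin width grows tenfold with each power of ten. The base-10 layout, the two significant digits, the integer microsecond input, and the function name log_linear_bin are assumptions made for this sketch, not a description of the actual Circonus implementation.

```python
from typing import Tuple


def log_linear_bin(value_us: int) -> Tuple[int, int]:
    """Return the (lower, upper) edges of the bin holding a positive integer
    latency, keeping two significant decimal digits."""
    if value_us <= 0:
        raise ValueError("this sketch only handles positive integers")
    digits = len(str(value_us))
    # Bin width is 1 below 100, 10 below 1,000, 100 below 10,000, and so on.
    width = 10 ** max(digits - 2, 0)
    lower = value_us // width * width
    return lower, lower + width


print(log_linear_bin(87))     # (87, 88)
print(log_linear_bin(1234))   # (1200, 1300)
print(log_linear_bin(99999))  # (99000, 100000)
```

Because the edges depend only on the value and never on the data seen so far, every histogram built this way has identical boundaries and can be aggregated with any other.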

Mistake #4: Sampling rather than collecting all data

The hardest problem with measuring API latencies is that people don’t record the data essential for monitoring and analytics. There’s an ongoing argument in the monitoring industry about sampling, namely that there’s no need to collect all the data. Sampling is acceptable, but it’s not perfect: data you don’t collect is lost forever. Collecting all data enables more accuracy, and with histograms you can record 100% of requests with relatively little overhead, so there’s no reason not to do so.
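The low overhead comes from the fact that a histogram stores one counter per bin rather than one entry per request. The sketch below illustrates this with synthetic data, assuming integer microsecond latencies and a two-significant-digit binning rule. The helper names, the random latencies, and the sample count are invented for illustration.

```python
import random
from collections import Counter


def bin_lower_edge(value_us: int) -> int:
    """Truncate a positive integer to two significant decimal digits."""
    width = 10 ** max(len(str(value_us)) - 2, 0)
    return value_us // width * width


def record_all(latencies_us):
    """Record every sample; memory grows with the number of bins, not samples."""
    histogram = Counter()
    for value in latencies_us:
        histogram[bin_lower_edge(value)] += 1
    return histogram


rng = random.Random(42)  # fixed seed so the run is repeatable
samples = [rng.randint(1, 999_999) for _ in range(200_000)]
histogram = record_all(samples)

print(sum(histogram.values()))  # 200000: every sample was counted
print(len(histogram) <= 459)    # True: at most 9 + 5 * 90 bins for this range
```

Recording ten times as many requests in the same latency range would increase the counts but not the number of bins, which is why collecting everything remains cheap.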

For every API you serve, you should measure the latency of every single request on every exposed endpoint. The microservice you talked to? Measure the latency there. The network protocol over which you communicated? Measure the size of every single packet sent in each direction. That Cassandra cluster? Measure the client-facing latency, but also measure the I/O latency of every single disk operation on each spindle (or EBS volume, or ephemeral SSD) on each node. It sounds like a lot of data, sure. But we live in the future, and analytics systems are capable of handling a billion measurements per second these days — all the while remaining economical.

The above graph shows the full distribution of every I/O operation on one of our core database nodes at Circonus. The histogram in the breakout box shows three distinct modes: two tightly coupled in the left peak and one smaller mode further out in the latency spectrum. We can also see a radical divergence in behavior immediately following Feb 14th at 9am. As we’re looking at one week of data, each time slice vertically is 1h30m. The slice highlighted by the vertical grey hairline is displayed in the upper-left breakout box; it represents nearly 12 million data points alone. The full graph represents about 1.2 billion measurements and fetching that from the Circonus time series database took just 48ms.

Have Confidence in Your Analysis

API updates are designed to make your users happier, but they can often result in new performance issues. Monitoring API latencies after new releases is meant to address this issue, but all too often these mistakes result in inaccurate analysis. As a result, you may have a latency issue and be unaware of it, or think you have a latency issue when in fact you do not. By addressing the above four common issues, you can have more confidence in the accuracy of your analysis and the decisions you make based on it.