Accurate Systems Monitoring with Histograms

Histograms are the most important tool for accurately assessing the behavior of your systems.

Histograms provide dramatically improved insight into the underlying data.

Heat maps are a way of displaying histograms using color saturation instead of bar heights. This graph of API response times provides a deeper, richer understanding of the data than a line graph for the same service over the same time period.

These visualizations help us ask more intelligent questions about how our systems behave and how our business reacts to that.

Since 2011, Circonus has stored histogram data as a distinct datatype.

The rest of the industry is starting to catch on.

Frankly, though, a lot of the industry still doesn’t seem to get why.

More of today’s server apps are using the “histogram metric type,” which is great. At least it’s a step in the right direction. But look under the hood and all they really show the operator is the min, max, median, and a few other arbitrary percentiles.

The problem with an average is that it’s just a single statistical assessment of a large distribution of data, so it doesn’t provide much insight into the data set. Histograms do, but you can’t just analyze a handful of quantiles and call it a histogram. That gives you no better insight than just looking at an average.
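To make the point concrete, here is a minimal sketch in Python. The two sample sets, the 100 ms bucket width, and the histogram helper are all made up for illustration and are not how any particular product stores histograms. Both sets have exactly the same average, yet only the bucketed counts show that one of them hides a very slow request.

```python
from collections import Counter
from statistics import mean


def histogram(samples, bucket_width):
    """Count samples per fixed-width bucket, keyed by each bucket's lower bound."""
    counts = Counter((s // bucket_width) * bucket_width for s in samples)
    return dict(sorted(counts.items()))


# Hypothetical response times in milliseconds (illustrative data only).
steady = [100] * 10          # every request takes 100 ms
bimodal = [20] * 9 + [820]   # nine fast requests and one very slow one

# The averages are identical, so an average-only graph cannot tell these apart.
print(mean(steady), mean(bimodal))

# The bucketed counts make the difference obvious.
print(histogram(steady, 100))
print(histogram(bimodal, 100))
```

Fixed-width buckets are the simplest choice for a sketch like this; a production system would more likely use buckets that grow with the value so that both fast and slow requests are resolved well.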

There is a better way, and we’re fighting an uphill battle to get the industry there.

Use the complete distribution of your data

Why? Because averages don’t represent the things that are important to us.

People are misled by averages. The real question is how you are treating your users. How many users are having a dissatisfying experience?
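With the full distribution in hand, that question becomes simple arithmetic. The sketch below assumes a histogram stored as a mapping from each bucket's lower bound in milliseconds to a request count; the bucket boundaries, the counts, and the 500 ms "dissatisfying" threshold are all hypothetical.

```python
def count_at_or_above(hist, threshold_ms):
    """Sum the counts of all buckets whose lower bound is at or above the threshold."""
    return sum(count for lower, count in hist.items() if lower >= threshold_ms)


# Hypothetical histogram: bucket lower bound in ms -> number of requests.
hist = {0: 900, 100: 60, 200: 25, 500: 10, 1000: 5}

total = sum(hist.values())
slow = count_at_or_above(hist, 500)  # assumed threshold for a dissatisfying experience

print(f"{slow} of {total} requests ({slow / total:.1%}) took 500 ms or longer")
```

The answer is exact only when the threshold falls on a bucket boundary, as it does here. For a threshold inside a bucket, the count is accurate only to within that bucket's width.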

Are yesterday’s tools misleading you into celebrating success when you shouldn’t be?

