The Problem with Percentiles – Aggregation brings Aggravation

Percentiles have become one of the primary service level indicators for representing the performance of real systems in monitoring. When used correctly, they provide a robust metric that can be used for base-of-mission…

A Guide to Service Level Objectives, Part 3: Quantifying Your SLOs

A guide to the importance of, and techniques for, accurately quantifying your Service Level Objectives. This is the third in a multi-part series about Service Level Objectives. The first part…

Quantifying WordPress Performance Improvements with circonus-logwatch

Deriving meaningful insights from third-party logs has always been a difficult yet necessary task. Most analysis occurs after-the-fact, when something has gone wrong. Very few tools allow real-time monitoring of…

A Guide To Service Level Objectives, Part 2: It All Adds Up

A simple primer on the complicated statistical analysis behind setting your Service Level Objectives. This is the second in a multi-part series about Service Level Objectives. The first part can…

Latency SLOs Done Right

In their excellent SLO-workshop at SRECon2018 (program) Liz Fong-Jones, Kristina Bennett and Stephen Thorne (Google) presented some best practice examples for Latency SLI/SLOs. At Circonus we care deeply about measuring latency and…

TSDBs at Scale – Part Two

This is the second half of a two-part series focusing on the challenges of Time Series Databases (TSDBs) at scale. This half focuses on the challenges of balancing read vs.…