What happens when cloud-based application infrastructure slows down? Twelve years ago, I attended a meetup at the San Francisco Perl Mongers group where an engineer from Amazon introduced the Elastic Compute Cloud service (EC2). At […]
Putting Linux kernel block I/O schedulers under the eBPF microscope
eBPF tracing is a broad and deep subject, and can be a bit daunting at first sight. However, when Brendan Gregg issued the dictum “Perhaps […]
IoT-driven monitoring of Air Quality during the 2018 California Wildfires
Over the past few weeks, the Camp Fire in Northern California and the Woolsey Fire in Southern California have devastated people and property. There has […]
Percentiles have become one of the primary service level indicators used to represent the performance of real systems in monitoring. When used correctly, they provide a robust metric that can serve as the basis for mission-critical service level objectives. However, […]
A guide to the importance of, and techniques for, accurately quantifying your Service Level Objectives. This is the third in a multi-part series about Service Level Objectives. The first part can be found here and […]
Deriving meaningful insights from third-party logs has always been a difficult yet necessary task. Most analysis occurs after the fact, when something has gone wrong. Very few tools allow real-time monitoring of logs, so SREs have become […]
A simple primer on the complicated statistical analysis behind setting your Service Level Objectives. This is the second in a multi-part series about Service Level Objectives. The first part can be found here and the […]
In their excellent SLO workshop at SRECon2018, Liz Fong-Jones, Kristina Bennett, and Stephen Thorne (Google) presented some best-practice examples for latency SLIs/SLOs. At Circonus we care deeply about measuring latency and about SRE techniques such as SLIs/SLOs. […]