Shipping Software with an SRE Mindset

This blog post is a summary of a presentation I delivered at SREcon Americas, which can be viewed here. A lot of companies bequeath the projects that they build internally…

Applying Advanced vs. Basic Monitoring Techniques

Complex architectures, pressure to deploy faster, and demand for optimal performance have placed greater strain on monitoring teams, and as a result, an increasing number are looking to implement more…

10 Principles of Effective Monitoring: A Quick Checklist of Fundamentals

Whether you’re just beginning your monitoring journey or are a seasoned pro, being reminded of monitoring’s core principles is still helpful. From my own experience as a former SRE to…

Monitoring API Latencies After New Releases: 4 Common Mistakes to Avoid

In our era of rapid release cycles, engineers make frequent API updates in an effort to constantly improve user experiences. But while updates are designed with user benefits in mind,…

Why Open Source Histograms Are The Future of Telemetry Monitoring

Latency measurements have become an important part of IT infrastructure and application monitoring. The latencies of a wide variety of events like requests, function calls, garbage collection, disk IO, system-call,…

How to Correctly Frame and Calculate Latency SLOs

As more companies transform into service-centric, “always on” environments, they are implementing Site Reliability Engineering (SRE) principles like Service Level Objectives (SLOs). SLOs are agreements on an acceptable level…
