In today’s world, the performance of your IT systems has a direct impact on your brand reputation and overall business revenue. A “good enough” approach to software performance is no longer good enough. This has […]
This post includes contributions from Riley Berton, Principal SRE at Major League Baseball. You started the year with one Kubernetes cluster and now you have 100. How do you deal with that? This is a […]
This blog post is a summary of a presentation I delivered at SREcon Americas, which can be viewed here. A lot of companies bequeath the projects that they build internally unto the world as open […]
Complex architectures, pressure to deploy faster, and demand for optimal performance have placed greater strain on monitoring teams and, as a result, an increasing number are looking to implement more advanced monitoring techniques. Part of […]
Whether you’re just beginning your monitoring journey or are a seasoned pro, being reminded of monitoring’s core principles is still helpful. From my own experience as a former SRE to what I’ve seen from our […]
In our era of rapid release cycles, engineers make frequent API updates in an effort to constantly improve user experiences. But while updates are designed with user benefits in mind, they can also have the […]
Latency measurements have become an important part of IT infrastructure and application monitoring. The latencies of a wide variety of events like requests, function calls, garbage collection, disk I/O, system calls, CPU scheduling, etc. are of […]
As more companies transform into service-centric, “always on” environments, they are implementing Site Reliability Engineering (SRE) principles like Service Level Objectives (SLOs). An SLO is an agreement on an acceptable level of availability and performance and […]