The Four Golden Signals, developed by Google SREs, are key metrics for monitoring the health of your systems. In today’s complex IT environments, they help engineers and IT operations teams prioritize the most significant issues to address. The Four Golden Signals are:
- Latency: the time it takes to serve a request
- Traffic: the demand placed on your system, such as requests per second
- Errors: the rate of requests that fail
- Saturation: how close your network and servers are to full capacity
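To make these definitions concrete, here is a minimal Python sketch of how the four signals might be computed over one monitoring window. Everything in it is an assumption for illustration rather than a standard API: the `Request` record, the `golden_signals` function, the choice of the 95th percentile for latency, the rule that a 5xx status counts as a failure, and CPU utilization standing in for saturation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    """One served request (hypothetical record shape)."""
    latency_ms: float
    status: int  # HTTP status code


def golden_signals(requests: List[Request],
                   window_seconds: float,
                   cpu_utilization: float) -> Dict[str, float]:
    """Summarize one monitoring window as the four golden signals."""
    count = len(requests)

    # Latency: 95th percentile using the nearest-rank method.
    latencies = sorted(r.latency_ms for r in requests)
    if count:
        rank = (95 * count + 99) // 100  # integer ceiling of 0.95 * count
        p95 = latencies[rank - 1]
    else:
        p95 = 0.0

    # Errors: share of requests that failed (assumed here to mean 5xx).
    failed = sum(1 for r in requests if r.status >= 500)

    return {
        "latency_p95_ms": p95,
        "traffic_rps": count / window_seconds,             # Traffic
        "error_rate": failed / count if count else 0.0,    # Errors
        "saturation": cpu_utilization,                     # Saturation (proxy)
    }


# Usage: 20 requests in a 10-second window, two of which failed.
sample = [Request(latency_ms=10.0 * (i + 1), status=500 if i < 2 else 200)
          for i in range(20)]
print(golden_signals(sample, window_seconds=10.0, cpu_utilization=0.75))
```

A percentile is used for latency rather than the mean because a handful of very slow requests can hide behind a healthy-looking average. In practice a monitoring system usually computes these values for you; the sketch only shows what each signal measures.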
In the following 9-minute video, I focus on two of these signals, latency and errors, because they often result in customer-facing symptoms. Specifically, I share how alerting on latency and errors, combined with certain debugging metrics, can keep the false positive rate low while giving engineers the insight they need to identify the root cause.
Check out the video for quick tips that will help you triage, understand, and resolve incidents at your company faster.