A truism amongst operations professionals is that any alert your observability platform produces should be actionable; otherwise, it is just noise. Auto-remediation is a hard problem, so the most common action triggered by an alert is for an engineer to gather more data and context. Implicit in this process is the assumption that the data an engineer needs in order to work the issue is already collected, and that gaining insight into the issue is just a matter of looking at the right dashboards and writing the right queries. Typically, the only time the amount of data being sent changes is after a tier 3 escalation, when the application owner determines that the data they need is in fact not being collected and adjusts the log or tracing levels.
However, what limits how much data gets collected is far more often economics, namely how much an organization can afford to send to its observability tools, than whether the data is available to collect in the first place. What if you could collect more data based on the health of your services? Circonus Passport provides tools for making alerts more actionable by allowing you to selectively collect and send more data when you need it most.
The goal of any ops professional is to get ahead of service-impacting issues before they occur. The vast majority of alerts are leading indicators that something is trending towards a problem (an increase in latency, an increase in queue depth, CPU spikes, etc.). These are high-signal events that inform engineers where to look next. Rather than seeing the same set of sampled and filtered data for those impacted resources, what if engineers could start their investigation with all the data they could need to effectively respond to an incident? Passport provides the framework to achieve this goal.
What makes this approach particularly attractive is that it does not require users to rework their existing collection strategies. If they feel their current collection strategies have struck a reasonable balance between cost and visibility, nothing needs to change. Instead, Passport can enhance existing strategies and selectively collect more data based on key performance indicators from a given environment. Passport can be used to send more data, but the other side of the coin is that it can also be used to send less data. If an organization feels that its observability spend is too high, it could use Passport to “right size” its collection strategy with the confidence that Passport can help it collect the data it needs only when it needs it.
Integration with Third-Party Alerting Tools
Your observability platforms already hold information about the state of your services, and that information can trigger Passport to tell your collection agents to send more data.
You simply define an alert condition in your observability platform of choice and have the platform send the alert to a webhook published by Passport. Passport doesn’t need to know a lot about the alert payload. It parses the incoming payload, and all it needs to find there is the alert’s unique identifier, such as an alert ID or alert name, so users are not required to create a custom alert template just to work with Passport.
For example, let’s say you have a Grafana alert that triggers on high CPU utilization. A user would just need to go to Passport, define this external alert, and provide the alert ID field. The user would then make the published webhook for that external alert a recipient of the alert defined in Grafana. When the alert is triggered, the information in that alert can be used as part of the rule definition that the Passport Rules Engine uses to determine which collection strategy should be active on a given resource.
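To make the idea of "only needing the alert's unique identifier" concrete, here is a minimal sketch of how a webhook receiver might pull one configured field out of an otherwise unknown JSON payload. This is not Passport's implementation. The function name `extract_alert_id`, the dotted-path convention for the ID field, and the sample payload are all assumptions made for illustration, and the payload is not Grafana's actual webhook schema.

```python
import json
from typing import Any, Optional


def extract_alert_id(raw_body: bytes, id_field: str) -> Optional[str]:
    """Return the value found at the dotted path `id_field`, or None.

    Hypothetical helper: everything in the payload other than the
    configured identifier field is ignored.
    """
    try:
        node: Any = json.loads(raw_body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes.
        return None

    # Walk the dotted path one key at a time, e.g. "alert.id".
    for key in id_field.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]

    # Only scalar values make sense as an identifier.
    if node is None or isinstance(node, (dict, list)):
        return None
    return str(node)


# Made-up payload for illustration only.
sample = b'{"alert": {"id": "cpu-high-01"}, "state": "alerting"}'
alert_id = extract_alert_id(sample, "alert.id")
```

Because the lookup depends only on the one configured field, the same receiver would work with any alerting tool whose payload carries an identifier somewhere, which is the property the paragraph above describes.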
Let’s take a look at how it works.
Create an external alert.
Create a rule.
After creating the rule, ensure the rule weight is set to a higher value than all other existing rules that target the same host. This will ensure that the rule is applied to the agent when the alert is triggered.
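The role of the rule weight can be illustrated with a small sketch. It assumes a simple model in which, among the rules that target a host and whose alert condition (if any) is currently met, the rule with the highest weight decides the collection strategy. The `Rule` class, its fields, and `select_rule` are hypothetical names, and the real Passport Rules Engine may evaluate rules differently.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Rule:
    # All fields are illustrative assumptions, not Passport's data model.
    name: str
    weight: int
    host: str
    required_alert: Optional[str] = None  # None: rule is always eligible
    strategy: str = "default"


def select_rule(
    rules: Iterable[Rule], host: str, firing_alerts: Set[str]
) -> Optional[Rule]:
    """Pick the highest-weight rule that applies to `host` right now."""
    eligible = [
        r
        for r in rules
        if r.host == host
        and (r.required_alert is None or r.required_alert in firing_alerts)
    ]
    # On equal weights, max() keeps the first rule encountered.
    return max(eligible, key=lambda r: r.weight, default=None)


rules = [
    Rule("baseline", weight=10, host="web-01", strategy="sampled"),
    Rule("cpu-debug", weight=20, host="web-01",
         required_alert="cpu-high-01", strategy="full"),
]

quiet = select_rule(rules, "web-01", set())            # baseline applies
firing = select_rule(rules, "web-01", {"cpu-high-01"})  # cpu-debug applies
```

Under this model, if the alert-driven rule had a weight lower than the baseline rule, it would become eligible when the alert fires but never win, which is why the weight needs to be higher than that of every other rule targeting the same host.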
Actionable Alerts, Faster MTTR
Gathering more context is often one of the first actions taken by engineers when an alert signals a problem, so why not automate this process? Passport removes the time-consuming tasks associated with adjusting collection configs and tooling to send back more data, allowing SREs and operations teams to focus on making sense of the data and more quickly solving the problem.
Sign up for Passport’s open beta to check it out for yourself.