Customizable Alerts and Ruleset Groups

Today we are releasing two new features to make your on-call life easier.

Customizable Alerts

Our default alert format was created and modified over the years based on user feedback, but of course it was never going to make everyone happy. How do we solve this? By letting you create your own alerts! If you head to your contact groups page, you will notice a new checkbox to “enable custom alert formats”. Check that and some new fields will appear that let you modify both the long and short format bodies, add a summary for each type, and even change the subject line of your emails.

Alerts are customized through macros and conditionals. Macros take the form {alert_id} and are replaced with the appropriate values. Conditionals look like %(cleared != null) Cleared: {cleared}%. In this example, if the alert has cleared you will get a line like Cleared: Mon, 15 July 2013 11:49:58 on your alert; if it has not cleared, this line will not be added.
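To make the expansion rules concrete, here is a rough Python sketch of how a template with macros and conditionals could be rendered. It is not our actual alert engine: the render function, the regular expressions, and the choices to trim whitespace inside a conditional and to drop lines that expand to nothing are all assumptions made for illustration.

```python
import re

# Hypothetical patterns; only the "!= null" form shown in this post is handled.
CONDITIONAL = re.compile(r"%\((\w+) != null\)(.*?)%")
MACRO = re.compile(r"\{(\w+)\}")


def render(template, values):
    """Expand conditionals, then macros, one template line at a time."""

    def expand_conditional(match):
        name, body = match.groups()
        # Keep the body only when the named value is set (assumed: None means null).
        return body.strip() if values.get(name) is not None else ""

    def expand_macro(match):
        value = values.get(match.group(1))
        return "" if value is None else str(value)

    lines = []
    for line in template.splitlines():
        expanded = MACRO.sub(expand_macro, CONDITIONAL.sub(expand_conditional, line))
        if line.strip() and not expanded.strip():
            continue  # assumed behavior: a line that expands to nothing is dropped
        lines.append(expanded)
    return "\n".join(lines)


template = "Check: {check_name}\n%(cleared != null) Cleared: {cleared}%"
print(render(template, {"check_name": "http", "cleared": None}))
print(render(template, {"check_name": "http", "cleared": "Mon, 15 July 2013 11:49:58"}))
```

The first call prints only the Check line, while the second also prints the Cleared line with its timestamp.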

As a further example, the current alerts in the new customizable format would look like this:

Subject:
[{account}] Severity {severity} {status} {check_name}

Body:
Account: {account}

{status} Severity {severity} %(new_severity != null) [new Severity {new_severity}]%
Check: {check_name}
Host: {host}
Metric: {metric_name} ({value})
Agent: {broker_name}
Occurred: {occurred}
%(cleared != null) Cleared: {cleared}%
{link}
{metric_link}
{metric_notes}

The ? help bubble beside the Alert Formats section header has a full list of all the macros available for use. Our default alert format will be changing slightly as well: we are going to put your account name in the subject line instead of Circonus, and we will add the metric notes to the body.

Ruleset Groups

Another feature we get asked for is some way of alerting only when 3 out of 5 servers are down, or only when CPU spikes in conjunction with a rise in memory. To make this a reality, we’ve added the concept of “rule groups”, located under Alerts -> Groups. These groups take rulesets you’ve already created, like cpu > 80 or http 500 response code, and let you combine as many as you would like to form a group. You then define a formula on which the group alerts. This can be a threshold (when X of Y rules are in alert, trigger a sev 1) or an expression (when (A and B) or C alerts, trigger a sev 2).
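The two formula types can be sketched in a few lines of Python. This is only a model of the idea, not how groups are implemented: each ruleset is reduced to a true/false "in alert" flag, and the names x_of_y, a_and_b_or_c, and group_severity are hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical model: each ruleset is reduced to True (in alert) or False.
States = Dict[str, bool]


def x_of_y(x: int) -> Callable[[States], bool]:
    """Threshold formula: true when at least x rulesets are in alert."""
    return lambda states: sum(states.values()) >= x


def a_and_b_or_c(states: States) -> bool:
    """Expression formula from the text: (A and B) or C."""
    return (states["A"] and states["B"]) or states["C"]


def group_severity(formula: Callable[[States], bool], severity: int,
                   states: States) -> Optional[int]:
    """Return the severity to trigger, or None when the formula is not met."""
    return severity if formula(states) else None


states = {"A": True, "B": True, "C": False}
print(group_severity(x_of_y(3), 1, states))     # None: only 2 of 3 are in alert
print(group_severity(a_and_b_or_c, 2, states))  # 2: A and B are both in alert
```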

Let’s look at a complete example. I’ve created 3 checks, one for each webserver in my infrastructure, and each check collects the HTTP response code. On that response code metric I’ve added 2 rules: if the value is not 200, or if it goes absent, send me an email.

[Image: single_rule]

Since I have redundancy in my servers, I choose to only get an email when one goes down. This way I don’t get woken up, and I can just take care of the problem the next day.

[Image: group_rules]

However, I know I always want at least 2 servers up and running. So now I will go to the groups page and create a webserver group. I first add my rulesets via the “add rulesets+” button, selecting all 3 webserver code rules. Then I add a formula and decide that if 2 (or more) out of the 3 servers go bad, trigger a sev 1 alert. Then I add my page group to get these sev 1s. Now I’ll still get emails if the servers go down, but I’ll get woken up with a page when I hit my group threshold.
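The end result of this setup can be simulated with a small Python sketch. The server labels (web1, web2, web3) and the message strings are made up for illustration; only the routing logic (an email per failing server, plus a page once 2 or more are failing) comes from the example above.

```python
# Hypothetical names: the server labels and message strings are illustrative only.
SERVERS = ("web1", "web2", "web3")
GROUP_THRESHOLD = 2  # page when 2 or more of the 3 rulesets are in alert


def notifications(down):
    """List the notifications sent for a given set of failing servers."""
    sent = [f"email: {server} response code rule in alert"
            for server in SERVERS if server in down]
    if len(sent) >= GROUP_THRESHOLD:
        sent.append("page: webserver group sev 1")
    return sent


print(notifications({"web2"}))          # one email, no page
print(notifications({"web1", "web3"}))  # two emails, then the sev 1 page
```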