Access Tokens with the Circonus API

When we rolled out our initial API months ago, we took a first stab at getting the most useful features exposed to help customers get up to speed with the service. A handful of our users expressed displeasure with having to use their login credentials for basic access to the management API. Starting today, we’re pleased to announce support for access tokens within the Circonus API.

Tokens offer fine-grained access for each user to a specific service account, at your permission role or lower. For example, if Bob is a normal user on the Acme Inc. account, he can create tokens allowing normal or read-only access. Multiple applications can use the same token, but each application has to be approved by Bob in the token management page, diabolically named My Tokens. To get started, browse over to this page inside your user profile, select your account from the drop-down and click the “plus tab” to create your first token.


The first time you try to connect with a new application using your token, the API service will hand back an HTTP/1.1 401 Authorization Required. When you visit the My Tokens page again, you’ll see a button to approve the new application-token request. Once this has been approved, you’ll be able to connect to the API with your new application-token.


Using the token is even easier. Just pass the token as X-Circonus-Auth-Token and your application name as X-Circonus-App-Name in your request headers. Here’s a basic example using curl from the command-line:

$ curl -H "X-Circonus-Auth-Token: ec45e8a2-d6d9-624c-c21c-a83f573731c1" \
       -H "X-Circonus-App-Name: testapp" \
       https://api.circonus.com/api/json/list_accounts

[{
   "account":"social_networks",
   "account_description":"Monitoring for The Social Network.",
   "account_name":"Social Networks",
   "circonus_metric_limit":500,
   "circonus_metrics_used":124
}]
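
If you’d rather hit the API from code instead of curl, here’s a minimal sketch in Perl (assuming LWP::UserAgent is available). It sends the same two headers and token as the curl example above:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# pass the token and application name as request headers
my $res = $ua->get(
  'https://api.circonus.com/api/json/list_accounts',
  'X-Circonus-Auth-Token' => 'ec45e8a2-d6d9-624c-c21c-a83f573731c1',
  'X-Circonus-App-Name'   => 'testapp',
);

die "request failed: " . $res->status_line . "\n" unless $res->is_success;
print $res->decoded_content, "\n";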

One of the more convenient features of our tokens is how well they integrate with user roles. A token will never have higher access permissions than its owner. In fact, if you lower a user’s role on your account, their tokens automatically reflect the change. Changing a “normal” user to “read-only” drops their tokens to the same access level, and if you restore the original role, the tokens regain their original privileges as well. Secure and convenient.

If you have any questions about our new API tokens or would like to see more examples with the Circonus API, drop us a line at hello@circonus.com.

Annotating Alerts and Recoveries

In the last couple of posts, Brian introduced our new WebHook notifications feature and I demonstrated how Circonus can graph text metrics for Visualizing Regressions. Both of these features are interesting enough on their own, but let’s not stop there. Today I have an easy demonstration showing how you can feed your alert information back into your trends. The end goal is an annotation on our graph that can be used to help identify, at a glance, which alert(s) correspond to anomalies on your graphs.

First, let’s set up a WebHook Notification in our Circonus account profile. Choose the contact group it should belong to, or create a new contact group specifically for this exercise. In the custom contact field, type the URL where you want Circonus to POST your alert details, then hit enter to save the new contact.


Now we need something to act as a recipient for our webhook. For this example I have a simple Perl CGI script that listens for the POST notification, parses the contents, and writes out Circonus-compatible XML. It doesn’t matter which language you use, as long as you can extract the necessary information and write it back out in the correct XML format (Resmon DTD).

#!/usr/bin/perl
#
# alert.cgi

use strict;
use warnings;
use CGI;
use HTML::Template;

my $cgi = CGI->new;
my $template = HTML::Template->new(
  filename => 'resmon.tmpl',
  die_on_bad_params => 0
);

# check for existence of alerts from webhook POST
if ($cgi->param('alert_id')) {

  # open XML output for writing
  open (OUT, ">/path/to/alert.xml") || 
    die "unable to write to file: $!";

  # loop through alerts
  for my $alert_id ($cgi->param('alert_id')) {

    # check for valid alert id format
    if ($alert_id =~ /^\d+$/) {

      # craft our XML content
      $template->param(
        last_update => time,
        alert_id => $alert_id,
        account_name => $cgi->param('account_name'),
        check_name => $cgi->param("check_name_${alert_id}"),
        metric_name => $cgi->param("metric_name_${alert_id}"),
        agent => $cgi->param("agent_${alert_id}"),
        severity => $cgi->param("severity_${alert_id}"),
        alert_url => $cgi->param("alert_url_${alert_id}"),
      );

      # only print RECOVERY if available
      if ($cgi->param("clear_time_${alert_id}")) {
        $template->param(
          clear_time => $cgi->param("clear_time_${alert_id}"),
          clear_value => $cgi->param("clear_value_${alert_id}"),
        );

      # otherwise print ALERT details
      } else {
        $template->param(
          alert_time => $cgi->param("alert_time_${alert_id}"),
          alert_value => $cgi->param("alert_value_${alert_id}"),
        );
      }
    }
  }

  print OUT $template->output;
  close (OUT);
}

# acknowledge the request so the webhook sender receives a clean response
print $cgi->header('text/plain'), "OK\n";

Here is the template file used for the XML output.

<!-- resmon.tmpl -->
<ResmonResults>
  <ResmonResult module="ALERT" service="aarp_web">
    <last_runtime_seconds>0.000238</last_runtime_seconds>
    <last_update><TMPL_VAR name="last_update"></last_update>
    <metric name="account_name" type="s">
      <TMPL_VAR name="account_name">
    </metric>
    <metric name="alert_id" type="s">
      <TMPL_VAR name="alert_id">
    </metric>
  <TMPL_IF name="alert_value">
    <metric name="message" type="s">
      <TMPL_VAR name="check_name">`<TMPL_VAR name="metric_name"> 
      alerted <TMPL_VAR name="alert_value"> from <TMPL_VAR name="agent">
      at <TMPL_VAR name="alert_time"> (sev <TMPL_VAR name="severity">)
    </metric>
  </TMPL_IF>
  <TMPL_IF name="clear_value">
    <metric name="message" type="s">
      <TMPL_VAR name="check_name">`<TMPL_VAR name="metric_name"> 
      cleared <TMPL_VAR name="clear_value"> from <TMPL_VAR name="agent"> 
      at <TMPL_VAR name="clear_time"> (sev <TMPL_VAR name="severity">)
    </metric>
  </TMPL_IF>
    <metric name="alert_url" type="s">
      <TMPL_VAR name="alert_url">
    </metric>
  </ResmonResult>
</ResmonResults>

When everything is running live, the alert.cgi script will accept webhook POST notifications from Circonus and write the alert details out to /path/to/alert.xml. This file should be available over HTTP so that we can import it back into Circonus using the Resmon check. Once you’ve begun capturing this data you can add it to any graph, just like any other metric.
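
For reference, here is roughly what a rendered alert.xml might look like for a single open alert. Every value below is made up for illustration; the structure simply mirrors the template above:

<ResmonResults>
  <ResmonResult module="ALERT" service="aarp_web">
    <last_runtime_seconds>0.000238</last_runtime_seconds>
    <last_update>1288044642</last_update>
    <metric name="account_name" type="s">
      social_networks
    </metric>
    <metric name="alert_id" type="s">
      1234
    </metric>
    <metric name="message" type="s">
      webcheck`duration
      alerted 5000 from agent1
      at 1288044630 (sev 1)
    </metric>
    <metric name="alert_url" type="s">
      (alert URL from the webhook payload)
    </metric>
  </ResmonResult>
</ResmonResults>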

This might take you 30 minutes to set up the first time. But once you have it, this data can be really useful for troubleshooting or Root Cause Analysis. We plan to add native support for alert annotations within Circonus over the next few months, but this is a handy workaround to have until then.

Visualizing Regressions

We’ve heard a lot of talk about Continuous Deployment strategies over the last 12-18 months. Timothy Fitz was one of the earliest proponents, publishing stories of their success over at IMVU last year. One of the greatest benefits of continually pushing your changes to production is that it takes less time and effort to find bugs when something goes wrong, since you have fewer commits in between to navigate. But even with this style of release management, it helps to know which versions of code are running live on your components at any point. What happens when your newest code alters the normal behavior of the system, but not so drastically as to trigger an alert?

One of the nicer trending features in Circonus (or its open-source relative, Reconnoiter) is the ability to correlate unrelated datasets. I can take any collection of metrics on my account and group them together on a single graph. But what if you could view isolated events on the same graph, as an orthogonal data point? Check out these two graphs displaying some recent activity on one of our fault detection systems. The vertical lines represent the point at which a text metric’s value changed. Circonus renders them this way so you can easily recognize that specific moment in time.


In the first graph, I’m hovering over a dip in performance caused by the most recent release to that component (svn r6230). In the second graph, we’re running a fix (svn r6232) for the regression introduced in the previous commit. Could I have done the same level of correlation manually? Of course, but it’s nice to be able to zoom out and study the long-term effects of our release strategy on our overall stability. This is an enormously helpful tool for Root Cause Analysis on our live systems, especially if you perform releases many times a week (like we do). If you’re one of many using automation and Configuration Management suites like Puppet, Chef and the Marionette Collective, no doubt you’ll find it even more useful.

If you’d like to start trending your own text metrics, check out the Resmon DTD. Circonus can pull in your custom metrics in this format. Although the version numbers I mentioned earlier look like integers (well, they are integers), I can explicitly cast them as a string metric using the Resmon DTD. Here is what that might look like:

<ResmonResults> 
  <ResmonResult module="Site::CircProd" service="vers"> 
    <last_runtime_seconds>0.000274</last_runtime_seconds> 
    <last_update>1288044642</last_update> 
    <metric name="ernie" type="s">6297</metric> 
  </ResmonResult> 
</ResmonResults>
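
If you want a starting point for producing that output yourself, here’s a minimal sketch in Perl. It assumes `svnversion` can report the deployed revision, borrows the host name “ernie” and service name “vers” from the example above, and uses a placeholder path; redirect its output to a file your web server can serve:

#!/usr/bin/perl
use strict;
use warnings;

# ask Subversion which revision is deployed (path is a placeholder)
chomp(my $rev = `svnversion /path/to/deployed/code`);
my $now = time();

# emit Resmon-format XML, casting the revision as a string ("s") metric
print <<"XML";
<ResmonResults>
  <ResmonResult module="Site::CircProd" service="vers">
    <last_runtime_seconds>0.000274</last_runtime_seconds>
    <last_update>$now</last_update>
    <metric name="ernie" type="s">$rev</metric>
  </ResmonResult>
</ResmonResults>
XML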

As you might imagine, you can get pretty creative with the sort of data you can pull into Circonus. In our next post I plan to look at how you can combine WebHook Notifications (that Brian announced last week) with these text metrics to start trending your alert history. Stay tuned!

Good Times in Charm City

It’s been a while since I had time to enjoy the technical conference scene. Thanks to my involvement with Circonus, I have plenty of action scheduled between RailsConf, Velocity and the Surge Scalability Conference. We attended RailsConf in Baltimore a couple weeks ago and had a great time. Circonus had an exhibition booth, and we gave tons of demonstrations and handed out free swag and t-shirts. But the best part of any con is catching up with old friends and making new ones.

I finally met Mark Imbriaco of 37signals in person. Mark has been a valued user for us, giving plenty of awesome feedback during the beta and after our production launch. If you haven’t seen it already, check out Mark’s interview on webpulp.tv. He offers a lot of insight into 37signals’ operations and architecture. Good stuff.

Last but not least, a nice relationship blossomed out of our participation at RailsConf. I’ve been aware of the RPM service over at NewRelic for a while now. Although they sometimes market it as monitoring software for Rails, a more apt description would be to call it a kickass profiling tool for Ruby and Java applications. It’s very useful for tracking down performance issues within your application code. But what happens when the problem isn’t in your source code… or maybe you’re just not sure? Fortunately for NewRelic RPM users, the solution just became very clear.

We recently rolled out support for importing your NewRelic RPM metrics directly into Circonus! All of the application statistics available over NewRelic’s data API are now easily accessible inside your Circonus account. Correlate your application CPU and Response Time with HTTP first-byte and total duration. See the impact of optimizations on your end user experience! All you need to create your first RPM check in Circonus is the account id and license key (available in your application’s newrelic.yml).

Ironically, the one question we kept hearing over and over again at RailsConf was:

How is Circonus different from NewRelic?

The answer is simple: we’re perfect complements. Circonus offers a holistic view of your networks, architectures, systems and services. NewRelic RPM provides a detailed view of your application internals. Both support real-time analysis of their individual focus areas. It’s really the perfect monitoring combination for any serious Rails shop.

If you’d like more information on Circonus or how it can support your architecture, shoot us a line or stop by Booth 103 at Velocity this week. We still have plenty of black t-shirts to give away. 🙂

Circonus at Velocity 2010

Hot on the heels of our RailsConf ticket giveaway, we have another contest for a free pass to Velocity 2010! I’m really excited to attend this year’s Velocity. It’s the Web Performance event to attend, and a great place to see the sharpest whips in the industry.

Like before, the rules of this giveaway are simple. Just tweet a message about Circonus being at Velocity and ask your friends to retweet it. The original "twitterer" with the most retweets by Friday, June 14 at noon (12pm EDT) wins. Here’s an example:

The @Circonus stuff is hot and it looks like they’ll be at #velocityconf this year:

That’s an easy way to earn a free 2010 Velocity sessions pass ($1295 value). Feel free to get creative with your tweet message. Our only requirements are that it’s a positive message that mentions @Circonus and #velocityconf, and that it includes the link.

Yay, free stuff!

Circonus at RailsConf 2010

We’re anxious to meet and greet everyone at RailsConf next month in Baltimore. This will be our first conference appearance since the production launch. Some of our customers, including 37signals, will be visiting Charm City for this big event. I’m excited to see so many talented Web developers and operations folk in one conference. Having it in our hometown is icing on the cake.

As if that wasn’t enough, we have a couple of fun things to announce. First, Circonus will be giving away a free RailsConf sessions pass! All you have to do is tweet a message about Circonus at RailsConf to your friends and ask them to retweet it for you. The individual with the most retweets by noon (12pm EDT) on Monday, May 31, 2010 wins. Here’s an example tweet:

The @Circonus stuff is hot and it looks like they’ll be at #railsconf this year:

If you’re keeping score at home, that’s a free 2010 RailsConf sessions pass ($795 value) for the price of a few clicks. Feel free to get creative with your tweet message. Our only requirements are that it’s a positive message that mentions @Circonus and #railsconf, and that it includes the link.

Why are you still reading this? Go off and start tweeting for your free RailsConf pass (Conference Sessions Only).

See you in Baltimore!

Your Visitors Don’t Matter

Consider me old-fashioned, but I remember a time when an alert notification meant something. Drives failed, servers ran short on memory, or a cage monkey pulled the wrong cable at 3 A.M. Regardless of the circumstance, it demanded attention. Those were the days.

Today, operations is all about doing more with less. No more dedicated hardware or late-night maintenance windows. Everything is virtual, cloud-based, or filling up squares in the grid. Automation reigns supreme, limitless scalability at our disposal. Abstraction at its finest.

But woe unto you, the flapping anomaly.

That visitor who tried to load your website was turned away, timed out and left to wither. Poor Jane wanted to view your site. She needed to view your site. She’d already submitted her order, only to be ignored. Forgotten. Disconnected with nary a trace to route nor a cookie to favor.

Jane was a victim of a numbers game. Someone, somewhere, decided that some problems don’t matter. Which ones? Who cares? They don’t matter. And because she happened to visit when this problem reared its head, you ignored her request. Who would ever make such a silly presumption that one failure is less important than another? What criteria are used to determine the worthiness of this alert or that one? Pure random circumstance, it would appear.

Many “uptime” services and monitoring suites promote the concept of selective or flapping failures. Vendors sell these features as a convenience, ostensibly as a sleep aid. The administrator’s snooze-bar. I can’t think of any other reason that ignoring a faulty condition would be considered a good thing. Perhaps they reason that only the check is affected. If it responds after the third attempt, it was probably ok for visitors all along. Right?

It’s disappointing how many vendors embrace this broken methodology. It probably seemed innocent at a glance. But the damage has been done; recklessness has taken root. We’ve been conditioned to accept these transient malfunctions as mere operational speed bumps. Rather than address the problem, we nudge the threshold a tad higher. Throw additional nodes into the cluster. Increase capacity, while decreasing exposure.

But there is a more responsible alternative. What ever happened to purposeful, iterative corrections and Root Cause Analysis? Notifications may be annoying at times, but they serve a crucial function in a healthy production architecture. Ignored alerts lead to stagnant bugs, lost traffic and missed opportunities. Stop treating your visitors like they don’t matter. There’s no such thing as a flapping customer.

Disrupting the Status Quo

As a hobbyist programmer and full-time operations geek, I’ve been involved in my share of odd software projects. More often than not I’ve had to explain the purpose of the thing, answering numerous questions about the why, what or whowuzzit. I can say without any reservation that Circonus is that rare venture that breaks through the trappings of application design and me-too engineering principles to become something truly revolutionary. Using the product is the best way to appreciate Circonus’ strengths. User reactions tell the story.

Bryan Allen, chief server wrangler over at Pobox, has been one of our earliest and most active Beta participants. These folks have been doing email services for longer than I’ve been using it. In a field this competitive, there is zero room for slack, and they know it. Bryan is a very sharp guy, so we were very pleased to read his thoughts on Circonus.

Monitoring, trending and fault analysis are tedious. So much so, most shops get them wrong, or don’t bother at all. Circonus is already poised to be a disruptive player; making the tedious easy, fast and accurate.

I was grateful to meet Bryan in person during my visit to Philly for PostgreSQL Conference, U.S. 2010. I’ve learned that Pobox and OmniTI share a number of common technical interests and philosophies, so it should come as no great surprise that they’d see some value in our efforts.

On the other end of the spectrum, you have the team at 37signals. They are an established leader in web design and SaaS solutions. Their specific forte is with simple (yet powerful) productivity services like Basecamp, Backpack, Campfire and Highrise. Heck, they created Ruby on Rails. If anyone knows good web applications, you better believe they do. We were fortunate to have Mark Imbriaco, Operations Manager for 37signals, run Circonus through the paces during our Beta program.

Circonus’ trending functions are incredibly powerful. The ability to consolidate metrics across a variety of services into a single graph makes it much easier to spot bottlenecks in one area that may correlate to performance problems in another. It’s a graph nerd’s paradise!

I’ll have to take Mark’s word on the last part. Many geeks’ idea of paradise lies somewhere on a beach with a frosty beverage and a strong wireless signal. But if you’re like Mark, and you need something to monitor your systems, you probably owe it to yourself to add Circonus to your shopping list.

There’s one word that I’ve heard repeated a few times from users, that Circonus is disruptive. Occasionally you’ll hear the word bandied about to describe a new social media outlet or computing device. It’s usually associated with a revolutionary technology. There’s nothing new about monitoring, trending or fault detection. But there is something refreshingly insightful about the synergy of monitoring services on a single unified metric collection.

Enjoy the Revolution.

Introducing Circonus

Great ideas always begin with a catalyst. They can ignite in a flash of brilliance, or grow slowly like an ember hidden in the ashes of failure. Inspiration comes from different places, and is only ever cultivated into success with the right combination of talent, timing and fortitude.

And sometimes it just happens because you get fed up with inferior products.

The beginnings of Circonus land somewhere in-between. Created by the engineers at OmniTI, we’ve been dealing with the pains of performance monitoring and trending in highly scalable environments for years. We’ve tried various combinations of Open Source and COTS software packages, all of which left us with a sour taste and wanting more.

Over the last couple of years, our team of highly skilled engineers, led by OmniTI’s own Theo Schlossnagle, has been crafting and refining a truly convergent monitoring platform. Circonus started off as the Reconnoiter project, attempting to address the disconnect between existing monitoring and trending solutions.

Circonus is currently in a closed beta, receiving valuable feedback from customers and partners. We expect to launch publicly in April 2010. In the meantime, we’ll use this blog as an outlet to discuss the upcoming release and divulge all the cool stuff in the pipeline. I hope you visit here often to find out what we’re working on.