Sometimes you just need a different hammer

Circonus has a lot of powerful tools inside, but as anyone who has worked with real data knows: if you can’t get your data out into the tool you need, you’re going to suffer. We do all sorts of advanced analysis on telemetry data that is sent our way, but the systems we use to do that are somewhat isolated from the end user. While simple composites and math including population statistics are available in “point-and-click” fashion, more complicated, ad-hoc analysis on the “back-end” is only possible by Circonus engineers.

The “back-end”/”front-end” delineation is quite important here. In an API driven service like Circonus, the front-end is you or, more specifically, any thing you want to write against our APIs. Often times, when using services there are two types of API consumers: third-party tools and internal tools. Internal tools side are the Python or Ruby or Java programs you write to turn knobs, register checks, configure alerts and otherwise poke and prod at your Circonus account. The third-party tools are simply the tools that someone else authored and released for your use.

On to data languages.

So, where is this going? The traditional scripting languages aren’t really the limit of API consumers when it comes to big data services like Circonus. If you think about the fact that you’ve been dumping millions or billions of telemetry points at Circonus, some of our backend systems look a lot more like a data storage service than a telemetry collection orchestration service. More languages enter the fray when the APIs return data… such as R.

Working with Circonus data from within R is easy. After installing the circonus R package, you simply create a token to use the API with your Circonus account and then use that UUID to establish a connection within R:

api <- circonus.default('06178a18-0dff-0b0e-0794-0cc3fd350f04')

It's worth noting that the first time you run this command after creating your token, you'll get an error message saying that you need to authorize it for use with this application. Just go back to the API tokens page and you'll see a button to allow access to R. Once authorized, it will continue to work going forward.

Getting at data.

In the following Circonus graph, I'm looking at inbound and outbound network traffic from a node in Germany over a two month period starting on 2012/09/01.

If I click on the the "view check" link in the legend, it takes me to check 111. Using that little bit of information, I can pull some of that information right into R. Assuming I want to pull the outbound network traffic, Circonus tells me the metric name is "outoctets."

> checkid <- 111
> metric <- "outoctets"
> period <- 300
> start_time <- '2012-09-01 00:00:00'
> end_time <- '2012-11-01 00:00:00'
> data <- circonus.fetch_numeric(api,checkid,metric,start_time,end_time,period)

The above R session is fairly self-explanatory. I'm pulling the "outoctets" metric form check 111 over the two month time period starting on September 1st, 2012 at a granularity of 300 seconds (5 minutes).

Working with data.

This will give me several columns of numerical aggregates that I can explore. Specifically, I get a column that describes the time intervals, and columns corresponding to the number of sample, the average value, the standard deviation of those samples, as well that like information related to the first order derivatives over that time series data. All of this should look familiar if you have ever created a graph in Circonus as the same information is available for use there.

> names(data)
[1] "whence"            "count"             "value"            
[4] "derivative"        "counter"           "stddev"           
[7] "derivative_stddev" "counter_stddev"

September and October combined have 61 days. 61 days of 5 minute intervals should result in 17568 data points in each column.

> length(data$value)
[1] 17568

Our values of outoctets are (you guessed it) in octets and I'd like those in bits, so I need to multiply all the values by 8 (in the networking world, octets and bytes are different names for the same thing and there are 8 bits in a byte). That $value column are byte counts and we want the first order derivative to see bandwidth which is the $counter. Let's now ask a question that would be somewhat tricky via the point-and-click interface of Circonus: "Over the 5 minute samples in question, what was the minimum bandwidth, maximum bandwidth and, while we're at it, the 50th (median), 95th, 99th and 99.9th percentiles?"

> quantile(data$counter * 8, probs=c(0,0.5,0.95,0.99,0.999,1))
        0%        50%        95%        99%      99.9%       100% 
  221682.7  4069960.6  9071063.1 10563452.4 14485084.9 17172582.0

For those that don't use R, everything statistical is just about this easy... After all, it is a language designed for statisticians crunching data. We can also quickly visualize the same graph in R.

We can see it has the same contour and values as our Circonus graph albeit far less beautiful!

I often pull data into R to perform discrete Fourier transforms (fft) to extract the frequency domain. It can help me programmatically determine if the graph has hourly/daily/weekly trends. That, however, it a bit too complicated to dive into here.

As a Circonus user, plug in some R! Please drop us a note to let us know what awesome things you are doing with your data and we will work to find a way to cross pollenate in our Circonus ecosystem.