Graph Annotations and Events

This feature has been a long time in coming: the ability to annotate your graphs! With the new annotations timeline sitting over the graph, not only can you create custom events to mark points in time, but you can also view alerts and see how they fit (or don’t fit) your metric data.

Annotations Timeline


First, let’s go to a graph and take a look at the annotations timeline to see how it works. When you choose a graph and view it, you will immediately see the new Annotation controls to the left side of the date tools, and the timeline itself will render in between the date tools and the graph itself. The timeline defaults to collapsed mode and by default will only show alerts from metrics on the current graph, so you may have an empty timeline at first. If you take a look at the controls, however, you will see three items: the Annotation menu, the show/hide toggle button, and the expand/collapse toggle button. The show/hide button does just what it says: it shows or hides the timeline. The expand/collapse button toggles between the space-saving collapsed timeline view and the more informative expanded timeline view.

If you open the Annotation menu, you will see a list of all the items you can possibly show in your timeline (or hide from it). Any selections you make here (as well as your show/hide and expand/collapse state changes) will be saved as site-wide user preferences in your current browser. All the items are separated into three groups:

Event Categories

This is a list of all the Event categories under the current account (these are seen and managed in the Events section of the site?we’ll get to that new section in a minute). If you have uncategorized events (due to deleting a category that was still in use), they will appear grouped under the “–” pseudo-category label.

Alerts

By default, the only alerts that will be shown will be alerts of all severity (sev) levels triggered by metrics on the current graph. If you wish, you may also show all alerts, and both categories of alerts may be filtered by sev levels. To do so, click one of the alert labels to expand a sev filter row with more checkboxes.

Text Metrics

This third group is not shown by default, but is represented by the checkbox at the bottom labeled “Include text metrics.” If you check this box, the page will refresh, and any text metrics on the current graph will then be rendered as a part of the timeline (and will be excluded from the graph plot and legend).

Once you have some annotations rendering on the timeline, take a look at the timeline itself. Hovering over a point will show a detail tooltip with the annotation title, date, and description, and hovering over either a point or a line segment will highlight the corresponding date range on the graph itself.

Now for the question on everyone’s minds: “Can I create events here, or do I have to go to the Events section to do that?” The answer is, yes, you can create events straight from the view graph page! To do so, simply use your right mouse button to drag-select a time range on the graph itself. A dialog will then popup for you to input your info and create the event.

Events Section

Now let’s head over to the Events section where you can manage your events and event categories. Simply click on the new Events tab (below the Graphs tab) and you’re there! To create an event, click the standard “+” tab at the upper left of the page. This will give you the New Event dialog. Most of the dialog inputs are pretty straightforward, with the exception of the category dropdown. This is a new hybrid “editable” dropdown input.


You may select any of its options if you’d like, or you can add new ones. To add a new option, simply select the last option (it’s labeled “+ ADD Category”). Your cursor will immediately be placed in a standard text input where you can enter your new category. When you’re finished, hit enter to create the new option and have it selected as your category of choice.

After you have created your event, you may need to edit it later. To edit any of its details, simply click on the pertinent detail of the event (when changing the event category, you will see it also has the new hybrid “editable” dropdown input which works exactly like the one in the New Event dialog).

In addition to start and end points (which may be the same date if you don’t want more than a single point), you may also add midpoints to your event. Click the Show details button for an event (the arrow button at the right end of an event row), and you will see the Midpoints list taking up the right half of the event details panel. Simply click the Add Midpoint button to get the New Midpoint dialog where you enter a title, description and choose a date for your point.

The one last element of the Events section that’s good to know about is the Categories menu at the upper right of the page. This allows you to delete categories as well as filter the Events list to only show a single category of events at a time. To do this, just click the name of a category in the Categories menu.

Past Performance: does this look right to you?

If you are like me, you look at a lot of data. I look at data in spreadsheets, I look at data on P&L statements, I look at term sheets, I look at systems data — a lot of systems data. I find the best way to look at data is to visualize it because it is the fastest way to get data into the amazing pattern matcher that is the human brain.

The human brain is quite good at saying “this is abnormal” and can usually even articulate why. This curve has a periodicity, that one a monotonic behavior, another is simply always flat… then they “change.” When we say “this visualization looks wrong,” we are almost always onto something real in the numbers. I’ll give you a simple visual example:


While there is obviously something starting at 8pm, we are only left with another question: “is it out of the ordinary?” It doesn’t look like that today, and it doesn’t appear to resemble the day before. What about last week? Let’s start the graph one week earlier:


This tells us a lot. It looks like we have a very similar event last week at this time. With most analysis tools, you stop here (or you hover with you mouse and try to correlate start/end times and magnitude to better understand how these two events resemble each other).

With Circonus, we don’t leave it here. Instead, we provide tools to help compare time separated events using our data overlay feature. We can take our original two-day view and overlay the data from last week right on top of (or in this case underneath).


Just two clicks and we’ve got a one-week offset data overlay and the visualization lends a little insight into what is going on. We can see the start times are identical, but the event from this week ends about 30 minutes before the one from last week — largely the same though.


Again, we find that visuals help. Understanding how these graph differ even when they are right on top of each other can be a bit challenging. Never fear! We’ve added help in the legend.


The legend takes on some new features when data overlays are in use. You now get a very clear, side-by-side read-out of the data in the graph including percentage differences. Additionally, the arrows that say “you’re higher than you were last week” become more saturated (redder) as the different in the data increases and fade to light grey if the two values are more similar. This makes it simple to quickly understand how current performance really compares to past performance. So, the interesting part of this graph is actually the subsequent spike of inbound traffic this is up 95% over last week. That’s something to look into.

Capacity Planning Made Easy

Okay, so capacity planning will never be fool proof. You simply cannot predict the future. However, some of the time you have a darn good idea of what the future will hold. Since someone knows what is likely to happen, why is it so hard to plan marketing initiatives, funnels and IT provisioning?

The reason is that things aren’t always linearly correlated. What’s that mean? Linear correlation goes something like this: if A depends upon B and I want twice as much A, I’ll need twice as much B. While correlating non-linear systems can be tricky, a lot can be done with linear regressions. The problem with any regression is that you need to put real numbers in, get real numbers out and understand how good they are.

When we look at how something grows, one of the most common tools in the statistics arsenal is a least-squared linear regression. That is: given a set of datapoints, what line best fits them? So, let’s say we have a lot of datapoints (boy do we have a lot of datapoints!). Now what does a linear regression tell us?

Let’s assume we’re looking at some traffic data over the month of December.


In this graph, it can be very hard to answer questions about the nature of the data. Two common questions are:

  1. are we growing or shrinking and by how much?
  2. if we stay on the current growth path, where will we be some point in the future?

Enter the linear regression:



Answering the first question is pretty simple now. We can look at the value on the left side of the graph, and the right side of the graph and do the math. You can’t see it in the screenshot, but the left the values are 5.49M and 5.88M which is roughly a 6.6% growth over 4 weeks. Now, any statistician will scream bloody murder about confidences in the data and model and any engineer will simply ask: “does that make sense?” Maybe we’ll look over 8 weeks and twelve weeks also to make sure that we build our confidence (this can be easier, though far less scientific, than understanding R2 values – which are, of course, available as well). Honestly, I personally find that reconciling this with my expectations is one of the better methods of trusting the model.

Let’s assume that we we expected some increase in resource usage during this time frame and that 6% is reasonable. Now onto the next question: where will we be in the future. In Circonus, we just jump up and extend our view window out one year and we can see what our model looks like in the future:


Next December we’ll be using 10.91M (this just happens to be MBits/s of network bandwidth to serve origin dynamic content on one of the sites managed over at OmniTI). We’ll revisit this month by month to ensure that we are indeed heading where we expected. It allows engineers and marketers and executives alike put real numbers into (what we call) napkin math which adds peace, clarity and allows most people to do easier what-if pontification. I can tell you one thing… we sleep better at night knowing specific numbers about a probable future.

Finding Needles in a Worksheet

Traditional graphing tools can help you plan for growth or even narrow down root causes after a failure. But they’ have a reputation for being difficult to setup, navigate or customize. It’s nice to be able to just point Cacti at some switches or routers and have it gracefully poll each device for SNMP data. Yet when you need a custom perspective of the data (or collections of data), it can be an arduous experience setting up templates and graphs.

When we started to engineer Reconnoiter into a SaaS offering, one of the major driving forces was a desire to not suck like the others. Like you, we don’t understand why it has to be so damn hard (or require a dedicated IT staff) to take a handful of data points and correlate them into graphs that make sense of the noise. I like to think we’ve been successful. Customers have been overwhelmingly positive about our efforts, calling it “a graph nerd’s paradise”. Even still, we eat our own dog food and are constantly revisiting the service to look for better ways to get our work done. This is why we’re working hard on upcoming features like Graph Overlays and Timeline Annotations. And it’s also why we made recent changes to the workflow for graphs and worksheets.

If you’re a Circonus user, you already know how easy it is to create and view graphs. Adding them to worksheets gives you a page full of data to compare and relate. Choose a zoom preset (2 days, 2 weeks, etc) or select a date range, and all of the thumbnails are instantly redrawn in unison. It might sound basic, but it can be very useful if you’re not sure what you’re looking for. Unexpected patterns jump out at you pretty quickly.


However, most of the time you want to work with a single graph. Clicking on a thumbnail previously loaded a graph in “lightbox” view, hiding all other graphs from sight and letting you focus on the work at hand. This worked well most of the time, but had one big drawback… you couldn’t (easily) bookmark it. So we’ve moved the default view into its own page, sans lightbox, that can be bookmarked and shared with others. Miss the lightbox view? No worries, we’ve kept that as the new preview mode. Try it out in a worksheet for “flickr-style” navigation.

Here’s a short video I threw together to demonstrate some of these changes. There was some audio lag introduced by the YouTube processing, but it should be easy enough to follow along. If you’d like to see more examples like this one, shoot us an email and we’ll try to keep them coming.

Visualizing Regressions

We’ve heard a lot of talk about Continuous Deployment strategies over the last 12-18 months. Timothy Fitz was one of the earliest proponents, publishing stories of their success over at IMVU last year. One of the greatest benefits to continually pushing your changes to production is that it takes less time and effort to find bugs when something goes wrong, since you have fewer commits in-between to navigate. But even with this style of release management, it helps to know which versions of code are running live on your components at any point. What happens when your newest code is enough to alter the normal behavior of the system, but not so drastic as to trigger an alert?

One of the nicer trending features in Circonus (or its open-source relative, Reconnoiter) is the ability to correlate unrelated datasets. I can take any collection of metrics on my account and group them together on a single graph. But what if you could view isolated events on the same graph, as an orthogonal data point? Check out these two graphs displaying some recent activity on one of our fault detection systems. The vertical lines represent the point at which a text metric’s value changed. Circonus renders them this way so you can easily recognize that specific moment in time.

20101025_screen1-624x364

the first graph I’m hovering over a dip in performance caused by the most recent release to that comment (svn r6230). In the second graph we’re running a fix (svn r6232) for the regression introduced in the previous commit. Could I have done the same level of correlation manually? Of course, but it’s nice to be able to zoom out and study the long-term affects of our release strategy on our overall stability. This is an enormously helpful tool for investigating Root Cause Analysis on our live systems, especially if you perform releases many times in a week (like we do). If you’re one of many using automation and Configuration Management suites like Puppet, Chef and the Marionette Collective, no doubt you’ll find it even more useful.

If you’d like to start trending your own text metrics, check out the Resmon DTD. Circonus can pull in your custom metrics in this format. Although the version numbers I mentioned earlier look like integers (well, they are integers), I can explicitly cast them as a string metric using the Resmon DTD. Here is what that might look like:

<ResmonResults> 
  <ResmonResult module="Site::CircProd" service="vers"> 
    <last_runtime_seconds>0.000274</last_runtime_seconds> 
    <last_update>1288044642</last_update> 
    <metric name="ernie" type="s">6297</metric> 
  </ResmonResult> 
</ResmonResults>

As you might imagine, you can get pretty creative with the sort of data you can pull into Circonus. In our next post I plan to look at how you can combine WebHook Notifications (that Brian announced last week) with these text metrics to start trending your alert history. Stay tuned!