Enterprise Agents

If you’re like me, your first response to SaaS monitoring was: “You can’t see my machines/services/metrics from your cloud. That won’t be too useful.” With a little bit of thought, it’s pretty easy to arrive at the conclusion that you must run something on your infrastructure to bridge the divide. It was a fun and exciting project here to build that magic something called the Circonus Enterprise Agent.

circ-ea-login

The Circonus Enterprise Agent (we’ll call it our “EA” from here on) is all of our magic monitoring software bundled into a maintainable VMWare virtual appliance than can be run on your internal networks to track stuff that the public shouldn’t be seeing. We had some interesting choices to make during development and I thought I’d share what they were and why we made them.

Choosing a platform

Most of our internal infrastructure runs on some variant of OpenSolaris technology. We chose this for a variety of reasons. Most importantly, storing your precious data on ZFS seemed like the right thing to do. After that, the fault management architecture (FMA) available in OpenSolaris allows us to keep our machines and services running more reliably. Reliability and data permanence are the two most important factors in technology selection here at Circonus (a fact our customers respect).

So, with all this talk about OpenSolaris and its advantages you’d imagine we built our EA on the same technology, right? Not so simple. For a virtual appliance image that is easy to administer and easy to upgrade in the field you need a good package management system. OpenSolaris simply falls on its face there. Oracle’s promises of IPS (the new and coming package management system for Solaris 11) are quite compelling, but that is just a promise today. Instead, we turned to the tried and true CentOS Linux-based platform for our EA.

CentOS provides all the features we need to run our agent software, manage package upgrades and distribution seamlessly and simply, and the core operating system is both stable and secure. In an interesting later development, we provide Joyent customers the ability to run an EA on one of their Joyent SmartMachines. Joyent’s operating architecture is actually derived from OpenSolaris — so we ended up porting our EA back to our core platform as well.

Today, the EA is available in two forms: a CentOS 5 VMWare-based appliance and a Joyent SmartMachine.

From where do you manage the appliance

circ-ea-manage-thumb

While most appliances have a web console that allows a variety of management tasks, we made a simple choice to have the appliance administrable via the main circonus.com web application. This is where Circonus users interface with all their data and set up their monitors, sp it only made sense to also administer their EA from the same place.

After using the system for a while now, I can say that I’m really pleased with this decision. The cohesiveness of scheduling checks on either your private EA and/or the world-wide Circonus agents through the same check creation interface is a simple pleasure. One single world-wide view of all the agents on which you can schedule checks makes it simple to understand how the monitoring system works.

What to automate

Generally speaking, when you think appliance, you think self-maintaining. That’s not an unreasonable expectation. However, this directly conflicts with our experience in operations. In operations, automatic upgrades of software are strictly taboo. Typically, the operations crew wants to schedule precisely when an upgrade will occur, be present and have a bulletproof evaluation and rollback plan. When you start talking about critical infrastructure like monitoring, “typically” becomes “always.”

With this in mind, we made the upgrade process on the EA completely automated, but not automatic. One click and the appliance will self-upgrade. Currently, this is the only ongoing task that is done from the appliance itself (rather than the circonus.com portal), but we’re looking to make some nice enhancements there as well. Soon, you’ll be able to trigger remote EA upgrades directly from the web application.

What you get

With an EA you get to leverage the power of Circonus against all of your private data. Networks, systems, applications and business systems that are only accessible via internal infrastructure can be monitored via an Enterprise Agent. The data is fed back to the Circonus cloud in real-time. All of that data can be alerted on, and is available for correlation, trending and planning purposes through the excellent Circonus tools you already know and love.

Finding Needles in a Worksheet

Traditional graphing tools can help you plan for growth or even narrow down root causes after a failure. But they’ have a reputation for being difficult to setup, navigate or customize. It’s nice to be able to just point Cacti at some switches or routers and have it gracefully poll each device for SNMP data. Yet when you need a custom perspective of the data (or collections of data), it can be an arduous experience setting up templates and graphs.

When we started to engineer Reconnoiter into a SaaS offering, one of the major driving forces was a desire to not suck like the others. Like you, we don’t understand why it has to be so damn hard (or require a dedicated IT staff) to take a handful of data points and correlate them into graphs that make sense of the noise. I like to think we’ve been successful. Customers have been overwhelmingly positive about our efforts, calling it “a graph nerd’s paradise”. Even still, we eat our own dog food and are constantly revisiting the service to look for better ways to get our work done. This is why we’re working hard on upcoming features like Graph Overlays and Timeline Annotations. And it’s also why we made recent changes to the workflow for graphs and worksheets.

If you’re a Circonus user, you already know how easy it is to create and view graphs. Adding them to worksheets gives you a page full of data to compare and relate. Choose a zoom preset (2 days, 2 weeks, etc) or select a date range, and all of the thumbnails are instantly redrawn in unison. It might sound basic, but it can be very useful if you’re not sure what you’re looking for. Unexpected patterns jump out at you pretty quickly.


However, most of the time you want to work with a single graph. Clicking on a thumbnail previously loaded a graph in “lightbox” view, hiding all other graphs from sight and letting you focus on the work at hand. This worked well most of the time, but had one big drawback… you couldn’t (easily) bookmark it. So we’ve moved the default view into its own page, sans lightbox, that can be bookmarked and shared with others. Miss the lightbox view? No worries, we’ve kept that as the new preview mode. Try it out in a worksheet for “flickr-style” navigation.

Here’s a short video I threw together to demonstrate some of these changes. There was some audio lag introduced by the YouTube processing, but it should be easy enough to follow along. If you’d like to see more examples like this one, shoot us an email and we’ll try to keep them coming.