Post-Mortem 2017.1.12.1

TL;DR: Some users received spurious false alerts for approximately 30 minutes, starting at 2017-01-12 22:10 UTC. It is our assessment that no expected alerts were missed. There was no data loss. Overview Due to a software bug in the ingestion pipeline specific to fault detection, a small subset (less than 2.5%) of checks were not […]

Introducing IRONdb

Software is eating the world. Devices that run that software are ubiquitous and multiplying rapidly. Without adequate monitoring on these services, operators are mostly flying blind, either relying on customers to report issues or manually jumping on boxes and spot checking. Centralized collection of telemetry data is becoming even more important than it ever was. […]

Systems Monitoring is Ripe for a Revolution

Before we explore systems, let’s talk users. After all, most of our businesses wouldn’t exist without lots of users; users that have decreasing brand loyalty and who value unintrusive, convenient, and quick experiences. We’ve intuited that if a user has a better experience on a competitor’s site, then they will stop being your customer and […]


A few months ago we announced the availability Circonus One Step Install (COSI) to introduce a very fast way to get data collected for systems with the most obvious set of metrics enabled. This makes monitoring new systems as easy as copying and pasting a command into a shell on the machine to be monitored, […]

No, We “Fixed the Glitch”

If you haven’t seen the movie Office Space, you should do so at your earliest convenience. As with the new TV comedy, “Silicon Valley,” Mike Judge hits far too close to home for the movie to be comfortable… its hilarity, on the other hand, is indisputable. So much of our lives are wrapped up in […]

The Circonus API and Raspberry PI

Building a Raspberry PI Circonus AlertBot To a Site Reliability Engineer, one of the most important things is making sure that you get alerts as soon as events happen. Circonus provides many contact options, such as SMS, email, Slack, PagerDuty, or VictorOps. For me, if I’m not actively watching for alerts,I rely on my phone […]

Crash-Dump Necromancy

Recently, one of our on-premises customers experienced a system crash with one of their metric-storage nodes. Circonus offers the option to operate a complete, private copy of our product on a customer’s own premises. This may be because of a compliance requirement or for reasons of scale that make the public SaaS version unsuitable. In […]

How We Keep Your Data Safe

Some of our users tell us that moving to Circonus from in-house operated tools was driven by the desire to outsource responsibility for availability and data safety to someone else. This might sound odd, but building highly resilient systems is hard, most engineering teams are focused on solving those hard problems for their applications and […]

Time, but Faster

Every once in awhile, you find yourself in a rabbit hole, unsure of where you are or what time it might be. In this post, I’ll show you a computing adventure about time through the looking glass. The first premise you must understand was summed up perfectly by the late Douglas Adams; “Time is an […]

Rapid Resolution With The Right Tools

Here at Circonus, we obviously monitor our infrastructure with the Circonus platform. However, there are many types of monitoring, and our product isn’t intended to cover all the bases. While Circonus leads to profound insights into the behavior of systems, it isn’t a deep-dive tool for when you already know a system is broken. Many […]

LuaJIT and the Illumos VM

I’m about to dive into some esoteric stuff, so be warned. We use LuaJIT extensively throughout most of our product suite here at Circonus. It is core supported technology in the libmtev platform on which we build most of our high-performance server apps. Lua (and LuaJIT) integrate fantastically with C code and it makes many […]

The Uphill Battle for Visibility

In 2011, Circonus released a feature that revolutionized the way people could observe systems. We throw around the word “histogram” a lot and the industry has listened. And while we’ve made great progress, it seems that somewhere along the way the industry got lost. I’ve said for some time that without a histogram, you just […]

Percentages Aren’t People

This is a story about an engineering group celebrating success when it shouldn’t be… and their organization buying into it. This is not the fault of the engineering groups, or the operations team, or any one person. This is the fault of yesterday’s tools not providing the right data. The right insights. The ability to […]

Understanding API Latencies

Today’s Internet is powered by APIs. Tomorrow’s will be even more so. Without a pretty UI or a captivating experience, you’re judged simply on performance and availability. As an API provider, it is more critical than ever to understand how your system is performing. With the emergence of micro services, we have an API layered […]

Pully McPushface

The Argument for Connectivity Agnosticism It’s about push vs. pull… but it shouldn’t be. There has been a lot of heated debate on whether pushing telemetry data from systems or pulling that data from systems is better. If you’re just hearing about this argument now, bless you. One would think that this debate is as […]

What’s new in JLog?

Introduction There is a class of problems in systems software that require guaranteed delivery of data from one stage of processing to the next stage of processing. In database systems, this usually involves a WAL file and a commit process that moves data from the WAL to the main storage files. If a crash or […]

Circonus One Step Install

Introducing Quick and Simple Onboarding with C:OSI When we started developing Circonus 6 years ago, we found many customers had very specific ideas about how they want their onboarding process to work. Since then, we’ve found that many more customers aren’t sure where to start. The most rudimentary task new and existing users face is […]

Circonus Instrumentation Packs

In our Circonus Labs public github repo, we have started a project called Circonus Instrumentation Packs, or CIP. This is a series of libraries to make it even easier to submit telemetry data from your application. Currently there are CIP directories for go, java,  and node.js. Each separate language directory has useful resources to help instrument applications […]

Discovering circonusvi

Folks who know Circonus and use it regularly for operations also know that its API is an important part of efficient management for your monitoring facility. For example, after building out a new datacenter with our SaaS application, we wanted to apply tagging to our checks to make searching more effective. The UI is easy […]

Graph Hover Lock

A new feature to help make sense of graphs with multiple data points When visualizing your data, you may often want to compare multiple data points on a single graph. You may even want to compare a metric across a dozen machines, but a graph with more then two or three data points can quickly […]

Advanced Search Builder

Last year, changes on the backend allowed Circonus to make significant improvements to our search capability. Now, we’ve added an Advanced Search tool to allow users to easily build complex search queries, making Search in Circonus more powerful and flexible than ever before. When you click on the search icon, you will see an “Advanced” […]

Show Me the Data

Avoid spike erosion with Percentile – and Histogram – Aggregation It has become common wisdom that the lossy process of averaging measurements leads to all kinds of problems when measuring performance of services (see Schlossnagle2015,  Ugurlu2013,  Schwarz2015,  Gregg2014). Yet, most people are not aware that averages are even more abundant than just in old-fashioned formulation of SLAs […]

The Future of Monitoring: Q&A with Jez Humble

Jez Humble is a lecturer at U.C. Berkeley, and co-author of the Jolt Award-winning Continuous Delivery : Reliable Software Releases through Build, Test and Deployment Automation (Addison-Wesley 2011) and Lean Enterprise : How High Performance Organizations Innovate at Scale (O’Reilly 2015), in Eric Ries’ Lean series. He has worked as a software developer, product manager, consultant […]

ACM – Testing a Distributed System

I want to sing the praises of one of our lead engineers, Phil Maddox, for authoring a very interesting paper, Testing a Distributed System, which was published in Communications of the ACM, Vol. 58 No. 9. A brief excerpt follows: “Distributed systems can be especially difficult to program for a variety of reasons. They can […]

The Future of Monitoring: Q&A with John Allspaw

John Allspaw is CTO at Etsy. John has worked in systems operations for over 14 years in biotech, government, and online media. He started out tuning parallel clusters running vehicle crash simulations for the U.S. government, and then moved on to the Internet in 1997. He built the backing infrastructures at Salon, InfoWorld, Friendster, and […]

Hallway Track: The Future of Monitoring

I’ve been in this “Internet industry” since around 1997. That doesn’t make me the first on the stage, but I’ve had a very wide set of experiences: from deep within the commercial software world to the front lines of open source and from the smallest startup sites to helping fifteen of the world’s most highly […]

Case for a Broader, Deeper, Coordinated Approach to Monitoring Your Business

Effective monitoring is essential for running a successful web business. It improves business performance by accelerating response to issues and opportunities that arise from web application operations, helping your company get and keep customers, boost revenue and build brand reputation. Monitoring the Big Picture: A Modern Approach for Web Application Monitoring View/Download PDF Regardless of […]

Distributed Systems Testing or Why My Hair Is Falling Out

Here at Circonus, our infrastructure is highly distributed. Many of the functions of Circonus are distinct systems that communicate with each other. In addition, we use a distributed data store for storing and retrieving data for graphs. Of course, testing a system is necessary to ensure that everything keeps working smoothly, but testing distributed systems […]

Metrics from Custom Apps…Easy!

Many developers are looking for a platform to which they can send arbitrary data from their custom applications to collect and visualize those metrics, as well as alert on specific thresholds. With Circonus’s ability to accept and parse raw JSON, it’s easy to send metrics from custom applications into the system. More information on JSON […]

Wrangling Elephants in the Cloud

Yonah Russ is a hands-on Technology Executive, System Architect, and Performance Engineer. He is founder of DonateMyFee. You can read more articles by Yonah on LinkedIn where you will also find the original version of this post. You know the elephant in the room, the one no one wants to talk about. Well it turns […]

Underneath Clean Data: Avoiding Rot

When many people talk about clean data, they are referring to data that was collected in a controlled and rigorous process where bad inputs are avoided. Dirty data has samples outside of the intended collection set or values for certain fields that may be mixed up (e.g. consider “First Name: Theo Schlossnagle” and “Last Name: […]

Our Monitoring Tools are Lying to Us

I posted the article below to LinkedIn a few weeks back. Since it was relatively popular and relevant to the Circonus community we decided to repost to our Blog. You can find the original here. I came across this vendor blog post today extolling the virtues of monitoring application performance using Percentiles versus Averages. Hard to […]

One-Minute-Resolution Data Storage is Here!

Have you noticed your graphs looking better? Zoom in. Notice more detail? That’s what our switch from 5 to 1-minute-resolution data storage has done for our SaaS customers. Couple that with our 7 years of data-retention and you’ve got an unrivaled monitoring and analytics tool. Dear customers, your computations are becoming more accurate as we speak! […]

Video: Math in Big Systems

Every year the esteemed Usenix organization holds their LISA conference. LISA has transformed slowly over the years as systems, architectures, and the nature of large-scale deployments have changed, but this year represented the largest change to date. “The format of the conference was substantially different and I believe it (changed) for the best. The topics, content, and […]

Alerting on disk space the right way.

Most people that alert on disk space use an arbitrary threshold, such as “notify me when my disk is 85% full.” Most people then get alerted, spend an hour trying to delete things, and update their rule to “notify me when my disk is 86% full.” Sounds dumb, right? I’ve done it and pretty much […]

A New Day for Navigation and Search

Today we finally rolled out the new navigation menu and search bar that we’ve been working on for a while. We had been getting feedback on the poor usability of the old horizontal menu system and knew that a “pain point” had been reached—it was time to revisit how we treated navigation and search in […]

Blueprints – Graphing made easy

Introducing Blueprints Today I’d like to introduce a new Circonus feature we’re calling Blueprints. Blueprints is a way to effortlessly create reusable graphs that can be used to visualize any host where the data you’re collecting is similar. In the modern age of Internet infrastructure our customers are often faced not with managing just one […]

AWS Cloudwatch Support

This month we pushed native Cloudwatch support – any metric that you have in Cloudwatch can now be added to graphs and dashboards, and alerts can be created for them. Auto Scaling (AWS/AutoScaling) AWS Billing (AWS/Billing) Amazon DynamoDB (AWS/DynamoDB) Amazon ElastiCache (AWS/ElastiCache) Amazon Elastic Block Store (AWS/EBS) Amazon Elastic Compute Cloud (AWS/EC2) Elastic Load Balancing […]

JSON Over HTTP – Data Collection Made Simple

At Circonus, one of our goals is to try to make it as easy as possible to monitor your data. One of the ways we do this is to allow data formatted in JSON to be pushed or pulled over HTTP into Circonus. Since HTTP is spoken everywhere, and JSON is understood everywhere, this allows […]

Exploring Keynote XML Data Pulse

I’ll be the first to admit that the Circonus service can be somewhat intimidating. Sometimes it is hard to puzzle out what we do and what we don’t do. Case in point: perspective-based transactional web monitoring. Many people have asked us, given our global infrastructure, why don’t we support complex web flows from all of […]

Monitoring Elasticsearch

With the much anticipated announcement of the Elasticsearch 1.0.0 release, we thought we’d mention that several of the features that you use within Circonus are powered by Elasticsearch behind the scenes. We could never, in good conscience, run a product or service that we couldn’t extensively monitor. So, when it comes to monitoring things we […]

Ways to Collect Systems Data in Circonus

When you decide to monitor your systems with Circonus, there’s quite a few options on how to collect your metrics. We believe Circonus should be a tool that does what you need, when you need it. Circonus does not force you into a specific approach or method. Since there are so many different ways to […]

Tags: A Long Time Coming

Ok, we know a lot of you have been asking for tags in Circonus for a long time. Well, they’re finally here! The tags feature is currently in beta, and will be released to all customers very soon. (Tags have actually been available to API users for a while, just without UI support in the […]

Customizable Alerts and Ruleset Groups

Today we are releasing two new features to make your on-call life easier. Customizable Alerts Our default format was created and modified over the years based on user feedback, but of course it was never going to make everyone happy. How do we solve this? By letting you create your own alerts! If you head […]

Understanding service latencies via TCP analysis

Latency is the root of all that is evil on the Internet… or so the saying goes. CPUs get faster, storage gets cheaper, IOPS are more plentiful, yet we feeble engineers have done close to nothing on improving the speed of light. While there is certainly a lower-bound to latencies rooted in the physical world, […]

Manage Metrics with circonus-cmi

As an environment grows, so does the headache of managing checks and metrics. Every new machine and cluster node requires a new set checks and metrics. There are the Circonus API and circonusvi tool, but those still require manual intervention. Wouldn’t it be nice if the same tools used for provisioning could handle Circonus setup […]

On the Move: Circonus Mobile

It’s been a long time coming, but it’s finally here: a Circonus that’s optimized for low-resolution devices like your smartphone or tablet! For the past few weeks, eagle-eyed Circonus customers may have noticed links to the mobile site creeping into various places, such as the login page ( and the application footer. Although we officially […]

What’s New Q1 2013 Edition

Navigation and URL Changes I’ll start this update talking about the most obvious changes to the UI, which is our new completely horizontal navigation and new URL structure. We had received a lot of feedback about the mix of horizontal and vertical navigation. The tabs we were told were hard to read, especially with the […]

Web Application Stat Collection with Node.js and Circonus

Collecting and carefully monitoring statistics is an essential practice for catching and debugging issues early, as well as minimizing degraded performance and downtime. When running a web service, two of the most fundamental and useful stats are request/response rates (throughput) and response times (latency). This includes both interactions between a user and the service as […]

Sometimes you just need a different hammer

Circonus has a lot of powerful tools inside, but as anyone who has worked with real data knows: if you can’t get your data out into the tool you need, you’re going to suffer. We do all sorts of advanced analysis on telemetry data that is sent our way, but the systems we use to […]

Fault Detection: New Features and Fixes

One of the trickier problems when detecting faults is detecting the absence of data. Did the check run and not produce data? Did we lose connection and miss the data? The latter problems are where we lost a bit of insight, which we sought to correct. The system is down A loss of connection to […]

Updates From The Tech Team

Now that it is fall and the conference season is just about over, I thought it would be a good time to give you an update on some items that didn’t make our change log (and some that did), what is coming shortly down the road and just generally what we have been up to. […]

Understanding Data with Histograms

For the last several years, I’ve been speaking about the lies that graphs tell us. We all spend time looking at data, commonly through line graphs, that actually show us averages. A great example of this is showing average response times for API requests. The above graph shows the average response time for calls made […]

Web Portal Outage

Last night became unavailable for 34 minutes, this was due to the primary database server becoming unavailable. Here is a breakdown of events, times are US/Eastern. 8:23 pm kernel panic on primary DB machine, system rebooted but did not start up properly 8:25 -> 8:27 first set of pages went out about DB being […]

Graph Annotations and Events

This feature has been a long time in coming: the ability to annotate your graphs! With the new annotations timeline sitting over the graph, not only can you create custom events to mark points in time, but you can also view alerts and see how they fit (or don’t fit) your metric data. Annotations Timeline […]

Insights from a Data Center Conference

At the beginning of this month, I’d attended the Gartner Data Center Conference in Las Vegas, and wanted to share with you some of my gained impressions and insights from the event. First, I have to say that I have seldom seen a group of more conscientious conference attendees (aside from Surge, of course, and […]

Monitoring your Vitals During the Critical Holiday Retail Season

As with Brick & Mortar stores, the Holiday season is a critical time for many E-Commerce sites. Like their off-line brethren, these sites also see large increases in both traffic and revenue, sometimes substantially so. Of course these changes in user behavior don’t just affect E-Commerce sites; consider a social-networking site like Foursquare, where a […]

Template Web UI

Back in October we released the first version of our new Templating API, allowing you to easily replicate sets of bundles across multiple hosts. Now we bring you the time-saving sweetness of Templates in the web interface as well; if you have multiple servers that you want to monitor in exactly the same way, Templates […]

Template API

Setting up a monitoring system can be a lot of work, especially if you are a large corporation with hundreds or thousands of hosts. Regardless of the size of your business, it still takes time to figure out what you want to monitor, how you are going to get at the data, and then to […]

One Dashboard to Rule Them All

Ever dream of having a systems monitoring dashboard that was actually useful? One where you could move things around, resize them, and even choose what information you wanted to display? Large enterprise software packages may have decent dashboards, but what if you’re not a large enterprise or you don’t want to pay an arm and […]

What’s in a number?

Numbers, numbers, numbers; we’re all about numbers here at Circonus. We have trillions of data points which we feed into a slew of algorithms and processes to help our users identify problems with their data. But what are these numbers? It turns out that isn’t an easy question to answer. Like most monitoring systems, Circonus […]

A Lotta Love for Keyboard Users

All web users who bemoan the general lack of support for keyboard accessibility in web apps, take heart! Circonus has some great features for keyboard lovers. We know there are many web users out there for whom keyboard shortcuts are a quicker and easier way to use applications, particularly web apps. This is especially true […]

Lost In Translation

For more than ten years, OmniTI has been making large-scale critical Internet infrastructure work. It is, obviously, not black magic or voodoo. Perhaps not so obviously, it is not technical competence that leads to success here. I like to think our team has technical competence in spades as we have an impeccable track record, authored […]

Capacity Planning Made Easy

Okay, so capacity planning will never be fool proof. You simply cannot predict the future. However, some of the time you have a darn good idea of what the future will hold. Since someone knows what is likely to happen, why is it so hard to plan marketing initiatives, funnels and IT provisioning? The reason […]

Enterprise Agents

If you’re like me, your first response to SaaS monitoring was: “You can’t see my machines/services/metrics from your cloud. That won’t be too useful.” With a little bit of thought, it’s pretty easy to arrive at the conclusion that you must run something on your infrastructure to bridge the divide. It was a fun and […]

Finding Needles in a Worksheet

Traditional graphing tools can help you plan for growth or even narrow down root causes after a failure. But they’ have a reputation for being difficult to setup, navigate or customize. It’s nice to be able to just point Cacti at some switches or routers and have it gracefully poll each device for SNMP data. […]

Access Tokens with the Circonus API

When we rolled out our initial API months ago, we took a first stab at getting the most useful features exposed to help customers get up to speed with the service. A handful of our users expressed displeasure with having to use their login credentials for basic access to the management API. Starting today, we’re […]

Annotating Alerts and Recoveries

In the last couple of posts, Brian introduced our new WebHook notifications feature and I demonstrated how Circonus can graph text metrics for Visualizing Regressions. Both of these features are interesting enough on their own, but let’s not stop there. Today I have an easy demonstration showing how you can re-import your alert information to […]

Visualizing Regressions

We’ve heard a lot of talk about Continuous Deployment strategies over the last 12-18 months. Timothy Fitz was one of the earliest proponents, publishing stories of their success over at IMVU last year. One of the greatest benefits to continually pushing your changes to production is that it takes less time and effort to find […]

WebHook Notifications

This week we added support for webhook notifications in Circonus. For those that are unsure what a webhook is, its simply an HTTP POST with all the information about an alert you would normally get via email, XMPP or AIM. Webhooks can be added to any contact group. Unlike other methods, you can’t add one […]

Good Times in Charm City

It’s been a while since I had time to enjoy the technical conference scene. Thanks to my involvement with Circonus, I have plenty of action scheduled between RailsConf, Velocity and the Surge Scalability Conference. We attended RailsConf in Baltimore a couple weeks ago and had a great time. Circonus had an exhibition booth and we […]

Circonus at Velocity 2010

Hot on the heels of our RailsConf ticket giveaway, we have another contest for a free pass to Velocity 2010! I’m really excited to attend this year’s Velocity. It’s the Web Performance event to attend, and a great place to see the sharpest whips in the industry. Like before, the rules of this giveaway are […]

Circonus at RailsConf 2010

We’re anxious to meet and greet everyone at RailsConf next month in Baltimore. This will be our first conference appearance since the production launch. Some of our customers, including 37signals, will be visiting Charm City for this big event. I’m excited to see so many talented Web developers and operations folk in one conference. Having […]

Your Visitors Don’t Matter

Consider me old-fashioned, but I remember a time when an alert notification meant something. Drives failed, servers ran short on memory, or a cage monkey pulled the wrong cable at 3 A.M. Regardless of the circumstance, it demanded attention. Those were the days. Today, operations is all about doing more with less. No more dedicated […]

Disrupting the Status Quo

As a hobbyist programmer and full-time operations geek, I’ve been involved in my share of odd software projects. More often than not I’ve had to explain the purpose of the thing, answering numerous questions about the why, what or whowuzzit. I can say without any reservation that Circonus is that rare venture that breaks through […]

Introducing Circonus

Great ideas always begin with a catalyst. They can ignite in a flash of brilliance, or grow slowly like an ember hidden in the ashes of failure. Inspiration comes from different places, and is only ever cultivated into success with the right combination of talent, timing and fortitude.

And sometimes it just happens because you get fed up with inferior products.