Circonus at RailsConf 2010

We’re eager to meet and greet everyone at RailsConf next month in Baltimore. This will be our first conference appearance since the production launch. Some of our customers, including 37signals, will be visiting Charm City for this big event. I’m excited to see so many talented Web developers and operations folk at one conference. Having it in our hometown is icing on the cake.

As if that weren’t enough, we have a couple of fun things to announce. First, Circonus will be giving away a free RailsConf sessions pass! All you have to do is tweet a message about Circonus at RailsConf to your friends and ask them to retweet it for you. The individual with the most retweets by noon (12pm EDT) on Monday, May 31, 2010 wins. Here’s an example tweet:

The @Circonus stuff is hot and it looks like they’ll be at #railsconf this year:

If you’re keeping score at home, that’s a free 2010 RailsConf sessions pass ($795 value) for the price of a few clicks. Feel free to get creative with your tweet message. Our only requirements are that it’s a positive message, that it mentions @Circonus and #railsconf, and that it includes the link.

Why are you still reading this? Go off and start tweeting for your free RailsConf pass (Conference Sessions Only).

See you in Baltimore!

Your Visitors Don’t Matter

Consider me old-fashioned, but I remember a time when an alert notification meant something. Drives failed, servers ran short on memory, or a cage monkey pulled the wrong cable at 3 A.M. Regardless of the circumstance, it demanded attention. Those were the days.

Today, operations is all about doing more with less. No more dedicated hardware or late-night maintenance windows. Everything is virtual, cloud-based, or filling up squares in the grid. Automation reigns supreme, with limitless scalability at our disposal. Abstraction at its finest.

But woe unto you, the flapping anomaly.

That visitor who tried to load your website was turned away, timed out and left to wither. Poor Jane wanted to view your site. She needed to view your site. She’d already submitted her order, only to be ignored. Forgotten. Disconnected with nary a trace to route nor a cookie to favor.

Jane was a victim of a numbers game. Someone, somewhere, decided that some problems don’t matter. Which ones? Who cares? They don’t matter. And because she happened to visit when this problem reared its head, you ignored her request. Who would ever make such a silly presumption that one failure is less important than another? What criteria are used to determine the worthiness of this alert or that one? Pure random circumstance, it would appear.

Many “uptime” services and monitoring suites promote the concept of selective or flapping failures. Vendors sell these features as a convenience, ostensibly as a sleep aid. The administrator’s snooze-bar. I can’t think of any other reason that ignoring a faulty condition would be considered a good thing. Perhaps they reason that only the check is affected. If it responds after the third attempt, it was probably OK for visitors all along. Right?
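To put rough numbers on that reasoning, here is a minimal Python sketch. It is not any vendor's actual implementation: the function name `alerts_fired`, the `failures_required` parameter, and the check history are all made up for illustration, and it assumes an alert fires only once a check has failed a given number of times in a row.

```python
from typing import List


def alerts_fired(check_results: List[bool], failures_required: int) -> int:
    """Count alerts when one fires only after `failures_required`
    consecutive failed checks (True = check passed, False = check failed)."""
    alerts = 0
    streak = 0
    for passed in check_results:
        if passed:
            streak = 0  # any success resets the failure streak
            continue
        streak += 1
        # Fire once, at the moment the streak reaches the threshold.
        if streak == failures_required:
            alerts += 1
    return alerts


# Hypothetical check history: four failed checks across three episodes,
# none lasting three checks in a row.
history = [True, False, True, True, False, False, True, False, True]

failed_checks = history.count(False)
strict_alerts = alerts_fired(history, failures_required=1)
lenient_alerts = alerts_fired(history, failures_required=3)

print(f"failed checks: {failed_checks}")
print(f"alerts when every failure counts: {strict_alerts}")
print(f"alerts with a three-strikes rule: {lenient_alerts}")
```

In this made-up history the three-strikes rule never fires at all, even though four checks failed. Each of those failures could have been a Jane.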

It’s disappointing how many vendors embrace this broken methodology. It probably seemed innocent at a glance. But the damage has been done; recklessness has taken root. We’ve been conditioned to accept these transient malfunctions as mere operational speed bumps. Rather than address the problem, we nudge the threshold a tad higher. Throw additional nodes into the cluster. Increase capacity, while decreasing exposure.

But there is a more responsible alternative. Whatever happened to purposeful, iterative corrections and root cause analysis? Notifications may be annoying at times, but they serve a crucial function in a healthy production architecture. Ignored alerts lead to stagnant bugs, lost traffic and missed opportunities. Stop treating your visitors like they don’t matter. There’s no such thing as a flapping customer.