The Blog. From Circonus

Postmortem: 2017-04-11 Firewall Outage

The Event: At approximately 05:40 AM GMT on 4/11/2017, we experienced a network outage in our main datacenter in Chicago, IL. The outage lasted until approximately 10:55 AM GMT on the same day. The Circonus SaaS service, as well as any PUSH-based checks that use the...

Documenting with Types

I've said this before: elegant code is pedagogical. That is, elegant code is designed to teach its readers about the concepts and relationships in the problem domain that the code addresses, with as little noise as possible. I think data types are a fundamental tool...

Post-Mortem 2017.1.12.1

TL;DR: Some users received spurious false alerts for approximately 30 minutes, starting at 2017-01-12 22:10 UTC. It is our assessment that no expected alerts were missed. There was no data loss. Overview: Due to a software bug in the ingestion pipeline specific to...

Show Me the Data

Avoid spike erosion with Percentile- and Histogram-Aggregation. It has become common wisdom that the lossy process of averaging measurements leads to all kinds of problems when measuring performance of services (see Schlossnagle2015, Ugurlu2013, Schwarz2015,...

ACM – Testing a Distributed System

I want to sing the praises of one of our lead engineers, Phil Maddox, for authoring a very interesting paper, Testing a Distributed System, which was published in Communications of the ACM, Vol. 58 No. 9. A brief excerpt follows: "Distributed systems can be especially...

Hallway Track: The Future of Monitoring

I’ve been in this “Internet industry” since around 1997. That doesn’t make me the first on the stage, but I’ve had a very wide set of experiences: from deep within the commercial software world to the front lines of open source and from the smallest startup sites to...