Some of our users tell us that moving to Circonus from in-house operated tools was driven by the desire to outsource responsibility for availability and data safety to someone else. This might sound odd, but building highly resilient systems is hard, most engineering teams are focused on solving those hard problems for their applications and their customers, and it can be highly distracting (even flat-out wasteful) to focus on doing that for telemetry data. Telemetry data has very different storage, recall, and consistency requirements than most typical application data. At first, this might seem easier than your typical database problem. Once you’ve struggled first-hand with the combination of availability and performance requirements on the read path (for both failure detection and operational decision making) and the intensely painful, disk saturating write path, it becomes clear that this is a hard beast to wrestle.
When placing this responsibility in someone else’s hands, you could simply ignore the gruesome details and assume you’re in good hands… but you should expect an explanation of just how that’s all accomplished. At Circonus, we care very much about our customers’ data being correctly collected and reported on, but that would be all for naught if we couldn’t safely store it. Here’s how we do it…
The first thing you should know is that we’re committed to ZFS. If you know about ZFS, that should make you feel better already. ZFS is the most advanced and production-ready file system available; full stop. It supports data redundancy and checksumming, as well as replication and snapshots. From an operational perspective, putting data on ZFS puts a big, fat smile on our faces.
ZFS checksumming means that we can detect when the data we wrote to disk isn’t the data we’re reading back. How could that be? Bitrot and other errors on the read or write path can cause it. The bottom line is that at any reasonable scale, you will have bad data. If you are storing massive data on a filesystem that doesn’t protect the data round trip the way ZFS does, you may not know (or notice) errors in your data, but you almost certainly have them… and that should turn your stomach. In fact, we “scrub” the filesystems regularly just to check and correct for bitrot. Bitrot doesn’t separate us from our competitors, noticing it does.
ZFS Snapshots and Rollback
ZFS also supports snapshots. When we perform maintenance on our system, change on-disk formats, or apply large changesets of data, we use snapshots to protect the state of the overall system from operator error. Have we screwed things up? Have we broken databases? Have we corrupted data? Yes, you betcha; mistakes happen. Have we ever lost a piece of data or exposed bad data to customers? No, and we owe that largely to ZFS snapshots and rollback.
ZFS Device Management
On top of the data safety features baked into ZFS for modern-sized systems, ZFS supports software data redundancy across physical media. This might sound like RAID, and in concept it is, but the implementation being baked through the whole filesystem block layer allows for enormous volumes with speedy recovery and rebuild times. So, getting into the nitty gritty: all data that arrives on a node is stored in at least two physical disks (think RAID-1). Any disk in our storage node can fail (and often several can fail) without any interruption of service, and replacement disks are integrated with online rebuild with zero service interruption.
Our magic mojo is provided by our timeseries database, called Snowth. Snowth uses a consistent hashing model to place arriving data on more than one node. Our Snowth clusters are configured for three write copies. This means that every piece of data we collect is stored on three machines. With three copies of the data, any two machines in the cluster can fail (or malfunction) and both writes and reads can continue uninterrupted.
Since data is stored on two physical drives per node and stored redundantly on three nodes, that means each data point you record in Circonus lives on six disks.
While disk failure and node failure can (and do) happen, datacenters can also fail (sometimes catastrophically due to fire or natural disaster). Each measurement you send to Circonus follows two completely independent paths from the broker to each of two datacenters. Basically, each production datacenter acts as a subscriber to the metrics published by a broker. The Snowth clusters in the two datacenters are completely independent. So while your data lives on only six disks in one datacenter, rest assured that it resides on six other disks in the other datacenter. We’re up to 12 disks.
Safe and Sound
How serious are we about data safety? Each measurement lives on 12 disks in 6 different chassis in 2 different datacenters, protected from bitrot, controller errors, and operator errors by ZFS. So yeah, we’re pretty serious about it.
How safe is your data? Maybe it’s time to ask your current vendor. You have a right to know.