What is Elastic Observability?
Elastic Observability is the idea that the amount of metrics, logs, and traces you collect should scale based on signals from your environment. You automatically collect more data when you need it, such as during incidents and high-traffic events, and less data when you don’t.
For example, if you’re alerted that a server’s CPU load is climbing, you could start collecting its metrics at 10-second granularity instead of your standard 60 seconds. Just as you right-size your infrastructure by provisioning resources based on signals like CPU load, you right-size the amount of observability data you collect.
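The CPU example above can be expressed as a small decision function. This is a minimal sketch, not Circonus’s implementation. The 80% and 60% thresholds and the name `next_interval` are hypothetical. The two thresholds act as a hysteresis band, so a load hovering near a single cutoff does not make the collection interval flip back and forth.

```python
# Hypothetical thresholds and intervals, chosen only for illustration.
HIGH_CPU = 0.80          # switch to fine-grained collection at or above this load
LOW_CPU = 0.60           # return to the standard interval at or below this load
FINE_INTERVAL_S = 10     # "incident" granularity
STANDARD_INTERVAL_S = 60 # everyday granularity


def next_interval(cpu_load: float, current_interval: int) -> int:
    """Return the metric collection interval (in seconds) for the next cycle."""
    if cpu_load >= HIGH_CPU:
        return FINE_INTERVAL_S
    if cpu_load <= LOW_CPU:
        return STANDARD_INTERVAL_S
    # Between the two thresholds, keep whatever interval is already in effect.
    return current_interval


if __name__ == "__main__":
    interval = STANDARD_INTERVAL_S
    for load in [0.35, 0.72, 0.86, 0.75, 0.55]:
        interval = next_interval(load, interval)
        print(f"cpu={load:.2f} -> collect every {interval}s")
```

With the sample loads, collection stays at 60s until the load crosses 80%. It remains at 10s while the load sits in the 60–80% band and drops back to 60s only once the load falls to 60% or below.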
Why auto-scale observability data?
For a service to run reliably, it needs to accommodate peak load. But provisioning for your highest peaks all the time would be very expensive. That’s why most services are architected for some degree of scale-out behavior. Why shouldn’t organizations approach observability data collection the same way?
Just like spinning up more servers to handle peak load, collecting, processing, and storing more observability data costs money. But most observability strategies today are static, designed to send as much data as teams can afford. This is why observability has become the second-largest source of IT spend after cloud costs.
Why are collection strategies static today?
Tools that adjust how much data you send based on signals from your environment are largely nonexistent. Observability vendors mostly charge by the amount of data you send them, so they have little incentive to help you add elasticity to your collection strategy and send them less.
Teams also worry that collecting anything less than all the data they can afford will create visibility gaps, which can ultimately lead to costly outages.
But Elastic Observability doesn’t mean sacrificing data quality or visibility to save money. In fact, the opposite is true: by collecting less data when it’s not needed, you can afford to collect even more of the relevant, valuable data when you need it.
How do you get started?
As an industry, we need to move away from one-size-fits-all collection strategies. The amount of data we collect while all of our services are healthy should not be the same as when production is on fire. When we get a signal that something is wrong, that’s when we should collect more data.
With this philosophy in mind, Circonus built Circonus Passport™, the industry’s first Elastic Observability solution.
Passport operates at the collection layer, enabling engineers to safely change how agents collect data. It uses an innovative rules engine that receives “signals” (such as high CPU utilization) from any observability platform and then takes action based on the rules you have established. Best of all, Passport is agent-agnostic, requires no migration effort, and doesn’t require teams to change any of their existing observability tools.
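The signal-to-action flow described above can be illustrated with a toy rules engine. This is a hypothetical sketch that does not reflect Passport’s actual API or rule format. `Signal`, `Rule`, `RulesEngine`, `boost_collection`, and the configuration keys (`metric_interval_s`, `log_level`, `trace_sample_rate`) are all invented for illustration. Each rule pairs a condition on an incoming signal with an action that changes per-resource collection settings, which agents would then apply.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Signal:
    """An event reported by any observability platform (hypothetical shape)."""
    source: str    # the platform that raised the signal
    name: str      # e.g. "cpu_utilization"
    resource: str  # the host or service the signal concerns
    value: float


@dataclass
class Rule:
    """Pairs a condition on a signal with an action on collection settings."""
    description: str
    matches: Callable[[Signal], bool]
    action: Callable[[Signal, Dict[str, dict]], None]


class RulesEngine:
    def __init__(self, rules: List[Rule]) -> None:
        self.rules = rules
        # Per-resource collection settings that agents would be told to apply.
        self.collection_config: Dict[str, dict] = {}

    def handle(self, signal: Signal) -> List[str]:
        """Apply every rule whose condition matches; return the rules that fired."""
        fired = []
        for rule in self.rules:
            if rule.matches(signal):
                rule.action(signal, self.collection_config)
                fired.append(rule.description)
        return fired


def boost_collection(signal: Signal, config: Dict[str, dict]) -> None:
    # Hypothetical "incident" profile: finer metrics, verbose logs, all traces kept.
    config[signal.resource] = {
        "metric_interval_s": 10,
        "log_level": "debug",
        "trace_sample_rate": 1.0,
    }


rules = [
    Rule(
        description="high CPU -> detailed collection",
        matches=lambda s: s.name == "cpu_utilization" and s.value >= 0.8,
        action=boost_collection,
    )
]

engine = RulesEngine(rules)
fired = engine.handle(Signal("any-platform", "cpu_utilization", "web-01", 0.93))
print(fired)                     # ['high CPU -> detailed collection']
print(engine.collection_config)  # {'web-01': {'metric_interval_s': 10, 'log_level': 'debug', 'trace_sample_rate': 1.0}}
```

Keeping conditions and actions as plain callables shows why the approach can be agent- and platform-agnostic. The engine only needs a common signal shape on the way in and a collection setting on the way out.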
What are the benefits of Elastic Observability?
- Improve visibility and reduce MTTR by collecting more valuable data at the most relevant times.
- Make alerts more actionable by automatically collecting more logs, metrics, and traces on impacted resources the moment an alert triggers.
- Reduce costs by collecting less data when it’s not needed, so you can afford to collect more relevant data when you need it.
- Gain more control over your data collection strategy by decoupling data collection from your observability platform.
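To put rough numbers on the cost benefit, here is a back-of-the-envelope count of data points for a single metric series over one day. The schedule is hypothetical: one hour of incident-driven 10-second collection and a 60-second baseline for the rest of the day. Actual costs depend on each vendor’s pricing model, so treat this as a volume comparison only.

```python
def points_per_day(schedule):
    """Count samples for one metric series.

    schedule: list of (hours, interval_seconds) pairs that together cover 24 hours.
    """
    if sum(hours for hours, _ in schedule) != 24:
        raise ValueError("schedule must cover exactly 24 hours")
    return sum(hours * 3600 // interval for hours, interval in schedule)


always_fine = points_per_day([(24, 10)])          # 10s resolution all day
always_standard = points_per_day([(24, 60)])      # 60s resolution all day
elastic = points_per_day([(23, 60), (1, 10)])     # 60s baseline, 10s during a 1-hour incident

print(always_fine, always_standard, elastic)      # 8640 1440 1740
```

Under these assumptions, the elastic schedule stores about 20% of the points that always-on 10-second collection would. It still provides full 10-second resolution during the hour that matters.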
The idea that context should inform how data is collected is long overdue. The cost/visibility tradeoff is unavoidable, so tooling needs to fill the gap by making data collection more intelligent. Done right, this gives you better visibility at a lower cost.