Advanced Monitoring and Analytics: An Interview with Mission Critical Magazine

This post includes excerpts from a recent video interview Circonus CEO Bob Moul had with Mission Critical Magazine.

Circonus CEO Bob Moul recently spoke with Amy Al-Katib, Editor-in-Chief of Mission Critical Magazine, about how organizations can begin to implement more sophisticated infrastructure monitoring analytics, such as predictive analytics and predictive maintenance. This is the second time in the past few weeks that they have spoken about how the sudden surge in online services brought on by the COVID-19 pandemic has exposed weaknesses in the state of monitoring within many organizations. As a result, more businesses are looking to move to more advanced monitoring and analytics.

But where do you start? How do you start? Businesses know they want more value from their monitoring efforts, but figuring out how to move forward can feel daunting. In the interview, Moul offers some advice on what organizations should consider as they look to implement more advanced monitoring analytics, and what types of benefits they can realize as a result.

Below is a snapshot of the interview, and here’s the full video.

Al-Katib: There are a lot of opportunities for businesses right now to employ more advanced analytics like predictive analytics. Can you elaborate on this and how these analytics can be used?

Moul: I like to think about monitoring and analytics in three buckets. In the first bucket, Monitoring and Alerting, which I think is where the vast majority of us are, we want to get our arms around questions like, “What do we have? Is it working well? Is something broken? Please alert me and help me fix it quickly to get it back up and running.” Many organizations are finding weaknesses in their monitoring, and this is where they are beginning their journey.

In the second bucket, organizations move towards Data-Driven Operations. This is when they really start to think about implementing DevOps and SRE, and getting sophisticated around things like error budgets and SLOs/SLIs/SLAs, and tying those metrics back to business success and business performance.
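As an illustration of the error-budget idea mentioned above (this sketch is not from the interview), the arithmetic can be written in a few lines of Python. The function name, the request-based SLI, and the example SLO target are all assumptions chosen for the example, and real SRE practice varies in how budgets are defined and windowed.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize error-budget consumption for one SLO window.

    Hypothetical helper: assumes a simple request-based SLI
    (good requests / total requests) over a fixed window.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, e.g. 0.999")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return {
        "sli": 1.0 - failed_requests / total_requests,
        "allowed_failures": allowed_failures,
        "remaining_failures": allowed_failures - failed_requests,
        "budget_consumed": failed_requests / allowed_failures,
    }


# Example: a 99.9% availability SLO over one million requests.
report = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(f"SLI: {report['sli']:.5f}, budget consumed: {report['budget_consumed']:.0%}")
```

A team could track `budget_consumed` per window and slow down releases as it approaches 1.0, which is one way of tying an operational metric back to a business decision.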

The third bucket, Advanced Analytics, is where you have things like predictive analytics, predictive maintenance, real system observability, and device optimization. There are all sorts of things you can do here, but you’re only as good as your data. What’s essential is the ability to ingest all that data at a frequency that is meaningful. If you’re going to do predictive analytics, you can’t just sample metrics every so often. You really need to be able to get that data as it’s streaming off of a device, retain it, and then analyze it.

If organizations can do this, there are some pretty amazing things they can do. For example, they can know that a drive is about to fail sometime in the next six hours and proactively replace it before it causes an issue. In the IoT world, organizations can model device performance behavior and compare devices across multiple deployments. But again, this is all based on the ability to get the data at the frequency it’s being emitted.

Al-Katib: How can you ensure that teams are utilizing this technology in the right way? Are there any best practices that you can share?

Moul: Situations vary, but my advice would be to measure what matters and figure out what the thresholds are that matter because there’s nothing worse than getting too many false alarms or too many false positives. It just becomes noise. If you’re investing in technology, one of the key abilities is to handle the data volume – to collect that data at volume and analyze it as it’s streaming. The ability to do anomaly detection in real-time is critical.
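To make the idea of analyzing data as it streams more concrete, here is a minimal sketch of one common approach, a rolling z-score check. It is an illustration only and is not taken from the interview or from any Circonus product; the class name, window size, and threshold are assumptions, and production systems typically use more robust methods.

```python
from collections import deque
from statistics import mean, pstdev


class RollingZScoreDetector:
    """Flag values that deviate sharply from a rolling baseline.

    Hypothetical example class: window, threshold, and min_samples
    are illustrative defaults, not recommended settings.
    """

    def __init__(self, window: int = 30, threshold: float = 3.0, min_samples: int = 5):
        self.baseline = deque(maxlen=window)  # most recent normal readings
        self.threshold = threshold            # z-score above which we alert
        self.min_samples = min_samples        # readings needed before judging

    def observe(self, value: float) -> bool:
        """Process one reading as it arrives; return True if it looks anomalous."""
        is_anomaly = False
        if len(self.baseline) >= self.min_samples:
            mu = mean(self.baseline)
            sigma = pstdev(self.baseline)
            if sigma > 0:
                is_anomaly = abs(value - mu) / sigma > self.threshold
            else:
                # A perfectly flat baseline: any change counts as a deviation.
                is_anomaly = value != mu
        if not is_anomaly:
            # Keep anomalies out of the baseline so one spike does not mask the next.
            self.baseline.append(value)
        return is_anomaly


detector = RollingZScoreDetector(window=20, threshold=3.0)
for reading in [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 25.0, 10.0]:
    if detector.observe(reading):
        print(f"anomaly: {reading}")
```

The threshold is the tuning knob for the false-alarm problem described above: a lower value catches more deviations but produces more noise. Excluding flagged values from the baseline is a design choice with a trade-off, since a genuine, lasting shift in level would keep alerting until the detector is reset.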

I think it’s important to note here that as organizations evaluate technology platforms to support more advanced analytics capabilities, a feature they should look for is the ability to collect telemetry data from a number of different systems and correlate that data. So, for example, if you see a surge somewhere, you can clearly identify the event causing it.
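As a rough illustration of correlating telemetry from different systems (not part of the interview), the sketch below ranks candidate metrics by how closely they move with a metric that surged. The function names and metric names are hypothetical, the series are assumed to be sampled at the same timestamps, and a strong correlation only suggests where to look; it does not prove a cause.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally sampled series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def rank_related(target: list[float], candidates: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Order candidate metrics by absolute correlation with the target metric."""
    scores = {name: pearson(target, series) for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical telemetry: a latency surge alongside metrics from other systems.
latency_ms = [100, 102, 101, 180, 175, 103]
other_metrics = {
    "db_connections": [20, 21, 20, 60, 58, 22],
    "disk_free_gb": [500, 499, 498, 497, 496, 495],
}
for name, score in rank_related(latency_ms, other_metrics):
    print(f"{name}: {score:+.2f}")
```

In this made-up data the database connection count rises and falls with latency, so it ranks first as the place to start investigating.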

Al-Katib: Are there opportunities for existing facilities to slowly start to incorporate predictive analytics into their strategies?

Moul: Organizations can implement more advanced monitoring analytics like predictive analytics slowly. It doesn’t have to be all or nothing, and it doesn’t have to be all at once – you can migrate into it. For example, you can begin collecting the analytics, but not yet have the processes in place to act on them.

I think it’s important to understand that this does not have to be a hammer looking for a nail. So rather than ask what you can do with predictive analytics and where to use it, first determine what you’re trying to achieve as a business and what metrics move the needle the most. For instance, if you’re an eCommerce site, identify your top three to five business goals, and then break those down into the next tier of metrics to keep track of.

Like what you’ve read so far and want to hear more? Check out the full video interview.