5 Different Monitoring Challenges Companies Are Successfully Tackling Using Circonus

At Circonus, we love hearing (and sharing) how our customers are using our platform to tackle their monitoring challenges. It’s not because we want to congratulate ourselves. Rather, it reminds us that we have to continue listening to the pains that organizations — customers and non-customers alike — are having, so we can continue to enhance our platform with the capabilities they need. This is the approach we have taken since we built Circonus.

We know that many organizations face these same challenges, and it can be helpful for them to hear how others are successfully tackling them. Below, we share snapshots of how 5 companies turned to Circonus with 5 different monitoring hurdles. Perhaps one or more of these will resonate with your company.

Challenge #1: How to consolidate monitoring and reduce monitoring costs

Many people may not realize it, but Major League Baseball (MLB) is a complex technology company that not only broadcasts and live-streams baseball games but also delivers real-time statistics for sports betting and fantasy baseball leagues. As a result, it collects and monitors a significant amount of data while aiming to provide seamless experiences for millions of consumers.

Over time, MLB’s monitoring systems multiplied and costs began to escalate significantly as its vendors increased pricing. The league decided it was time to consolidate its monitoring — particularly as it was moving quickly to Kubernetes.

MLB chose Circonus as its centralized monitoring platform for application, infrastructure, network, video, and Kubernetes monitoring. MLB uses Circonus’ histograms to monitor at the service level, which allows it to drill down and perform root cause analysis to resolve problems quickly, and it uses Circonus’ Kubernetes monitoring to identify and resolve cluster performance and health issues.
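To make the idea of histogram-based service-level monitoring concrete, here is a minimal Python sketch. It is a generic illustration, not Circonus’ API or MLB’s configuration: the bucket bounds, function names, and sample latencies are all hypothetical. It shows how latency samples can be counted into buckets, how histograms from several hosts can be merged by adding counts, and how the share of requests within a latency threshold can be read from the result.

```python
from bisect import bisect_left

# Hypothetical bucket upper bounds in milliseconds (illustrative values only).
BOUNDS_MS = [10, 25, 50, 100, 250, 500, 1000]


def build_histogram(latencies_ms):
    """Count samples per bucket; the last slot holds samples above the top bound."""
    counts = [0] * (len(BOUNDS_MS) + 1)
    for value in latencies_ms:
        # Bucket i covers values in (BOUNDS_MS[i-1], BOUNDS_MS[i]].
        counts[bisect_left(BOUNDS_MS, value)] += 1
    return counts


def merge(a, b):
    """Histograms from different hosts combine by adding bucket counts."""
    return [x + y for x, y in zip(a, b)]


def fraction_within(counts, threshold_ms):
    """Share of requests at or below threshold_ms (must be one of BOUNDS_MS)."""
    idx = BOUNDS_MS.index(threshold_ms)
    total = sum(counts)
    return sum(counts[: idx + 1]) / total if total else 0.0


# Made-up latency samples from two hosts serving the same API.
host_a = build_histogram([8, 12, 40, 95, 120])
host_b = build_histogram([5, 30, 60, 300, 1200])
combined = merge(host_a, host_b)

print(f"{fraction_within(combined, 100):.0%} of requests finished within 100 ms")
```

Because bucket counts simply add, a service-level figure can be computed across many hosts without keeping every raw sample, which is one reason histograms suit this kind of monitoring.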

Ultimately, MLB replaced 7 different monitoring systems with Circonus and reduced its annual monitoring spend by 66%. Circonus is now MLB’s monitoring system of record, streamlining operations and creating one source of truth on which to base decisions. By monitoring service levels across all of its APIs (billions of requests), MLB ensures it delivers an exceptional real-time entertainment experience for over 100 million fans and users globally.

Challenge #2: How to improve availability and performance as global infrastructure scales significantly and rapidly

In 2020, Webex usage quadrupled as a result of the COVID-19 pandemic and its global infrastructure estate also quadrupled in size. Availability and performance became mission-critical as businesses, government, and news agencies relied heavily on video-conferencing to navigate not only the pandemic but also a hotly contested presidential election and civil unrest.

A long-time client of Circonus, Webex was able to easily model its capacity requirements and scale its monitoring operations to keep pace during this unprecedented period of growth. Circonus is deployed directly in Webex data centers and used to track critical metrics such as connection rates and user engagement across all its infrastructure and applications.

Webex is now delivering millions of meetings a day and billions of meeting minutes monthly with confidence. It recently announced its Webex One initiative to reclaim market leadership from Zoom. Circonus now monitors more than 100,000 machines globally, giving Webex full visibility into user experience and quality of service. This growth was achieved without an increase in software and support costs.

Challenge #3: How to improve visibility, analytics, and alerting in order to ensure the highest quality of service

HBO’s previous monitoring solution had difficulty scaling and provided inadequate visibility and analytics. Circonus was originally engaged to monitor the infrastructure for Game of Thrones but quickly expanded into all of the HBO platforms — now consolidated into HBO Max.

Circonus is now the system of record for all HBO monitoring. Using Circonus, HBO can track the number of concurrent viewers, stream quality, player errors, stream drops, and the status of the player interface (HTTP), in addition to traditional infrastructure metrics, across all of its platforms and programs.

HBO now has comprehensive, end-to-end visibility and instant alerting to ensure the highest possible quality of service, and it can avoid or quickly resolve latency-related issues. The company can also retain years of season-over-season performance data to ensure accurate capacity planning, control cloud costs, and proactively handle spikes in user demand with auto-scaling. Circonus smoothly scaled to handle 2x growth in 2020, and HBO plans regional expansion into South America and Europe in 2021.

Challenge #4: How to scale metrics collection and analysis while accelerating MTTR

Joyent provides mission-critical cloud services to high-profile clients, most notably as the storage technology behind the Samsung Cloud. As Joyent began to deploy its services at larger and larger scale, its previous monitoring solution could not handle the number of metrics it needed to collect and measure. There was a lack of clarity, pulling metrics was challenging, and resolving performance issues was too time-consuming.

Joyent switched to Circonus and is using the platform to track incredibly detailed metrics such as CPU temperature and disk performance — scanning for anomalies and performing predictive analytics so that any issues can be proactively addressed and costly outages avoided. The solution is deployed on Joyent-owned hardware so that no telemetry data is sent off premises.

Joyent’s deployment for Samsung has grown to 15 data centers in 3 regions with some 9,000 machines, each with 30 hard drives, supporting the storage needs of nearly 1 billion smartphone users globally.

Real-time analysis of over 1 million performance and availability metrics enables Joyent to ensure it is meeting its SLO and SLA commitments, troubleshoot more efficiently, and avoid potential penalties.

Challenge #5: How to efficiently modernize monitoring systems and optimize resources

Xandr is a demand-side platform that conducts real-time, millisecond auctions for digital advertising, so any latency dramatically impacts the effectiveness of its service. Xandr was using a very fragile Graphite/Whisper deployment that could not handle the scale or performance requirements and provided no redundancy. It was incredibly painful and time-consuming to maintain.

Xandr deployed Circonus as a seamless drop-in replacement for Graphite, integrated with all of its existing Grafana dashboards. Circonus is deployed directly in Xandr’s data centers because the volume of telemetry data is too large to send over the public internet. Xandr now uses Circonus to collect and analyze 250 million metrics every minute, which are used for predictive analytics, anomaly detection, fraud detection, and more. It was able to reduce its infrastructure from 104 nodes to 26 nodes and reduce its support staff, resulting in annual cost savings of more than 75%. Resources were redeployed to higher-value initiatives, and the satisfaction of the DevOps team greatly improved.
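As a quick sanity check on those figures, the short Python sketch below works through the arithmetic. The helper name is hypothetical, and the calculation covers only what the numbers above imply: it says nothing about how Xandr measured its overall savings, which also reflect staffing changes.

```python
def percent_reduction(before, after):
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


# Node counts quoted above: 104 nodes before, 26 after.
node_reduction = percent_reduction(104, 26)

# 250 million metrics per minute, expressed as a per-second rate.
metrics_per_second = 250_000_000 / 60

print(f"Node reduction: {node_reduction:.0f}%")
print(f"Ingest rate: about {metrics_per_second / 1e6:.1f} million metrics per second")
```

The node count alone falls by 75%, and the stated ingest volume works out to a little over 4 million metrics per second.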

What are your monitoring pains and challenges? If, like any of these companies, your current monitoring solution is letting you down or can’t take you to where your organization is headed, contact us and we’ll share how we can help.