In one of our recent posts, we provided insights on how companies can move from traditional, basic infrastructure monitoring to more advanced monitoring. For many companies, basic monitoring and alerting is just not enough anymore. Staying competitive by deploying new features faster while providing the seamless experiences customers expect requires a more sophisticated monitoring analytics strategy.
The reality is that there’s more online activity and data being generated than ever before, and customers expect perfection. One performance issue can result in a social media firestorm and lost customers. The stakes are high.
We hear from a lot of companies that traditional tools cannot provide the granularity they need to understand their data and address issues immediately. These tools also cannot handle the scale of metrics, whether historical or real-time.
These were some of the pains our customer Joyent was facing. When traditional monitoring solutions could not handle the volume of data Joyent was scaling to, nor provide the depth of real-time analysis required to troubleshoot and improve performance, the managed cloud services provider turned to the Circonus infrastructure monitoring and analytics platform.
Traditional monitoring lacks scalability and data clarity
Joyent provides technology and services for managing private cloud infrastructure, primarily for retail companies, and in particular Samsung Mobile and Samsung cloud services. It has data centers in four regions around the world and multiple availability zones. Its infrastructure supports about 10,000 machines and around an exabyte of data, and it monitors about a million infrastructure metrics.
Before using Circonus, Joyent’s solutions for monitoring the performance of its infrastructure were too limiting. There was a lack of clarity into the data, pulling metrics was challenging, and identifying bottlenecks to resolve performance issues was too time-consuming. Neither its existing traditional monitoring solution nor the alternatives it was evaluating were enough.
“We started to deploy our object storage product at larger and larger scales, and we knew that traditional monitoring was not going to handle the number of metrics that we’d be collecting to measure things like the availability and performance of a large scale deployment,” said Luke Jarymowycz, Cloud Operations Manager at Joyent. “We wanted a solution that would give us the analysis we needed, as well as the ability to scale. We also wanted something that we could deploy quickly. Circonus checked off these boxes for us.”
Using real-time analysis to meet SLAs/SLOs, efficiently troubleshoot, and deliver the performance customers expect
Joyent is using the Circonus platform to collect and analyze millions of infrastructure metrics, including up to 70,000 metrics that can trigger an alert, most of them based on thresholds.
“From a customer’s point of view, it’s all about availability and being as fast as possible. So on the service side, we’re using our availability metrics to ensure we’re hitting all of the SLOs and SLAs we have in place. With Circonus, we have immediate analysis on availability and performance, so we’re looking at things like latencies, how fast requests are being responded to and how many requests are being responded to. We’re able to do this in an instant, whereas some of the products we had looked at wouldn’t necessarily allow us to do that as fast and as easily.”
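To make the idea of threshold-based checks against SLOs concrete, here is a minimal sketch in Python. It is not Circonus code and does not use the Circonus API. The `WindowStats` type, the `breached_thresholds` function, and the default targets (99.9% availability, 500 ms p99 latency) are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WindowStats:
    """Request statistics for one measurement window (hypothetical shape)."""
    total_requests: int
    failed_requests: int
    p99_latency_ms: float


def availability(stats: WindowStats) -> float:
    """Fraction of requests in the window that succeeded."""
    if stats.total_requests == 0:
        return 1.0  # no traffic: treat the window as fully available
    return 1.0 - stats.failed_requests / stats.total_requests


def breached_thresholds(
    stats: WindowStats,
    min_availability: float = 0.999,    # assumed SLO target
    max_p99_latency_ms: float = 500.0,  # assumed SLO target
) -> List[str]:
    """Return the name of every threshold the window violates."""
    breaches = []
    if availability(stats) < min_availability:
        breaches.append("availability")
    if stats.p99_latency_ms > max_p99_latency_ms:
        breaches.append("p99_latency_ms")
    return breaches


healthy = WindowStats(total_requests=10_000, failed_requests=5, p99_latency_ms=320.0)
degraded = WindowStats(total_requests=10_000, failed_requests=50, p99_latency_ms=750.0)
print(breached_thresholds(healthy))
print(breached_thresholds(degraded))
```

In a real deployment, the rule would be defined in the monitoring platform rather than in application code. The sketch only shows the comparison that such a rule performs on each window of data.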
Scalability is also no longer a concern for Joyent. Circonus’ IRONdb makes it easy to store and analyze very large volumes of machine data, handling billions of metric streams.
“We can now add as many services and machines as we want and still get the data we need.”
The ability to quickly create dashboards of the metrics it’s collecting has also enabled Joyent to more efficiently make sense of all its data and troubleshoot any performance issues.
“We’re putting hundreds of thousands of metrics quickly onto dashboards and graphs that our network operations team can easily view, refine as necessary, and validate that the product is working as the customer expects.”
Historical data analysis for insights to improve operations
Circonus’ patented histogram technology enables highly compressed storage of data at rest, allowing Joyent to access years of data without high storage fees.
“The real big benefit is the mean time to repair. We can keep years of metrics and go back and look at data that gives some powerful operational insights. Do we need more people in the data center? Do we need to be ready to replace hardware sooner? Do we have enough disk drives? When do we need to replace machines? We need to know what our failure rates look like and how fast we actually have to change those in order to get new ones online. Other offerings I’ve used and seen do not allow this type of long term storage without doing a lot of extra administrative work. We get this valuable historical data analysis from Circonus without any huge uplift on our end.”
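For readers wondering why histograms make long-term retention cheap, the general idea can be sketched in a few lines of Python. This is not Circonus' implementation, and its storage format is not described in this post. The sketch assumes a simple scheme that keeps only the two leading digits of each sample; the function names and the microsecond latency samples are made up for illustration.

```python
from collections import Counter


def bucket_lower_bound(value_us: int) -> int:
    """Map a non-negative integer sample to the lower bound of its bucket
    by keeping only its two leading digits (1234 -> 1200)."""
    if value_us < 100:
        return value_us  # small values already have at most two digits
    scale = 10 ** (len(str(value_us)) - 2)
    return (value_us // scale) * scale


def build_histogram(samples) -> Counter:
    """Store one counter per bucket instead of every raw sample."""
    return Counter(bucket_lower_bound(s) for s in samples)


def quantile(hist: Counter, q: float) -> int:
    """Lower bound of the bucket that contains the q-th quantile."""
    target = q * sum(hist.values())
    seen = 0
    for lower in sorted(hist):
        seen += hist[lower]
        if seen >= target:
            return lower
    raise ValueError("empty histogram")


# Hypothetical request latencies in microseconds for two time windows.
monday = build_histogram([1234, 1250, 1299, 87])
tuesday = build_histogram([15000, 15999])

# Histograms from different windows merge by adding their counters,
# so older data can be rolled up without going back to raw samples.
week = monday + tuesday
print(week)
print(quantile(week, 0.5), quantile(week, 0.99))
```

The storage cost depends on how many distinct buckets are occupied, not on how many samples were recorded. That is why this kind of structure can keep years of data in a small footprint while still answering percentile questions with bounded error.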
Rich APIs enable customization and flexibility
Many traditional monitoring tools are rigid, telling you what to monitor and how. When infrastructure changes constantly, this type of “canned” solution doesn’t work. Joyent likes the flexibility Circonus offers. The Circonus API enables Joyent to customize its monitoring and alerts, and quickly make needed adjustments as its architecture evolves.
“The Circonus API is rich and featureful. It allows us to set up new metrics for collection without logging into a browser, or scrape data back out for just analyzing without having to go and click around. The API being so powerful as it is has been a big benefit internally, as well as externally for any customers that we need to share data with.”
“We also liked the lighter weight agent, especially when you compare that to some of the more traditional monitoring solutions. There’s not necessarily strong vendor support from any of the monitoring providers, so having that flexibility in the agent itself to create any of the plugins that you want was another big bonus.”
Time for a change
If you’re like Joyent and are frustrated with the limitations of your current monitoring strategy, consider implementing a more advanced approach to infrastructure monitoring analytics. It can be a gradual process, and the benefits you’ll achieve, such as faster problem resolution, fewer outages, and more confidence in your decision-making, can make a significant impact on your business.