Business Benefits at a Glance
- Real-time machine data analysis on availability and performance to meet SLAs/SLOs and provide optimal customer experiences
- Long-term historical data storage and analysis to improve operational efficiencies
- Rich APIs that enable customization and flexibility
Joyent provides technology and services for managing private cloud infrastructure, primarily for retail companies, and in particular Samsung Mobile and Samsung cloud services. It has data centers in four regions around the world and multiple availability zones. Its infrastructure supports about 10,000 machines and around an exabyte of data, and it monitors about a million infrastructure metrics.
Traditional Monitoring Tools Not Sufficient for Joyent
Before using Circonus, Joyent’s solutions for monitoring the performance of its infrastructure were too limiting. There was a lack of clarity into the data, pulling metrics was challenging, and identifying bottlenecks to resolve performance issues was too time-consuming. Its existing traditional monitoring solution just wasn’t enough, nor were others it was evaluating.
“We started to deploy our object storage product at larger and larger scales, and we knew that traditional monitoring was not going to handle the number of metrics that we’d be collecting to measure things like the availability and performance of a large scale deployment,” said Luke Jarymowycz, Cloud Operations Manager at Joyent. “We wanted a solution that would give us the analysis we needed, as well as the ability to scale. We also wanted something that we could deploy quickly. Circonus checked off these boxes for us.”
Circonus’s agents and push technology were also significant factors in Joyent’s decision to choose the platform over others.
“Circonus’s push model for sending data back into the core of our system was definitely a huge security benefit for us. There’s no direct communication, so we’re not opening up the firewalls. The ability to do this fast is very effective, which doesn’t exist with other products.
“We also liked the lighter weight agent, especially when you compare that to some of the more traditional monitoring solutions. There’s not necessarily strong vendor support from any of the monitoring providers, so having that flexibility in the agent itself to create any of the plugins that you want was another big bonus.”
Machine data analysis to meet SLOs/SLAs, efficiently troubleshoot, and deliver the performance customers expect
Joyent uses the Circonus platform to collect and analyze millions of infrastructure metrics, including up to 70,000 metrics that can trigger an alert, primarily based on thresholds.
“From a customer’s point of view, it’s all about availability and being as fast as possible. So on the service side, we’re using our availability metrics to ensure we’re hitting all of the SLOs and SLAs we have in place. With Circonus, we have immediate analysis on availability and performance, so we’re looking at things like latencies, how fast requests are being responded to and how many requests are being responded to. We’re able to do this in an instant, whereas some of the products we had looked at wouldn’t necessarily allow us to do that as fast and as easily.”
Scalability is also no longer a concern for Joyent. Circonus’s IRONdb stores and analyzes unlimited volumes of machine data and easily handles billions of metric streams.
“We can now add as many services and machines as we want and still get the data we need.”
The ability to quickly create dashboards of the metrics it is collecting has also enabled Joyent to make sense of all its data more efficiently and troubleshoot any performance issues.
“We’re putting hundreds of thousands of metrics quickly onto dashboards and graphs that our network operations team can easily view, refine as necessary, and validate that the product is working as the customer expects.”
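For readers who want a concrete picture of the availability and latency checks described in this section, here is a minimal sketch in Python. It is a generic illustration, not Joyent’s or Circonus’s actual implementation: the `Request` record, the function names, and the default targets (99.9% availability, a 500 ms 99th-percentile latency) are all assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Request:
    """One observed request (hypothetical record shape)."""
    latency_ms: float
    ok: bool


def availability(requests: Sequence[Request]) -> float:
    """Fraction of requests that succeeded; 1.0 when there is no traffic."""
    if not requests:
        return 1.0
    return sum(1 for r in requests if r.ok) / len(requests)


def latency_percentile(requests: Sequence[Request], pct: int) -> float:
    """Nearest-rank percentile of request latency, pct in 1..100."""
    if not requests:
        raise ValueError("no requests to summarize")
    latencies = sorted(r.latency_ms for r in requests)
    rank = max(1, math.ceil(pct * len(latencies) / 100))
    return latencies[rank - 1]


def slo_breaches(
    requests: Sequence[Request],
    min_availability: float = 0.999,  # assumed target, not from the case study
    max_p99_ms: float = 500.0,        # assumed target, not from the case study
) -> List[str]:
    """Return the names of the thresholds this window of requests violates."""
    breaches = []
    if availability(requests) < min_availability:
        breaches.append("availability")
    if requests and latency_percentile(requests, 99) > max_p99_ms:
        breaches.append("p99_latency")
    return breaches


# Example window: 99 fast successes and one slow failure.
window = [Request(10.0, True)] * 99 + [Request(900.0, False)]
print(slo_breaches(window))
```

A threshold-based alert of the kind mentioned above would fire whenever the returned list is non-empty for the current time window.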
Historical data analysis for insights to improve operations
Circonus’s patented histogram technology enables highly compressed storage of data at rest, allowing Joyent to access years of data without high storage fees.
“The real big benefit is the mean time to repair. We can keep years of metrics and go back and look at data that gives some powerful operational insights. Do we need more people in the data center? Do we need to be ready to replace hardware sooner? Do we have enough disk drives? When do we need to replace machines? We need to know what our failure rates look like and how fast we actually have to change those in order to get new ones online. Other offerings I’ve used and seen do not allow this type of long term storage without doing a lot of extra administrative work. We get this valuable historical data analysis from Circonus without any huge uplift on our end.”
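To illustrate why histograms can make long-term storage cheap, the sketch below shows a generic log-linear histogram in Python. It is an assumption-laden illustration of the general idea, not Circonus’s patented implementation: the class name, the two-significant-digit bin width, and the bin representation are all choices made for this example. Each sample is reduced to its first two significant digits and a power of ten, so any number of raw samples collapses into a small, fixed set of counters.

```python
import math
from collections import Counter
from decimal import Decimal
from typing import Tuple

Bin = Tuple[int, int]  # (two-digit mantissa, power-of-ten exponent)


class LogLinearHistogram:
    """Counts samples in bins that keep two significant digits (illustrative)."""

    def __init__(self) -> None:
        self.bins: Counter = Counter()

    @staticmethod
    def bin_for(value: float) -> Bin:
        """Map a non-negative value to the bin whose lower bound is mantissa * 10**exp."""
        if value < 0:
            raise ValueError("this sketch only handles non-negative values")
        if value == 0:
            return (0, 0)
        d = Decimal(str(value))
        exp = d.adjusted()                 # position of the most significant digit
        mantissa = int(d.scaleb(1 - exp))  # truncate to two significant digits
        return (mantissa, exp - 1)

    def record(self, value: float) -> None:
        self.bins[self.bin_for(value)] += 1

    def count(self) -> int:
        return sum(self.bins.values())

    def quantile(self, q: float) -> float:
        """Approximate quantile: lower bound of the bin holding the q-th sample."""
        total = self.count()
        if total == 0:
            raise ValueError("histogram is empty")
        rank = max(1, math.ceil(q * total))
        seen = 0
        for b in sorted(self.bins, key=lambda b: Decimal(b[0]).scaleb(b[1])):
            seen += self.bins[b]
            if seen >= rank:
                return float(Decimal(b[0]).scaleb(b[1]))
        raise AssertionError("unreachable: rank never exceeds total")


# 10,000 latency samples between 100.0 and 199.9 collapse into 10 counters.
hist = LogLinearHistogram()
for i in range(10_000):
    hist.record(100 + (i % 1000) / 10)
print(len(hist.bins), hist.count())
```

The trade-off in this sketch is precision for space: quantiles are only accurate to the width of a bin, but the storage cost depends on the range of values seen rather than on the number of samples.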
Rich and powerful API for flexibility
The Circonus API enables Joyent to customize its monitoring and alerts, and quickly make needed adjustments as its architecture evolves.
“The Circonus API is rich and featureful. It allows us to set up new metrics for collection without logging into a browser, or scrape data back out for just analyzing without having to go and click around. The API being so powerful as it is has been a big benefit internally, as well as externally for any customers that we need to share data with.”
In the near future, Joyent plans to leverage the analysis provided by the Circonus platform to execute more comprehensive predictive analytics in an effort to continually improve operational efficiencies, performance, and the customer experience.