Study: The Complexities of Kubernetes Drive Monitoring Challenges and Indicate Need for More Turnkey Solutions

Introduction

We recently surveyed 200 Kubernetes operators about their Kubernetes deployments, including their top challenges and goals, both for Kubernetes overall and for monitoring specifically. Why? We wanted to better understand which capabilities are essential in a monitoring solution, so that the findings could inform our product roadmap and better equip organizations to tackle their Kubernetes challenges and fulfill their Kubernetes goals. Time and again we heard organizations state that they couldn’t fully extract the value of their Kubernetes deployment. Realizing that monitoring is a key component of this, we set out to identify, in a data-driven way, what issues Kubernetes operators face that our solution could help with. In this article, we share the results of this study and what they indicate organizations should consider as they evaluate or reevaluate their Kubernetes monitoring strategies and solutions. The challenges of Kubernetes directly impact what you should look for in a monitoring solution, and the solution you choose will have direct implications for achieving Kubernetes success.

Research Results

Kubernetes Complexities Drive Observability Challenges

Survey respondents were asked what top two challenges their organization faced in implementing/managing Kubernetes. The top four responses included: finding, attracting, and retaining qualified talent to assist with Kubernetes at 66%; managing the Kubernetes infrastructure and ecosystem at 58%; re-working legacy applications to take advantage of Kubernetes benefits at 53%; and monitoring the performance and health of their Kubernetes clusters at 34%.

When asked to identify the top two performance or health-related issues that have affected their Kubernetes clusters the most, a little more than half of respondents said resource contention for clusters/nodes/pods and slightly less than half said deployment problems. Another 45% stated that auto-scaling challenges were a top issue, while another 39% said crash loops and job failures.

Notably, when asked their biggest challenge to fixing performance and health issues, nearly half of respondents (47%) noted difficulties or uncertainties in collecting all the right metrics. A quarter selected “Identifying the root cause of issues” as the most challenging, while another 25% stated “correlating alerts to issues.”

Given these challenges, it’s not surprising that 60% of survey participants said a monitoring solution that provides guided remediation would be very valuable. Nearly 40% said this would be somewhat valuable, with less than 1% answering not valuable.

Speed and Quality are Top Kubernetes Goals

When asked to identify the top two reasons their company decided to implement Kubernetes, most respondents cited improving software quality (64%) and speeding deployment of new products and features (53%). Another 36% cited lowering costs with more open-source technologies, while 25% cited improving application performance.

Notably, 72% of survey participants stated that for the most part, they have been able to achieve the Kubernetes benefits they wanted, while 16% said they somewhat have and 12% said that it’s too early to tell.

Asked to select their top strategic goals for Kubernetes in the year ahead, the top four responses were improving enterprise adoption and developing Kubernetes as a Service at 22% each, and security/governance and monitoring/observability each selected by nearly 20% of respondents.

Key Takeaways

The rapid growth and adoption of Kubernetes is outpacing the rate at which people can be trained and gain experience with the technology, leading to a substantial skills gap. IT professionals are being tasked with deploying, managing, and monitoring Kubernetes deployments, but becoming knowledgeable about this technology, which has layers upon layers of abstractions, takes considerable time. Added to this is the fact that the Kubernetes ecosystem is constantly changing. These complexities are driving challenges around hiring and keeping up with changes, and they’re also driving monitoring-specific challenges like knowing which metrics to collect and resolving performance issues. Given these hurdles and the desire for guided remediation, it’s clear that monitoring solutions need to relieve users of the burden of knowing all the ins and outs of Kubernetes observability and instead have best practices already built in that can ease, speed, and improve monitoring. These capabilities would also support organizations’ top Kubernetes goals.

Insights: Monitoring Solutions Must Address Kubernetes Complexities

What does this research indicate for organizations considering monitoring solutions? The right Kubernetes monitoring solution should address the key challenges and goals of Kubernetes as indicated by respondents, including:

Managing the Kubernetes infrastructure and ecosystem
Finding qualified Kubernetes talent
Difficulties or uncertainties in collecting all the right metrics for monitoring cluster health and performance
Quickly identifying and solving issues with better alert correlation, root cause analysis, and guided remediation
Improving software quality, cost, and application performance
Speeding deployment of new products and features

Managing the Kubernetes infrastructure and ecosystem

Kubernetes is constantly changing. There’s a new minor release every few months, and releases often include significant (and sometimes breaking) changes. This can create a ripple effect throughout the Kubernetes ecosystem, with other tools and technologies needing to adjust as well. A good Kubernetes monitoring solution should keep up with and encapsulate these changes in a way that still provides a consistent, reliable, and repeatable experience, removing that burden from the user.

Finding Kubernetes experts and identifying and solving issues

If you’re monitoring Kubernetes, it’s your job to identify problems, the cause of failure, and the steps to quick remediation. This requires developing appropriate domain knowledge, including a discrete set of pathological failures and prescriptive responses to each of them. But when you look at the Kubernetes skills gap in the market today, this sort of knowledge is extremely rare.

An effective Kubernetes monitoring solution should address this skills gap by providing turnkey, pre-configured Kubernetes capabilities for identifying and remediating recurrent, specific failures seen in Kubernetes deployments — like crash loops, job failures, CPU utilization, etc. Users should not need to figure out which of these they need to monitor and how. The solution should make you aware of the problem, but not require time-consuming learning and deep analysis to track it, deal with it, and ensure it doesn’t happen again. These capabilities should also be purpose-built for Kubernetes – so metrics and analysis are specific to Kubernetes.
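As a rough illustration of the kind of check such a solution automates, here is a minimal Python sketch that flags containers that appear to be stuck in a crash loop. It assumes pod data in the shape returned by `kubectl get pods -o json`; the function name `find_crash_loops`, the restart threshold, and the sample pods are hypothetical and are not taken from any particular product.

```python
# Hypothetical threshold: containers restarted at least this many times are
# flagged even if they are not currently waiting in CrashLoopBackOff.
RESTART_THRESHOLD = 5


def find_crash_loops(pod_list, restart_threshold=RESTART_THRESHOLD):
    """Return (namespace, pod, container, restarts) for suspect containers."""
    findings = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        statuses = pod.get("status", {}).get("containerStatuses") or []
        for cs in statuses:
            # A waiting container reports why it is waiting in state.waiting.reason.
            waiting = cs.get("state", {}).get("waiting") or {}
            restarts = cs.get("restartCount", 0)
            crash_looping = waiting.get("reason") == "CrashLoopBackOff"
            if crash_looping or restarts >= restart_threshold:
                findings.append(
                    (meta.get("namespace", "default"), meta.get("name"),
                     cs.get("name"), restarts)
                )
    return findings


# Made-up sample data in the shape of `kubectl get pods -o json`.
SAMPLE = {
    "items": [
        {
            "metadata": {"namespace": "shop", "name": "checkout-7d9f"},
            "status": {"containerStatuses": [
                {"name": "api", "restartCount": 12,
                 "state": {"waiting": {"reason": "CrashLoopBackOff"}}},
            ]},
        },
        {
            "metadata": {"namespace": "shop", "name": "cart-5c4b"},
            "status": {"containerStatuses": [
                {"name": "web", "restartCount": 0,
                 "state": {"running": {"startedAt": "2020-09-01T00:00:00Z"}}},
            ]},
        },
        {
            "metadata": {"namespace": "shop", "name": "search-6b8c"},
            "status": {"containerStatuses": [
                {"name": "indexer", "restartCount": 7,
                 "state": {"running": {"startedAt": "2020-09-01T00:05:00Z"}}},
            ]},
        },
    ]
}

if __name__ == "__main__":
    for namespace, pod, container, restarts in find_crash_loops(SAMPLE):
        print(f"{namespace}/{pod} ({container}): {restarts} restarts")
```

Even this small check requires knowing where the status lives in the pod object and what restart count is worth worrying about, which is the kind of domain knowledge a turnkey solution is meant to package.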

For example, at Circonus, we provide pre-configured health dashboards and alerts for the 12 common health conditions that afflict clusters. Automatic alert rules are based on health and performance criteria specific to Kubernetes, and step-by-step remediation instructs operators on how to fix common problems. Organizations can therefore immediately automate their monitoring and have peace of mind that they will be aware of any significant issues and how to address them.

Several traditional IT monitoring tool providers have also introduced Kubernetes monitoring solutions. These solutions, however, do not provide turnkey, purpose-built capabilities. As a result, organizations are required to do more alert tuning and spend considerable time identifying problems, what’s causing them, and how to fix them.

Collecting the right metrics for monitoring cluster health and performance

Kubernetes can generate millions upon millions of new metrics daily. This can present two big challenges. First, many conventional monitoring systems just can’t keep up with the sheer volume of unique metrics needed to properly monitor Kubernetes clusters. Second, all this data “noise” makes it hard to keep up with and know which metrics are most important.

A comprehensive Kubernetes monitoring solution must have the ability to handle all of this data, as well as automatically analyze, graph, and alert on the most critical metrics to pay attention to. This way, you know you’ve collected everything you need, can filter out the unnecessary data, and can automatically narrow in on the most relevant data. As a result, you can save substantial time and rest assured everything is working as it should.

Visibility into resource utilization to improve costs and application performance

Overprovisioning clusters is a common problem in Kubernetes. In fact, most organizations likely have a lot of waste running in their clusters that they’re unaware of. A monitoring solution that provides insights into resource utilization and key application performance metrics is key to preventing this.

What’s challenging to understand is how much CPU and RAM your pods are requesting versus how much they are actually using. It’s important to be able to surface this information and make decisions about spend based on what you’re actually using.
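To make the requested-versus-used comparison concrete, the following Python sketch parses Kubernetes resource quantities (such as `500m` of CPU or `1Gi` of memory) and reports what fraction of each request a pod actually uses. The pod names, the figures, and the 50% threshold are made-up assumptions for illustration; in practice the requests would come from the pod specs and the usage from your metrics pipeline.

```python
# Divisors for fractional CPU suffixes: nano-, micro-, and millicores.
_CPU_DIVISORS = {"n": 10**9, "u": 10**6, "m": 10**3}

# Multipliers for common memory suffixes (binary and decimal).
_MEM_MULTIPLIERS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(quantity):
    """Convert a CPU quantity such as "250m" or "2" to cores."""
    suffix = quantity[-1]
    if suffix in _CPU_DIVISORS:
        return float(quantity[:-1]) / _CPU_DIVISORS[suffix]
    return float(quantity)


def parse_memory(quantity):
    """Convert a memory quantity such as "128Mi" or "1G" to bytes."""
    # Try two-character suffixes (e.g. "Gi") before one-character ones ("G").
    for suffix in sorted(_MEM_MULTIPLIERS, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * _MEM_MULTIPLIERS[suffix])
    return int(float(quantity))


def usage_ratios(pod):
    """Return (cpu_ratio, memory_ratio): usage divided by request."""
    cpu = parse_cpu(pod["cpu_usage"]) / parse_cpu(pod["cpu_request"])
    mem = parse_memory(pod["mem_usage"]) / parse_memory(pod["mem_request"])
    return cpu, mem


def overprovisioned(pods, threshold=0.5):
    """Names of pods using less than `threshold` of both their requests."""
    return [p["name"] for p in pods if max(usage_ratios(p)) < threshold]


# Hypothetical sample data; real values would come from your cluster.
PODS = [
    {"name": "checkout", "cpu_request": "500m", "cpu_usage": "125m",
     "mem_request": "1Gi", "mem_usage": "256Mi"},
    {"name": "search", "cpu_request": "2", "cpu_usage": "1500m",
     "mem_request": "4Gi", "mem_usage": "3Gi"},
]

if __name__ == "__main__":
    for pod in PODS:
        cpu, mem = usage_ratios(pod)
        print(f"{pod['name']}: CPU {cpu:.0%} of request, memory {mem:.0%} of request")
    print("Candidates for smaller requests:", overprovisioned(PODS))
```

A pod that consistently uses a small fraction of both its CPU and memory requests is a candidate for smaller requests, although a single snapshot like this one says nothing about peak load.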

Also, if you’re monitoring just Kubernetes and you’re not gaining observability into the applications that are being run there, then you’re only seeing half the picture. In addition to understanding the amount of CPU, memory, network, and storage that each pod is taking up, you also want the underlying application metrics that matter to your organization, whether that’s requests per second, new users per minute, or another measure.

Most organizations overestimate the resources needed to keep their clusters running optimally. If you do a bit more digging into resource utilization metrics, you might see that you’re not using the resources you thought you were — so there’s a huge opportunity for cost savings.

Speed deployment of new features and products

While there are several open-source Kubernetes monitoring solutions, they require you to create and install several individual components before you can meaningfully monitor your cluster. And even if you are able to do this well and efficiently to start, monitoring is an iterative process, so you will need to constantly make adjustments.

Speeding deployment is a top reason organizations chose to use Kubernetes, and monitoring solutions that require time-consuming set-up and analysis only hinder this. That’s why your Kubernetes monitoring solution should be quick to install. For example, Circonus can be up and running and immediately populating graphs with data in less than 10 minutes.

Survey Methodology

Circonus partnered with a research firm to conduct an online survey in September 2020. We gathered 200 responses from U.S.-based survey participants who are actively involved in operating Kubernetes clusters at their organization. Of the respondents, 54% are at organizations that are 1-2 years into their Kubernetes journey, 34% are at organizations that have been operating clusters for three years or more, and 13% are at organizations that are less than a year into their Kubernetes journey.

Survey participants are at organizations with the following number of employees:

Fewer than 500 (2.5%)
500-1,000 (32%)
1,001-5,000 (38%)
5,001-10,000 (16%)
10,000 and above (13%)

Survey participants run their Kubernetes clusters in the following:

AWS/EKS (63%)
Google/GKE (59%)
Azure/AKS (53%)
Self-hosted (6%)

Survey participants deploy software to Kubernetes at the following frequency:

Multiple times per day (20%)
Daily (42%)
Weekly (31%)
Monthly (8%)

Final Thoughts

The inherent complexities of Kubernetes drive the need for turnkey, purpose-built Kubernetes monitoring solutions. Only with a comprehensive yet automated, easy-to-use monitoring platform can organizations overcome challenges, fulfill their Kubernetes goals, and unlock the full value of their Kubernetes deployments. Kubernetes monitoring is already a complex, multi-step process; your monitoring solution should ease this rather than add to it.

Learn more about the free edition of Circonus’ Kubernetes monitoring platform and how it can help your organization more easily maximize the benefits of Kubernetes.
