Kubernetes Resource Management: StormForge’s Machine Learning Approach

by: Joanne Chu / June 12, 2024

Since Google released it to the open-source community ten years ago, Kubernetes has quickly become a cornerstone technology for orchestrating and managing software containers and microservices. According to a Cloud Native Computing Foundation (CNCF) survey, 96% of organizations are using or evaluating Kubernetes, and adoption is not slowing down.

Despite its widespread adoption and undeniable benefits, Kubernetes poses significant challenges in resource and cost management. The inability to monitor organizational usage or optimize a cluster’s resource utilization and allocation at scale leads to a staggering amount of cloud waste—about 47% of companies’ cloud budgets, as revealed by a StormForge survey. These challenges are not just hurdles, but pressing issues that demand immediate attention.

CloudBolt solutions are used by FinOps teams worldwide that strive to maximize the value of their cloud investments. Through our extensive interactions with these teams, we’ve witnessed the intricate complexities and critical challenges of managing costs within Kubernetes environments—challenges that conventional tools struggle to address effectively.

Recognizing the need for a transformative solution, we formed a technical partnership with StormForge earlier this year. By combining StormForge’s intelligent machine learning capabilities with CloudBolt’s Augmented FinOps offerings, this collaboration offers a powerful solution that enables users to manage Kubernetes resources with precision and autonomy.

Understanding Traditional Kubernetes Resource Management

At its core, Kubernetes uses a set of mechanisms to control how a cluster allocates and consumes resources. Central to this system are CPU and memory ‘requests’ and ‘limits,’ which are set per container. Requests reserve a certain amount of resources for a container and tell Kubernetes how to schedule pods across nodes. Limits cap what a container can consume so that it cannot starve other containers running on the same node.
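
As a minimal sketch of how these settings look in practice, the manifest below declares requests and limits for a single container. The pod name `demo-app` and the specific values are illustrative assumptions, not recommendations.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app  # hypothetical name, for illustration only
spec:
  containers:
    - name: web
      image: nginx:1.14.2
      resources:
        requests:
          cpu: 250m      # scheduler only places the pod on a node with this much unreserved CPU
          memory: 256Mi  # same rule applies to memory
        limits:
          cpu: 500m      # CPU use above this is throttled
          memory: 512Mi  # memory use above this gets the container OOM-killed
```

Because the requests here are lower than the limits, Kubernetes treats this pod as Burstable: it is guaranteed its requests and may use more, up to its limits, when the node has spare capacity.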

Despite its potential, Kubernetes often falls short in efficiency largely due to the complexity of managing resource requests and limits. Here, we explore three common approaches to Kubernetes resource management, each with its own set of drawbacks, highlighting the need for a more intelligent, automated solution.

Non-specification of requests

While most developers know that CPU and memory requests exist in Kubernetes, many are unaware of how to set them carefully or why doing so is crucial. Instead, they usually focus on simply getting their applications running. This approach leads to a host of performance and reliability problems, such as unstable environments where applications do not receive the resources they need and crash or slow down during peak loads.
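
For illustration, this is what such a workload typically looks like. The pod name is hypothetical, and the sketch assumes the namespace has no LimitRange that would inject defaults.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: no-requests-demo  # hypothetical name, for illustration only
spec:
  containers:
    - name: web
      image: nginx:1.14.2
      # No resources block: no requests and no limits.
      # Kubernetes assigns this pod the BestEffort QoS class, so the
      # scheduler reserves nothing for it and it is among the first
      # candidates for eviction when the node runs short of resources.
```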

One-size-fits-all approach

Organizations that experience the problems resulting from the first approach often adopt a one-size-fits-all approach, which simplifies management in the short term but causes significant inefficiencies in the long term. Such an approach fails to consider the unique requirements of different applications or workloads. Developers, whose primary concern is the performance of their applications, tend to request more compute and memory than necessary, which can quickly lead to significant overspending.
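
One common way to apply a blanket policy like this is a namespace-wide LimitRange. The post does not say which mechanism organizations use, so treat the sketch below as one possibility; the names and values are hypothetical.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: blanket-defaults  # hypothetical
  namespace: team-a       # hypothetical
spec:
  limits:
    - type: Container
      defaultRequest:     # applied as requests to containers that specify none
        cpu: 500m
        memory: 512Mi
      default:            # applied as limits to containers that specify none
        cpu: "1"
        memory: 1Gi
```

Under these defaults, a container that typically uses 50m of CPU would still reserve 500m on its node, which is how uniform settings turn into paid-for but idle capacity.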

Manual tuning of workloads

Finally, organizations that understand the need to set appropriate values for each workload will tune their Kubernetes workloads manually. Doing so, however, requires a large amount of engineering time and resources to individually assess each application or service and meticulously adjust its resource needs. This process is particularly problematic at scale and must happen continuously, as resource needs change over time. It cannot be a set-it-and-forget-it solution, which makes it both time-consuming and prone to inefficiencies.

Machine Learning for Kubernetes Resource Management

StormForge and CloudBolt’s joint solution offers a new transformative approach to automating and optimizing Kubernetes resource management in real time. With its advanced machine learning, StormForge analyzes observability data and makes recommendations for container CPU and memory settings to optimize resource consumption and ensure cost efficiency and application performance.

How it works

StormForge’s solution integrates seamlessly into existing Kubernetes environments using a straightforward one-click installation process. Upon installation, a comprehensive analysis of observability data from your Kubernetes clusters begins. Here’s a step-by-step breakdown:

Data Ingestion: StormForge ingests metrics from the Kubernetes cluster (kube-state-metrics and cAdvisor). This includes CPU usage, memory demands, and other critical performance metrics.
Analysis and Recommendations:

Machine Learning Model Training: Using historical data, StormForge trains machine learning models to understand patterns and accurately predict future resource needs.
Resource Optimization Recommendations: Based on the analysis, the system generates recommended values for CPU and memory, focusing on optimal resource allocation for cost and performance.

Automatic Deployment:

Code Integration: Below is an example of how a Kubernetes deployment YAML might be automatically adjusted based on StormForge’s recommendations:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        resources:
          requests:
            cpu: 100m  # Adjusted from initial higher value
            memory: 200Mi  # Optimized based on usage analysis
          limits:
            cpu: 200m
            memory: 400Mi

Automated Patching: Recommendations are applied directly to deployments via automated patches, ensuring that configurations are always optimal.

Control and Customization:

Adjustment Control: Administrators can control how aggressively the machine learning model optimizes resources, balancing cost savings with performance needs. Users can prioritize cost, reliability, or a balance of the two, allowing for customization based on specific operational goals and constraints. For example, while applying more frequent recommendations is better for savings and performance, some users choose to deploy changes less often to prevent pod churn. Users can create bounds to control min and max recommendations for requests and limits.
Feedback Loop: Continuous feedback from ongoing operations refines the models, ensuring the recommendations improve over time.
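
To make the automated patching step above more concrete, here is a sketch of a patch that changes only the resource values of the `nginx-deployment` example. The post does not describe the patch format StormForge uses, so this is a generic Kubernetes strategic merge patch, and the file name `patch.yaml` is an assumption.

```yaml
# patch.yaml: hypothetical strategic merge patch, not StormForge's actual output
spec:
  template:
    spec:
      containers:
        - name: nginx  # merge key: matches the existing container by name
          resources:
            requests:
              cpu: 100m
              memory: 200Mi
            limits:
              cpu: 200m
              memory: 400Mi
```

A file like this can be applied by hand with `kubectl patch deployment nginx-deployment --patch-file patch.yaml`. Changing the pod template triggers a rolling update, which is the source of the pod churn that some users limit by applying recommendations less often.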

Outcomes

The StormForge solution integrated into CloudBolt’s Augmented FinOps platform automates Kubernetes resource management, offering a scalable, machine learning-based approach that outperforms traditional methods in every aspect. It’s not just easy to use and continuously automatic, but also highly accurate. Customers can expect to save an average of 50% with StormForge, a significant amount considering the potential cost of Kubernetes for medium to large enterprises. As Mark Piersak, U.S. Bank vice president of container platform solutions, attests, the value of StormForge is undeniable: “The overall net savings on capacity has given us tremendous cost savings.”

Conclusion

As Kubernetes’ use continues to rise, clear cost visibility and accurate allocation control are more important than ever. StormForge and CloudBolt’s partnership ensures that Kubernetes clusters, along with the rest of the cloud, are not only clearly factored into cloud spending but optimized for maximum cloud ROI. Sign up for a demo to see the solution in action!

StormForge and CloudBolt will be at the FinOps X Conference in San Diego from June 19 to 22. Visit us at booth G8 to learn more about our solution!
