Kubernetes Autoscaling: HPA, VPA, CA, and Using Them Effectively

What Is Kubernetes Autoscaling? 

Kubernetes is an open-source platform designed to automate deploying, scaling, and managing containerized applications. Autoscaling is a feature that automatically adjusts the capacity allocated to an application, such as the number of running instances, based on the application’s current demand.

In the context of Kubernetes, autoscaling can mean:

  • Adjusting the number of pods (the smallest deployable units of computing that can be created and managed in Kubernetes) in a ReplicationController, Deployment, ReplicaSet, or StatefulSet, based on observed metrics.
  • Adjusting the resources available to pods, for example by raising their CPU and memory requests when they experience high load.
  • Adjusting the number of nodes (physical or virtual machines) to increase the overall resources available to a Kubernetes cluster.

The Kubernetes project provides three mechanisms, called HPA, VPA, and Cluster Autoscaler, that can help you achieve each of the above. HPA is built into Kubernetes, while VPA and Cluster Autoscaler are installed as add-ons. Learn more about these below.

Benefits of Kubernetes Autoscaling 

Here are a few ways Kubernetes autoscaling can benefit DevOps teams:

Adjusting to Changes in Demand

In modern applications, traffic patterns are dynamic. They can increase during peak hours and decrease during off-peak hours. With Kubernetes autoscaling, you don’t have to worry about manually adjusting the number of pods to meet this demand.

For instance, if your application experiences a sudden surge in traffic, Kubernetes autoscaling features can automatically increase the number of pods to ensure that your application can handle the additional load. Conversely, during periods of low traffic, they can reduce the number of pods to prevent resource wastage.

Cost Efficiency

Kubernetes clusters often run in the cloud, where resources can be expensive: every instance that you run adds to your costs. Even on-premises, servers are a scarce resource that should be used as fully as possible. Without a proper management tool, you could end up running more instances or servers than necessary, leading to higher costs.

Kubernetes autoscaling, with its ability to adjust the number of pods or nodes based on demand, ensures that you’re only using the resources you need. This leads to significant cost savings, because it eliminates unnecessary servers and minimizes unutilized resources.

Ensuring High Availability

By automatically adjusting the number of pods based on demand, Kubernetes autoscaling ensures that your application remains available even during periods of high traffic.

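If a pod fails, Kubernetes automatically creates a new one to replace it. This self-healing is performed by workload controllers such as Deployments and ReplicaSets rather than by the autoscalers, but the two complement each other: the controllers maintain the replica count that autoscaling sets. This ensures that your application remains available and that your users experience minimal downtime.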

Improving Resource Utilization

By dynamically adjusting the number of pods or nodes based on demand, Kubernetes ensures that your resources are used efficiently. You’re not wasting resources by running too many pods or nodes during periods of low traffic, and you’re not starving your application by running too few during periods of high traffic.

This efficient use of resources is not only cost-effective, but it’s also environmentally friendly. By using only the resources you need, you’re reducing your organization’s carbon footprint.

Tips from the expert

Itiel Shwartz

Co-Founder & CTO

Itiel is the CTO and co-founder of Komodor. He’s a big believer in dev empowerment and moving fast, and has worked at eBay, Forter, and Rookout (as the founding engineer). Itiel is a backend and infra developer turned “DevOps” and an avid public speaker who loves talking about things such as cloud infrastructure, Kubernetes, Python, observability, and R&D culture.

In my experience, here are tips that can help you better utilize Kubernetes autoscaling:

Leverage Custom Metrics for HPA

Extend the capabilities of Horizontal Pod Autoscaler (HPA) by using custom metrics (e.g., request rate, response time) instead of just CPU and memory. Integrate with Prometheus Adapter to expose custom metrics for more precise scaling decisions.

Tune HPA Parameters Carefully

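Fine-tune the kube-controller-manager flags --horizontal-pod-autoscaler-downscale-stabilization and --horizontal-pod-autoscaler-sync-period to balance the responsiveness and stability of your autoscaling actions. A longer downscale stabilization window keeps the replica count from flapping when load fluctuates, which prevents unnecessary scaling actions and reduces resource churn. These flags apply cluster-wide and may not be adjustable on managed control planes.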
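To make the effect of downscale stabilization concrete, here is a minimal Python sketch of the idea: keep the recent replica recommendations and act on the highest one inside the window, so scale-ups apply immediately while scale-downs wait until the window has passed. The class name `DownscaleStabilizer` and its method are hypothetical, and the 300-second window is an assumed value for illustration; this is not the controller's actual code.

```python
from collections import deque


class DownscaleStabilizer:
    """Illustrative stabilization window (hypothetical helper, not HPA source)."""

    def __init__(self, window_seconds=300.0):
        # Assumed window length for this sketch: 300 seconds.
        self.window_seconds = window_seconds
        self._history = deque()  # (timestamp_seconds, recommended_replicas)

    def stabilize(self, now, recommended):
        """Record a recommendation and return the replica count to act on."""
        self._history.append((now, recommended))
        # Forget recommendations that are older than the window.
        cutoff = now - self.window_seconds
        while self._history and self._history[0][0] < cutoff:
            self._history.popleft()
        # Acting on the highest recent recommendation delays scale-downs
        # but lets a higher recommendation take effect right away.
        return max(replicas for _, replicas in self._history)


if __name__ == "__main__":
    stabilizer = DownscaleStabilizer(window_seconds=300.0)
    for second, recommendation in [(0, 10), (60, 4), (120, 12), (400, 4), (421, 4)]:
        print(second, recommendation, stabilizer.stabilize(second, recommendation))
```

In this sketch a brief dip in load does not remove pods; the count only drops once every recommendation inside the window agrees on the lower value.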

Combine HPA and VPA for Balanced Scaling

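Use Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) together to achieve balanced scaling. HPA handles pod scaling based on load, while VPA adjusts resource requests so that pods have sufficient resources. Avoid letting both act on the same CPU or memory metric for the same workload, because they can work against each other. A common pattern is to drive HPA from custom or external metrics and let VPA manage CPU and memory requests.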

Implement Predictive Autoscaling

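Use event-driven autoscaling tools such as KEDA (Kubernetes Event-driven Autoscaling) to scale before CPU and memory pressure builds up. KEDA scales workloads on signals from external event sources such as queue depth, and its cron scaler can add capacity ahead of known peak periods. For spikes that follow historical patterns, plan or forecast that capacity in advance. This preemptive approach can help maintain performance during sudden demand surges.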

Set Resource Requests and Limits Appropriately

Ensure all pods have well-defined resource requests and limits. Accurate settings help the autoscalers (HPA, VPA) make more effective decisions and maintain overall cluster performance.

Autoscaling Mechanisms in Kubernetes 

There are three main types of autoscaling in Kubernetes: Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and Cluster Autoscaler.

Horizontal Pod Autoscaler (HPA)

HPA is a Kubernetes feature that automatically scales the number of pods in a ReplicationController, Deployment, ReplicaSet, or StatefulSet based on observed CPU or memory utilization or, with custom metrics support, on other application-provided metrics.

Implementing HPA is relatively straightforward. It requires defining the metrics to monitor, the target value for each metric, and the minimum and maximum number of pods. The HPA controller periodically adjusts the number of replicas in a replication controller or deployment to match the observed average CPU utilization to the target specified by the user.
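The adjustment described above can be sketched in Python. The calculation follows the scaling formula documented for HPA, desired replicas = ceil(current replicas × current metric / target metric), clamped to the configured minimum and maximum. The function name and the 0.1 tolerance default are assumptions made for this sketch, and it omits details such as unready pods and missing metrics.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas, tolerance=0.1):
    """Replica count suggested by the documented HPA formula (simplified)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        proposed = current_replicas
    else:
        proposed = math.ceil(current_replicas * ratio)
    # Never go below the minimum or above the maximum number of pods.
    return max(min_replicas, min(max_replicas, proposed))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60, min_replicas=2, max_replicas=10))
```

Because the result is rounded up, the autoscaler errs on the side of slightly more capacity rather than slightly less.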

Vertical Pod Autoscaler (VPA)

VPA, on the other hand, adjusts the CPU and memory requests of the pods, which can help in cases where the resource usage pattern of your application changes over time, or if the resource requests were initially set too high or too low.

VPA operates at the level of individual pods and can both downscale pods that are using fewer resources than requested and upscale pods that need more. It consists of three components:

  • Recommender: Monitors the current and past resource consumption and uses this data to provide recommended resource requests. 
  • Updater: Can evict pods to apply these recommendations. 
  • Admission Controller: Applies the recommendations when pods are being created.
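As a rough illustration of what the Recommender does, the sketch below derives a request from past usage by taking a high percentile and adding headroom. The actual Recommender is more elaborate; the function name, the 90th percentile, and the 15% safety margin are assumptions chosen for this example.

```python
import math


def recommend_request(usage_samples, percentile=90, safety_margin=0.15):
    """Suggest a resource request from historical usage (simplified sketch)."""
    if not usage_samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(usage_samples)
    # Nearest-rank percentile: the smallest sample that covers
    # `percentile` percent of all observations.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    baseline = ordered[rank - 1]
    # Add headroom so short bursts above the percentile still fit.
    return baseline * (1.0 + safety_margin)


if __name__ == "__main__":
    cpu_millicores = [100, 120, 110, 400, 130, 125, 115, 105, 135, 140]
    print(round(recommend_request(cpu_millicores)))
```

Using a percentile rather than the maximum keeps a single outlier, such as the 400-millicore sample above, from inflating the request.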

Cluster Autoscaler

Cluster Autoscaler, the third type of Kubernetes autoscaling, increases or decreases the size of the cluster based on the demand. It does this by monitoring the status of pods and nodes and making decisions based on that.

If there are pods that cannot be scheduled because no node has enough free resources, the Cluster Autoscaler increases the size of the cluster. Conversely, if some nodes in the cluster are underutilized for an extended period of time, and all their pods can be moved to other existing nodes, the Cluster Autoscaler reduces the size of the cluster.
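The two rules above can be sketched as a small decision function. This is a deliberately simplified model that looks only at CPU requests in millicores and treats each node's pods as one movable block; the `Node` type, the `plan` function, and the 0.5 utilization threshold are assumptions for illustration, not Cluster Autoscaler code.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_capacity: int   # millicores the node can offer
    cpu_requested: int  # millicores requested by pods on the node


def plan(nodes, pending_pod_cpu, utilization_threshold=0.5):
    """Return ('scale_up', None), ('scale_down', node_name) or ('none', None)."""
    def free(node):
        return node.cpu_capacity - node.cpu_requested

    # Scale up when some pending pod does not fit on any existing node.
    for cpu in pending_pod_cpu:
        if all(free(node) < cpu for node in nodes):
            return ("scale_up", None)

    # Scale down when a node is underutilized and the remaining nodes
    # have enough free capacity to absorb everything it runs.
    for node in nodes:
        if node.cpu_requested / node.cpu_capacity < utilization_threshold:
            spare = sum(free(other) for other in nodes if other is not node)
            if node.cpu_requested <= spare:
                return ("scale_down", node.name)

    return ("none", None)


if __name__ == "__main__":
    cluster = [Node("a", 4000, 1000), Node("b", 4000, 2000)]
    print(plan(cluster, pending_pod_cpu=[]))
```

A real autoscaler also has to respect constraints this sketch ignores, such as memory, affinity rules, and how long a node has been underutilized.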

Best Practices for Autoscaling in Kubernetes 

These best practices ensure you get the most out of Kubernetes autoscaling capabilities.

Handling Rapid Scaling Events

An important practice is to be prepared for rapid scaling events. These are situations where the demand for resources suddenly spikes, requiring an immediate increase in the number of pods or nodes.

When such events occur, it’s essential that your system can scale up quickly enough to meet the demand. This requires careful tuning of your scaling policies, as well as ensuring that your underlying infrastructure can support rapid scaling.
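One way to reason about whether a scaling policy is fast enough is to simulate it. The sketch below assumes a scale-up policy that, in each period, allows either doubling the replica count or adding four pods, whichever is larger. Those defaults and the function name are assumptions for illustration, so substitute the values from your own policy.

```python
import math


def scale_up_timeline(start, target, percent_per_period=100, pods_per_period=4):
    """Replica counts after each policy period until the target is reached."""
    if start < 0 or pods_per_period < 1:
        raise ValueError("start must be >= 0 and pods_per_period >= 1")
    timeline = [start]
    current = start
    while current < target:
        # Largest count allowed by the percentage-based rule.
        by_percent = math.ceil(current * (1 + percent_per_period / 100))
        # Largest count allowed by the fixed-number-of-pods rule.
        by_pods = current + pods_per_period
        # Take the more permissive rule, but never overshoot the target.
        current = min(target, max(by_percent, by_pods))
        timeline.append(current)
    return timeline


if __name__ == "__main__":
    # Periods needed to grow from 2 to 40 replicas under the assumed policy.
    print(scale_up_timeline(2, 40))
```

If the number of periods multiplied by the period length is longer than your users can tolerate, the policy, the minimum replica count, or node provisioning time needs attention.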

Understanding Limits, Requests, and Quotas

Kubernetes provides mechanisms to control resource allocation at both the pod and namespace levels. Setting these appropriately is important for effective autoscaling:

  • Limits: Define the maximum amount of resources (CPU and memory) that a container can consume. A container that reaches its CPU limit is throttled, while one that exceeds its memory limit can be terminated (OOM-killed). Setting appropriate limits ensures that no pod monopolizes all available resources, which could adversely affect other applications.
  • Requests: These specify the baseline resource requirement for a pod. Kubernetes uses requests to determine the best node to place a pod based on the resources available. For example, if a pod requests a certain amount of memory, Kubernetes schedules that pod on a node where the requested memory is available. Misconfigured requests can lead to inefficient resource allocation and can affect the performance and reliability of both the specific pod and the node it’s scheduled on.
  • Quotas: Applied at the namespace level, quotas restrict the total amount of resources (like CPU, memory, or even number of pods) that can be consumed in a namespace. By setting quotas, administrators can ensure fair resource distribution among multiple users or teams sharing the same cluster.

Balancing these configurations is essential. While it’s crucial to prevent resource hogging with limits and quotas, setting them too low or misconfiguring requests can throttle applications and impact their performance.
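Requests and quotas also cap how far an autoscaler can actually scale. The short sketch below checks whether a new pod fits within a namespace CPU quota and how many replicas the quota leaves room for. Both function names are hypothetical, and all values are CPU requests in millicores.

```python
def fits_quota(existing_requests_m, new_pod_request_m, quota_m):
    """True if adding the pod keeps total requests within the namespace quota."""
    return sum(existing_requests_m) + new_pod_request_m <= quota_m


def max_replicas_under_quota(per_pod_request_m, quota_m, used_m=0):
    """Largest replica count whose combined requests still fit the quota."""
    if per_pod_request_m <= 0:
        raise ValueError("per_pod_request_m must be positive")
    return max(0, (quota_m - used_m) // per_pod_request_m)


if __name__ == "__main__":
    # A 4000m quota with 1000m already used by other workloads.
    print(max_replicas_under_quota(250, quota_m=4000, used_m=1000))
    print(fits_quota([500, 500], new_pod_request_m=250, quota_m=1000))
```

If the maximum replica count configured for an autoscaler is higher than the quota allows, the extra pods are rejected, so it is worth checking the two values against each other.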

Dealing with Stateful Applications

Autoscaling stateful applications like databases or caching systems in Kubernetes requires special attention, because it can be tricky to persist application state in a dynamic containerized environment. To achieve it, you should use:

  • StatefulSets: Help maintain data consistency and unique identifiers for each pod. 
  • Persistent Volumes (PVs): Store data persistently even as containers are torn down and replaced. 
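  • Graceful scaling: Scaling down one pod at a time, and letting each pod finish in-flight work and flush its data before it terminates, helps maintain data integrity.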
  • Custom metrics: By reflecting the application’s workload, such as query rate for databases, these can be more informative for scaling decisions than standard CPU or memory metrics.
  • Kubernetes-native database operators: Can help manage the nuances of autoscaling databases. One example is the Kubernetes PostgreSQL Operator.

Testing is especially critical for stateful applications. Simulating failure and recovery scenarios helps ensure that autoscaling actions won’t result in data loss or corruption. Rate limiting and backoff policies can smooth out potentially disruptive scaling actions. Given the complexity of stateful applications, manual overrides can offer a safety net, allowing for human intervention when autoscaling behavior appears risky.

Handling Autoscaling in Multi-Cluster Environments

Lastly, autoscaling needs particular care in multi-cluster environments, where you might have applications running in different clusters for reasons like high availability, disaster recovery, or geo-distribution.

In such scenarios, you need to ensure that autoscaling is coordinated across all clusters to prevent over-provisioning or under-provisioning. This might involve setting up federation or using multi-cluster management tools.

Tying Autoscaling to Komodor: A Comprehensive Guide

Enhanced Visibility: Komodor ensures that every autoscaling event is meticulously monitored and recorded. No change escapes its vigilant watch.

  • HPA Adjustments: Komodor efficiently tracks any decision made by autoscaling to modify the number of replicas.
  • Deployment Dynamics: Any alterations in deployment replicas are recorded by Komodor, providing users with a complete view of the deployment lifecycle — from the new pods’ status as an event to their full deployment.
  • Node Monitoring: Komodor isn’t just about pod scaling; it offers insights into node scaling and termination activities, ensuring you understand the broader impacts on your infrastructure.
  • Direct Control: With Komodor, users aren’t just passive observers. They’re equipped to proactively control and scale their deployments using the platform’s built-in action capabilities. It’s not just about viewing; it’s about doing.
  • Cost Optimization: Beyond the immediate benefits of scaling, Komodor provides users with valuable data that can be used to fine-tune their autoscaling strategies. This ensures an optimized approach to cost, helping organizations balance performance with expenses.

By leveraging Komodor’s capabilities, teams can navigate the complexities of autoscaling with confidence, gaining the insights and controls they need to optimize their operations effectively.

If you are interested in checking out Komodor, use this link to sign up for a Free Trial.