A microservices architecture breaks an application into many small, independent services. Each service is self-contained and handles a single business context. This loosely coupled approach lets each microservice have its own source code base, development language, and team. In addition, microservices can be packaged as small container images, deployed to clusters, updated, and scaled independently.
Unlike a monolithic application, where every service is packaged into one large deployable, a microservices architecture lets you scale only the individual services experiencing heavy use. Orchestration platforms like Kubernetes have emerged over the past decade to quickly create, deploy, and manage these microservices.
This blog will discuss modern, cloud-native scaling methods for microservices and their underlying cloud infrastructure, and show you how to use them correctly.
Microservices are packaged as containers and run on virtual machines or bare metal servers. When applications get more popular with new users or other applications connecting to them, their resource usage also increases. However, servers are configured with limited CPU, memory, and storage, and when these resources are used up, scaling is the solution.
There are two mainstream methods for increasing the number of available resources: vertical and horizontal scaling.
Vertical scaling is the easiest method. It adds more resources to the server, such as new CPUs, memory units, or hard disks. If adding resources is not an option, moving the application to a larger server is also classified as vertical scaling.
Horizontal scaling is the newer and more appropriate approach for microservices: it adds new application instances to the stack. The load is distributed across the old and new instances, so each instance keeps running within its resource limits.
Microservices run on the cloud or on-premises infrastructure, where scaling is undertaken for the applications, clusters, and infrastructure. Kubernetes is the de facto platform for deploying containerized microservices. The following sections will dive into different scaling options with Kubernetes and microservices in a cloud-native modern world.
When applications get busier and consume more and more resources, it is possible to scale the microservices and then the cluster with the following approaches.
When a microservice is overloaded and becomes a bottleneck, you can scale it out by increasing the number of instances. In Kubernetes, you update the replicas field of the Deployment as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 3
  ...
Similarly, you can make the same scaling change imperatively with the kubectl command:
$ kubectl scale --replicas=4 deployment/nginx
With more microservice instances available, Kubernetes distributes the load across them and the bottleneck is eliminated.
Kubernetes clusters run workloads on worker nodes connected to the Kubernetes control plane. Let’s assume there are three nodes in the cluster, each with 16 GB of memory. If the nodes are maxed out, vertical scaling means replacing them with nodes that have more memory, such as 32 GB each. With double the total memory, it is now possible to deploy more applications.
When the Kubernetes nodes run out of resources, you can add new nodes to the cluster. If you have three nodes with 16 GB of memory and it is not enough, you can manually add any number of nodes with any configuration and increase the total resource amount. In other words, you can either add three more nodes with 16 GB memory or add a single node with 48 GB memory.
Horizontal scaling of the cluster brings flexibility regarding node specifications and the number of nodes, which also helps reduce your cloud bill.
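For example, on a managed Kubernetes service such as GKE, resizing a node pool is a single command. The sketch below assumes a hypothetical cluster named my-cluster with a node pool named default-pool in us-central1-a; other cloud providers offer equivalent commands:
$ gcloud container clusters resize my-cluster --node-pool default-pool --num-nodes 6 --zone us-central1-a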
Adding new nodes to the cluster and changing the number of running pods is straightforward. However, these are manual changes that require a human to interact with the cluster. In the modern cloud-native era, it is burdensome to watch resource usage by hand and scale the moment it is needed.
Elastic scaling is the automatic approach of adding and removing compute, memory, storage, and networking infrastructure based on resource usage. Kubernetes offers elastic scaling out of the box for both clusters and microservices.
Cluster autoscaler is the Kubernetes component that automatically adjusts the size of the Kubernetes cluster. When there are unscheduled pods due to resource limitations or affinity rules, cluster autoscaler evaluates whether adding a new node will resolve the issue. If so, it scales the cluster up and opens up space for the unscheduled pods.
Cluster autoscaler also watches for opportunities to reschedule pods, within the constraints of their PodDisruptionBudgets, in order to free up nodes for scaling down. While using cluster autoscaler, it is critical to ensure that all pods have resource requests and limits configured, since these values are used to calculate node pool size.
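As a rough sketch, requests and limits are set per container in the pod template; the values below are illustrative and should be tuned to your workload:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:        # used by the scheduler and cluster autoscaler for placement decisions
            cpu: 100m
            memory: 128Mi
          limits:          # hard ceiling enforced at runtime
            cpu: 250m
            memory: 256Mi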
Kubernetes offers two autoscaling mechanisms for single microservices.
Horizontal Pod Autoscaler (HPA) watches pods to compare actual usage and resource requests. When the resource usage reaches the configured target values, HPA increases the number of pods. For instance, you can configure HPA to scale up when the CPU usage exceeds 50% of the requested value:
$ kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
In order to use HPA efficiently, you need to make sure your pods’ resource request configuration is correct. In addition, it’s possible to use custom metrics in HPA to work with business-critical indicators.
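The same policy can also be written declaratively as an autoscaling/v2 HorizontalPodAutoscaler manifest. The sketch below targets the php-apache Deployment from the command above and scales on CPU utilization; custom metrics would be added as extra entries under metrics:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # scale out when average CPU usage exceeds 50% of the request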
Vertical Pod Autoscaler (VPA) watches the actual resource usage of pods in the cluster and looks for over- or under-committed resources. When there are substantial differences, VPA increases or decreases the pods’ resource requests. The critical drawback of VPA is that resource updates cause pods to restart, possibly on another node.
As a best practice, you should use VPA together with cluster autoscaler since they both try to optimize overall cluster utilization.
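As a minimal sketch, assuming the VPA components are installed in the cluster, a VerticalPodAutoscaler object targets a workload and lets VPA apply its recommendations automatically:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  updatePolicy:
    updateMode: "Auto"   # VPA evicts pods and recreates them with updated resource requests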
To scale microservices in a dynamic environment like Kubernetes, you need a software architecture and deployment strategy that are compatible with that environment. When microservices are scaled, it’s critical that each instance works correctly while the whole swarm works coherently. For instance, assume each pod keeps its own local cache: scaling to more than one instance could then produce different responses to the same request.
Instead, it’s better to use shared caching systems between pod instances to ensure consistency. You should also implement data governance models and limit collisions between pods to make the swarm of pods work coherently. This means the architecture of your application stack should be properly structured for scaling up and down.
Scaling, whether manual or automatic, relies on watching resource usage and metrics, so identifying and monitoring the metrics that are vital for business continuity is critical. The first metrics to consider are CPU, memory, and storage, since they’re easy to monitor and integrate into autoscalers. The next step is to consider the Four Golden Signals: latency, traffic, errors, and saturation.
When the application architecture is suitable for scalability and the correct metrics are utilized, the rest is simply configuring autoscalers and Kubernetes resources to have a cloud-native scalable microservice. However, as mentioned throughout this article, scaling manually or automatically is not straightforward.
Kubernetes and the distributed applications running in the cluster create a reasonably complicated stack. With the dynamic autoscalers and automated deployments, it becomes even more challenging to troubleshoot and debug without cloud-native tooling.
This is the reason why we created Komodor, a tool that helps dev and ops teams stop wasting their precious time looking for needles in (hay)stacks every time things go south. To learn more about how Komodor can make it easier to empower your teams to shift left and independently troubleshoot Kubernetes-related issues, sign up for our free trial.