KEDA (Kubernetes Event-Driven Autoscaling): Features, Architecture, and Tutorial

What Is Kubernetes Event-Driven Autoscaling (KEDA)? 

Kubernetes Event-driven Autoscaling (KEDA) is a component within the Kubernetes ecosystem that scales applications based on external events. Unlike traditional scaling mechanisms that rely on resource usage metrics like CPU and memory, KEDA enables applications to scale in response to a variety of event sources such as messaging queues, database changes, or custom-defined events. 

This makes KEDA particularly valuable for event-driven architectures where workloads can be highly variable and unpredictable. KEDA integrates with Kubernetes’ Horizontal Pod Autoscaler (HPA), extending its capabilities to support event-driven scaling. 

KEDA is open source under the Apache 2.0 license. Its primary corporate sponsors are Microsoft Azure, Snyk, and VexxHost. It has over 8K GitHub stars and over 370 contributors, was accepted into the CNCF in 2020, and reached the Graduated maturity level in August 2023.

You can get KEDA from the official project website.

This is part of a series of articles about Kubernetes tools.

Key Features of KEDA 

KEDA’s capabilities rely on the following features:

  • Support for a range of event sources: These include popular message brokers like Apache Kafka, RabbitMQ, and Azure Service Bus, databases such as MongoDB and Redis, and other cloud services like AWS SQS and Google Pub/Sub. 
  • Built-in scalers: These simplify the process of defining how applications should scale in response to different event sources. They can monitor metrics specific to each event source, such as the length of a message queue or the number of unprocessed events, and trigger scaling actions accordingly. 
  • Seamless integration with Kubernetes Horizontal Pod Autoscaler (HPA): This allows users to leverage the familiar HPA framework while extending its functionality to include a broader range of scaling triggers. Using KEDA alongside HPA enables more dynamic and responsive scaling policies that account for both resource utilization and external events.
  • Flexible deployment modes: KEDA can be deployed as a standalone operator or integrated into existing Kubernetes clusters without significant changes to the current setup. 

How KEDA Works 

KEDA extends the standard Kubernetes autoscaling mechanisms to support event-driven scenarios. This involves several key components and processes that enable applications to scale efficiently based on real-time events. The main components of KEDA include the agent, metrics, and admission webhooks.

Source: KEDA

Agent

The KEDA agent runs within the Kubernetes cluster. It is responsible for monitoring the configured event sources continuously and determining when scaling actions are necessary. Here’s how it works:

  1. Event source monitoring: The agent connects to various event sources, such as message queues, databases, and other services, to gather relevant metrics. Each event source has a corresponding scaler that defines how the metrics should be interpreted.
  2. Metric collection: The agent collects metrics from these event sources at regular intervals. For example, it might monitor the number of pending messages in a queue or the rate of incoming requests.
  3. Evaluation: The collected metrics are evaluated against the scaling policies defined by the user. These policies specify the conditions under which the application should scale up or down.
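  4. Triggering scaling actions: When the evaluation determines that scaling is necessary, the agent communicates with the Kubernetes control plane to adjust the number of replicas of the targeted deployment. The agent itself activates and deactivates the workload (scaling it from and to zero), while scaling between one and many replicas is delegated to the HPA. This ensures that the application has enough replicas to handle the current workload.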
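The four steps above can be sketched as a single decision function. This is a simplified illustration rather than KEDA's actual code: the `ScalerReading` structure and the `decide_activation` function are hypothetical names, and the sketch assumes the agent only handles the transitions to and from zero replicas, with a 300-second cooldown before scaling to zero.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalerReading:
    """One poll of an event source (hypothetical structure, for illustration)."""
    metric_value: float                # e.g. pending messages in a queue
    activation_threshold: float = 0    # the source counts as active above this value


def decide_activation(reading: ScalerReading,
                      current_replicas: int,
                      idle_seconds: float,
                      cooldown_seconds: float = 300) -> Optional[int]:
    """Return the replica count to request, or None to leave the workload alone.

    Only the scale-from-zero and scale-to-zero transitions are modelled;
    scaling between 1 and N replicas is assumed to be left to the HPA.
    """
    is_active = reading.metric_value > reading.activation_threshold
    if is_active and current_replicas == 0:
        return 1   # wake the workload so the HPA can take over
    if not is_active and current_replicas > 0 and idle_seconds >= cooldown_seconds:
        return 0   # no events for the whole cooldown window
    return None    # nothing for the agent to do in this cycle


# Five pending messages and no running replicas: the workload is activated.
print(decide_activation(ScalerReading(metric_value=5), current_replicas=0, idle_seconds=0))
```

Returning `None` for the in-between cases keeps the sketch honest about what it leaves out: once at least one replica is running, the replica count is driven by the metrics described in the next section.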

Metrics

Metrics provide the data needed to make informed scaling decisions. KEDA uses custom metrics that are specific to the event sources being monitored. Here’s a closer look at how metrics work in KEDA:

  1. Custom metrics server: KEDA includes a custom metrics server that exposes the metrics collected from event sources to Kubernetes’ Horizontal Pod Autoscaler (HPA). This allows the HPA to use these metrics to make scaling decisions.
  2. Metric providers: Each event source has a corresponding metric provider that translates the raw data from the event source into metrics that KEDA can use. For example, a metric provider for an Azure Service Bus queue would report the number of active messages.
  3. Metric types: Metrics can vary depending on the event source. Common types of metrics include queue length, request rate, latency, and database operation counts. These metrics provide a detailed picture of the workload’s demands.
  4. Thresholds and policies: Users define thresholds and policies that specify when scaling should occur. For example, a policy might dictate that if the number of messages in a queue exceeds 1000, the application should scale up by adding more replicas.
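To make the threshold arithmetic concrete, here is a minimal sketch of how a queue length could translate into a replica count. It assumes the threshold is treated as a per-replica target, which is how the HPA handles average-value metrics; the function name, parameters, and default bounds are illustrative and not part of KEDA.

```python
import math


def desired_replicas(metric_total: float,
                     target_per_replica: float,
                     min_replicas: int = 1,
                     max_replicas: int = 100) -> int:
    """Replicas needed so that each one handles at most `target_per_replica`."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    raw = math.ceil(metric_total / target_per_replica)
    # Clamp to the configured bounds, as an autoscaler would.
    return max(min_replicas, min(raw, max_replicas))


# 1000 queued messages with a target of 100 per replica.
print(desired_replicas(1000, 100))
```

Under this model, the threshold is less a trip-wire than a ratio: a queue of 1001 messages with the same target would ask for 11 replicas, and an empty queue would fall back to the configured minimum.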

Admission Webhooks

Admission webhooks ensure that scaling configurations are applied correctly and securely. They aid in maintaining the integrity of the scaling process. Here’s how admission webhooks function within KEDA:

  1. Interception and validation: Admission webhooks intercept requests to the Kubernetes API server that involve scaling configurations. They validate these requests to ensure they meet predefined criteria and policies.
  2. Configuration enforcement: The webhooks enforce configuration rules, ensuring that only valid scaling policies are applied. This prevents misconfigurations that could lead to instability or inefficiency in the cluster.
  3. Dynamic updates: They allow for dynamic updates to scaling configurations without requiring application restarts or manual interventions. This means scaling policies can be adjusted on-the-fly based on changing workload patterns.

  4. Security and compliance: By validating and enforcing scaling configurations, admission webhooks add an extra layer of security. They help ensure that only authorized changes are made, protecting the cluster from potential misconfigurations or malicious actions.
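As a rough illustration of the kind of validation described above, the sketch below rejects a scaling configuration that would conflict with an existing one. The dictionary layout and the `validate_scaled_object` function are hypothetical. The duplicate-target rule reflects a check that KEDA's webhook is documented to perform, while the replica-bounds rule is an illustrative assumption.

```python
from typing import Dict, List


def validate_scaled_object(new_obj: Dict, existing: List[Dict]) -> List[str]:
    """Return rejection reasons; an empty list means the object is admitted."""
    errors = []
    # Rule 1: two scaling configurations must not target the same workload.
    for other in existing:
        same_target = (other["namespace"] == new_obj["namespace"]
                       and other["target"] == new_obj["target"])
        if same_target and other["name"] != new_obj["name"]:
            errors.append(
                f"workload {new_obj['target']} is already scaled by {other['name']}"
            )
    # Rule 2 (illustrative assumption): replica bounds must be coherent.
    if new_obj.get("min_replicas", 0) > new_obj.get("max_replicas", 100):
        errors.append("min_replicas exceeds max_replicas")
    return errors


existing = [{"name": "orders-scaler", "namespace": "default", "target": "orders"}]
candidate = {"name": "orders-scaler-2", "namespace": "default", "target": "orders"}
print(validate_scaled_object(candidate, existing))
```

Rejecting the second configuration at admission time is cheaper than discovering later that two autoscalers are fighting over the same deployment.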


Tips from the expert

Itiel Shwartz

Co-Founder & CTO

Itiel is the CTO and co-founder of Komodor. A big believer in dev empowerment and moving fast, he has worked at eBay, Forter, and Rookout (as the founding engineer). Itiel is a backend and infra developer turned “DevOps” and an avid public speaker who loves talking about cloud infrastructure, Kubernetes, Python, observability, and R&D culture.

Based on my experience, here are a few ways to make more effective use of KEDA in your Kubernetes cluster:

Use multiple event sources:

Configure your application to respond to various event sources simultaneously. This allows you to scale your application based on different types of events, improving responsiveness and resource utilization across diverse workloads.
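When several event sources drive the same workload, the HPA evaluates each metric separately and uses the largest resulting replica count. The sketch below illustrates that rule; the trigger names and numbers are made up for the example, and the function is not part of KEDA.

```python
import math
from typing import Dict, Tuple


def replicas_for_triggers(readings: Dict[str, Tuple[float, float]]) -> Tuple[int, str]:
    """Given {trigger: (metric_total, target_per_replica)}, return the
    replica count to use and the trigger that demanded it."""
    per_trigger = {
        name: math.ceil(total / target)
        for name, (total, target) in readings.items()
    }
    winner = max(per_trigger, key=per_trigger.get)
    return per_trigger[winner], winner


# Kafka lag asks for 9 replicas, the RabbitMQ queue for 3: the larger wins.
print(replicas_for_triggers({
    "kafka-lag": (450, 50),
    "rabbitmq-queue": (120, 40),
}))
```

Because the busiest source wins, adding a second trigger can only raise the replica count at a given moment, never lower it.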

Implement fine-grained scaling policies:

Create multiple ScaledObject or ScaledJob resources for different parts of your application. This allows for more granular control over scaling behaviors, ensuring that each component scales optimally based on its specific workload.

Utilize KEDA’s scaling strategies:

Explore the scaling strategies that KEDA offers for ScaledJob resources, such as “default”, “custom”, and “accurate”, to better tailor scaling behaviors to your application’s needs. These strategies provide flexibility in how the number of jobs to create is calculated.

Optimize polling intervals:

Adjust the polling intervals for your scalers to balance responsiveness and resource consumption. Shorter intervals can improve responsiveness but may increase load on your event sources and the KEDA agent.
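The trade-off can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope helper, not a KEDA API: it counts only the agent's own polls (one per trigger per interval) and treats the polling interval as the worst-case delay before a new event is noticed.

```python
from typing import Dict


def polling_cost(interval_seconds: float, trigger_count: int) -> Dict[str, float]:
    """Estimate the load and reaction time implied by a polling interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return {
        # Queries sent to event sources per hour by the agent's polling loop.
        "polls_per_hour": trigger_count * 3600 / interval_seconds,
        # An event arriving just after a poll waits a full interval.
        "worst_case_delay_seconds": interval_seconds,
    }


# Ten triggers polled every 30 seconds versus every 5 seconds.
print(polling_cost(30, 10))
print(polling_cost(5, 10))
```

Cutting the interval from 30 to 5 seconds shortens the worst-case delay by 25 seconds but multiplies the query volume by six, which is the kind of comparison worth making before tuning this setting.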

Automate KEDA deployments with GitOps:

Use GitOps practices to manage KEDA configurations and deployments. This ensures that your scaling policies and configurations are version-controlled, auditable, and easily reproducible across different environments.

Tutorial: Getting Started with KEDA 

This tutorial is adapted from the KEDA documentation.

Deploying KEDA with Helm

Helm is a package manager for Kubernetes that simplifies the deployment process. To confirm that Helm is installed on your system, run helm version.

  1. Add the KEDA Helm repository to your Helm configuration:

helm repo add kedacore https://kedacore.github.io/charts

  2. Update the Helm repository to ensure you have the latest charts:

helm repo update

  3. Use Helm to install the KEDA Helm chart. This will create the necessary namespaces and deploy KEDA components in your Kubernetes cluster:

helm install keda kedacore/keda --namespace keda --create-namespace

These steps will set up KEDA in your cluster, ready to handle event-driven scaling.

Scaling Jobs with KEDA

In addition to scaling deployments, KEDA can scale Kubernetes jobs. This is useful for handling long-running executions where each job processes a single event to completion. Here’s how you can set this up:

  1. Define a ScaledJob resource that specifies how KEDA should manage your jobs. Here’s an example configuration:

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: kafka-consumer
  namespace: default
spec:
  jobTargetRef:
    parallelism: 1
    completions: 1
    activeDeadlineSeconds: 600
    backoffLimit: 5
    template:
      spec:
        containers:
        - name: demo-kafka-client
          image: demo-kafka-client:1
          imagePullPolicy: Always
          command: ["consume", "kafka://user:PASSWORD@kafka.default.svc.cluster.local:9092"]
          envFrom:
          - secretRef:
              name: kafka-consumer-secrets
        restartPolicy: Never
  pollingInterval: 10
  maxReplicaCount: 50
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 2
  scalingStrategy:
    strategy: "custom"
    customScalingQueueLengthDeduction: 1
    customScalingRunningJobPercentage: "0.5"
  triggers:
  - type: kafka
    metadata:
      topic: welcome
      bootstrapServers: KafkaHost:9092
      consumerGroup: kafka-consumer-group
      lagThreshold: '10'
  2. Save the configuration to a file (e.g., scaledjob.yaml) and apply it using kubectl:

kubectl apply -f scaledjob.yaml

In this example, KEDA checks the consumer group’s lag on the Kafka topic named welcome every 10 seconds and creates jobs in proportion to that lag, using a target of 10 lagging messages per job and a ceiling of 50 jobs. Each job runs to completion and then terminates, so the number of jobs tracks the backlog of unprocessed events.
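The effect of the custom scaling strategy in this configuration can be approximated as follows. This is a sketch of the formula as described in the KEDA ScaledJob documentation, with the values from the example above as defaults; the function and parameter names are hypothetical, and the exact rounding and the floor at zero are assumptions.

```python
import math


def custom_strategy_jobs(lag: int,
                         running_jobs: int,
                         lag_threshold: int = 10,      # lagThreshold
                         deduction: int = 1,           # customScalingQueueLengthDeduction
                         running_pct: float = 0.5,     # customScalingRunningJobPercentage
                         max_replicas: int = 50) -> int:   # maxReplicaCount
    """Approximate number of new jobs to create in one polling cycle."""
    # Jobs the backlog calls for, before any adjustment.
    max_scale = min(math.ceil(lag / lag_threshold), max_replicas)
    # Subtract the fixed deduction and a share of the jobs already running.
    jobs = max_scale - deduction - int(running_jobs * running_pct)
    # Never return a negative count or exceed the ceiling (assumed clamping).
    return max(0, min(jobs, max_replicas))


# A lag of 200 messages with 4 jobs already running.
print(custom_strategy_jobs(lag=200, running_jobs=4))
```

With a lag of 200 the backlog calls for 20 jobs; subtracting the deduction of 1 and half of the 4 running jobs leaves 17 new jobs for this cycle.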

Horizontal Pod Autoscaler Troubleshooting with Komodor

Kubernetes troubleshooting relies on the ability to quickly contextualize the problem with what’s happening in the rest of the cluster. More often than not, you will be conducting your investigation during fires in production. The major challenge is correlating service-level incidents with other events happening in the underlying infrastructure.

Komodor can help with its ‘Node Status’ view, built to pinpoint correlations between service or deployment issues and changes in the underlying node infrastructure. With this view you can rapidly:

  • See service-to-node associations
  • Correlate service and node health issues
  • Gain visibility over node capacity allocations, restrictions, and limitations
  • Identify “noisy neighbors” that use up cluster resources
  • Keep track of changes in managed clusters
  • Get fast access to historical node-level event data

Beyond node error remediations, Komodor can help troubleshoot a variety of Kubernetes errors and issues. As the leading Continuous Kubernetes Reliability Platform, Komodor is designed to democratize K8s expertise across the organization and enable engineering teams to leverage its full value.

Komodor’s platform empowers developers to confidently monitor and troubleshoot their workloads while allowing cluster operators to enforce standardization and optimize performance. Specifically when working in a hybrid environment, Komodor reduces the complexity by providing a unified view of all your services and clusters.

By leveraging Komodor, companies of all sizes significantly improve reliability, productivity, and velocity. Or, to put it simply: Komodor helps you spend less time and resources on managing Kubernetes, and more time on innovating at scale. If you are interested in checking out Komodor, use this link to sign up for a Free Trial.