In Kubernetes, you use probes to configure health checks that help determine each pod’s state. Distributed microservices-based systems must automatically detect unhealthy applications, reroute requests to other systems, and restore broken components. Health checks help address this challenge to ensure reliability.
By default, Kubernetes observes a pod’s lifecycle and starts routing traffic to the pod once its containers move from a ‘Pending’ to a ‘Running’ state. It detects application crashes and restarts the unhealthy container to recover. This basic setup is not enough to ensure health in Kubernetes, especially when the application inside the pod runs under a daemon process manager that keeps the process alive even when the application itself is unhealthy.
Because Kubernetes considers a pod healthy and ready for requests as soon as all of its containers start, the application can receive traffic before it is actually ready. This can occur when an application needs to initialize some state, load data before handling application logic, or establish database connections.
This issue creates a gap between when the application is ready and when Kubernetes thinks it is ready. As a result, when the deployment starts to scale, unready applications might receive traffic and send back 500 errors.
Kubernetes health checks use probes that enable the kubelet, an agent running on each node, to validate the health and readiness of a container. Probes determine when a container is ready to accept traffic and when it should be restarted.
You can perform health checks via HTTP(S), TCP, command probes, and gRPC. We’ll show how this works with examples.
This is part of an extensive series of guides about Kubernetes troubleshooting.
Here are the three probes Kubernetes offers:
Liveness probe – checks whether the container is still working as expected; if the probe fails, the kubelet kills and restarts the container.
Readiness probe – checks whether the container is ready to accept traffic; if the probe fails, the pod is removed from the service’s endpoints and stops receiving requests.
Startup probe – checks whether the application inside the container has finished starting; while it runs, liveness and readiness checks are held back, which protects slow-starting containers.
If a container does not define a given probe, Kubernetes treats that probe as always successful.
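To make the relationship between the three probes concrete, here is a minimal sketch of a pod spec that declares all of them; the image name, port, and endpoint paths are placeholders rather than part of the original example:

apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
  - name: myapp-container
    image: myapp:latest        # placeholder image
    ports:
    - containerPort: 8080
    startupProbe:              # other probes wait until this one succeeds
      httpGet:
        path: /healthz         # placeholder endpoint
        port: 8080
      failureThreshold: 30
      periodSeconds: 10
    readinessProbe:            # gates whether the pod receives service traffic
      httpGet:
        path: /ready           # placeholder endpoint
        port: 8080
      periodSeconds: 10
    livenessProbe:             # restarts the container if it stops responding
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10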
You can create health check probes by issuing requests against a container. Here is how to implement Kubernetes probes:
An HTTP GET request is one mechanism for implementing a liveness probe. You can expose an HTTP endpoint by running any lightweight HTTP server within the container. The probe performs an HTTP GET request against this endpoint at the container’s IP address to check whether the service is alive. If the endpoint returns a success code, the kubelet considers the container alive and healthy; if not, the kubelet terminates and restarts the container.
You can configure a command (exec) probe that has the kubelet execute a command such as cat /tmp/healthy inside the container. If the command succeeds (exits with status 0), the kubelet considers the container alive and healthy. If not, it shuts down the container and restarts it.
A TCP socket probe tells Kubernetes to open a TCP connection on a specified port of the container. If the connection succeeds, Kubernetes considers the container healthy. TCP probes are useful for services such as gRPC or FTP, where establishing a TCP connection is itself a meaningful sign that the service is up.
If you are running Kubernetes 1.23 or earlier, you can include the grpc-health-probe binary in your container to enable gRPC health checks. From Kubernetes 1.24 onward, gRPC health checks are supported natively.
In a health check, you define the endpoint to probe, how often to probe it (the interval), how long to wait for a response (the timeout), and an initial grace period before probing begins.
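As a rough sketch of how these settings map onto probe fields in a pod spec (the path and port are placeholders, and the values shown match the Kubernetes defaults unless noted otherwise):

livenessProbe:
  httpGet:
    path: /healthz             # placeholder endpoint
    port: 8080                 # placeholder port
  initialDelaySeconds: 5       # grace period before the first probe (default is 0)
  periodSeconds: 10            # probe interval (default 10)
  timeoutSeconds: 1            # how long to wait for a response (default 1)
  failureThreshold: 3          # consecutive failures before the kubelet acts (default 3)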
To configure an HTTP health check in Kubernetes, you need to define a liveness or readiness probe that makes an HTTP request to a specific endpoint in your application. Here’s an example of how to set up an HTTP liveness probe:
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: myapp
  name: myapp-pod
spec:
  containers:
  - name: myapp-container
    image: myapp:latest
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
In this configuration:
path – the HTTP endpoint the probe requests (/healthz in this example).
port – the container port the request is sent to (8080).
initialDelaySeconds – how long the kubelet waits after the container starts before running the first probe.
periodSeconds – how often the probe runs.
The probe sends an HTTP GET request to the /healthz endpoint. If it returns a success status code (anything from 200 to 399), the container is considered healthy; otherwise, it is restarted.
To manually check whether the pod is responding, you can run a command like:
curl <pod-ip-address>:8080/healthz
TCP health checks are useful for applications that rely on TCP connections, such as databases or gRPC services. Here’s an example of a TCP liveness probe on a MySQL database. Before running this health check, ensure MySQL is installed and running on the pod.
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: myapp
  name: myapp-pod
spec:
  containers:
  - name: myapp-container
    image: myapp:latest
    livenessProbe:
      tcpSocket:
        port: 3306
      initialDelaySeconds: 5
      periodSeconds: 10
The tcpSocket field tells the kubelet to open a TCP connection to port 3306. If the connection succeeds, the container is considered healthy; otherwise, it is restarted.
Command probes execute a command inside the container to determine its health. Here’s an example of a command liveness probe:
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: myapp
  name: myapp-pod
spec:
  containers:
  - name: myapp-container
    image: myapp:latest
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 10
The exec field runs cat /tmp/healthy inside the container. If the command exits with status 0, the container is considered healthy; once the file is removed after 30 seconds, the probe starts failing and the kubelet restarts the container.
Kubernetes supports gRPC health checks natively from version 1.24 onward. For earlier versions, you can use a helper tool like grpc-health-probe.
To configure a gRPC health check, first add the grpc-health-probe binary to your container image. Then, configure the gRPC health check in your pod definition:
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: myapp
  name: myapp-pod
spec:
  containers:
  - name: myapp-container
    image: myapp:latest
    livenessProbe:
      exec:
        command:
        - /grpc-health-probe
        - -addr=:50051
      initialDelaySeconds: 5
      periodSeconds: 10
The command field invokes the grpc-health-probe binary against the gRPC server listening on port 50051. If the probe reports a healthy status, the container is considered healthy; otherwise, it is restarted.
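On Kubernetes 1.24 or later you can skip the helper binary and use the built-in gRPC probe type instead. A minimal sketch, assuming the gRPC server implements the standard gRPC health checking service on port 50051:

livenessProbe:
  grpc:
    port: 50051                # the server must implement the gRPC Health Checking Protocol
  initialDelaySeconds: 5
  periodSeconds: 10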
To understand how health checks enable faster troubleshooting, consider the following example. Say you have an application deployment with a service in front of it to balance traffic. If the application does not come up, you start troubleshooting by checking the health check. With a proper health check in place, failing application pods are killed, and you can set an alert on the pod status that tells you the pods are failing their health checks.
Sometimes, your application passes a health check and the application pods are up, but the application is still not receiving any traffic. This can happen if you have implemented a readiness check that is not succeeding. If the readiness check is failing, Kubernetes will not add your pods to the service endpoints, and your service will not have any pods to send traffic to.
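If you suspect this situation, check that the deployment actually defines a readiness probe and that its endpoint responds. A minimal sketch of such a probe (the path and port are placeholders):

readinessProbe:
  httpGet:
    path: /ready               # placeholder endpoint; should succeed only once the app can serve traffic
    port: 8080
  periodSeconds: 10
  failureThreshold: 3          # after 3 consecutive failures the pod is removed from the service endpoints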
There are a few commands that can help you debug issues more quickly.
Use the command below to see if your containers are up and running. It will show how many pods are up in your deployment.
kubectl get deployment deployment_name -n dep_namespace
If you see that the pods are not up, look at the deployment descriptions or events with the following command. This will show how many pods are up in the replica set.
kubectl describe deployment deployment_name -n dep_namespace
Then, you can take the replica set name and check what is happening in the ReplicaSet events. This will show if there is any issue bringing up your pods.
kubectl describe replicaset replicaset_name -n dep_namespace
Lastly, you can describe your pods to see if they are failing the health checks.
kubectl describe pod pod_name -n dep_namespace
You can also try looking at the pod’s logs to identify why it failed the health checks, using the below command.
kubectl logs -f pod_name -n dep_namespace
In addition, checking the events in deployments, replica sets, pods, and pod logs will tell you a lot about any issues. If you are running a StatefulSet, you can use the same commands for troubleshooting.
You can also check whether the Endpoints object for the service contains your pod IPs.
kubectl describe service service_name -n dep_namespace
If you have a load-balancer service, you will be able to see your instances attached to the load balancer. If your health check is failing, the load balancer will remove the instances, and traffic won’t be forwarded to the instances.
Kubernetes events are very important when you are troubleshooting. Most of the time, you will be able to find the issue in one of the Kubernetes events. You can easily see events related to any Kubernetes object using the Kubernetes describe command.
There are several common pitfalls you may run into when running Kubernetes health checks:
Health checks are clearly important for every application. The good news is that they are easy to implement and, if done properly, enable you to troubleshoot issues faster. If you log exactly why a health check failed, you can pinpoint and solve issues easily.
While these tips can (and will) help minimize the chances of things breaking down, eventually, something else can go wrong – simply because it can.
This is where Komodor comes in – Komodor is the Continuous Kubernetes Reliability Platform, designed to democratize K8s expertise across the organization and enable engineering teams to leverage its full value.
Komodor’s platform empowers developers to confidently monitor and troubleshoot their workloads while allowing cluster operators to enforce standardization and optimize performance. Especially when working in a hybrid environment, Komodor reduces complexity by providing a unified view of all your services and clusters.
By leveraging Komodor, companies of all sizes significantly improve reliability, productivity, and velocity. Or, to put it simply – Komodor helps you spend less time and resources on managing Kubernetes, and more time on innovating at scale.
If you are interested in checking out Komodor, use this link to sign up for a Free Trial.