Node disk pressure is a condition in Kubernetes that indicates a node is running low on disk space. This situation can affect the node’s ability to host pods effectively, as Kubernetes relies on sufficient disk space for operations such as pulling images, running containers, and storing logs.
When disk usage crosses a certain threshold, Kubernetes marks the node with a DiskPressure condition, signaling that it’s in a state that could compromise its performance or functionality.
Once a node is under disk pressure, the Kubernetes scheduler stops scheduling new pods to that node to prevent exacerbating the condition. Existing pods may continue running, but the system may evict pods that consume a lot of disk space to alleviate the pressure.
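The point at which a node reports DiskPressure is governed by the kubelet’s eviction thresholds, which are configurable. A minimal KubeletConfiguration sketch using the nodefs and imagefs signals (the values shown are illustrative, close to common defaults, not recommendations):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Hard thresholds: crossing these triggers immediate pod eviction
evictionHard:
  nodefs.available: "10%"
  imagefs.available: "15%"
# Soft thresholds: eviction happens only after the grace period elapses
evictionSoft:
  nodefs.available: "15%"
evictionSoftGracePeriod:
  nodefs.available: "1m30s"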
To quickly check whether your Kubernetes nodes are experiencing disk pressure, run kubectl describe node <node-name> and check whether any of your nodes report a condition of type DiskPressure with status True.
In the output, take a look at the conditions section. It will look something like the listing below, where you can see that node2 is experiencing disk pressure:
NAME    STATUS   ROLES    AGE   VERSION   CONDITIONS
node1   Ready    master   68d   v1.20.2   DiskPressure=False,MemoryPressure=False,PIDPressure=False,Ready=True
node2   Ready    <none>   68d   v1.20.2   DiskPressure=True,MemoryPressure=False,PIDPressure=False,Ready=True
node3   Ready    <none>   68d   v1.20.2   DiskPressure=False,MemoryPressure=False,PIDPressure=False,Ready=True
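Note that the default kubectl get nodes output does not include conditions; a JSONPath query along the following lines (a sketch to adapt) can produce a per-node DiskPressure listing like the one above:

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}DiskPressure={.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}'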
A DiskPressure condition can result in a few serious issues:
Kubernetes may evict pods from a node experiencing disk pressure to reclaim resources and mitigate the risk of system failures. This eviction process is automated and prioritizes the eviction of pods based on their resource requests and limits, as well as the QoS class. Consequently, critical pods may experience unexpected downtime or disruptions, affecting application availability.
The Kubernetes scheduler prevents new pods from being scheduled onto nodes marked with disk pressure to preserve the node’s stability and performance. This constraint can lead to scheduling delays or failures if multiple nodes in a cluster are under disk pressure, potentially impacting application deployment and scaling activities.
As nodes struggle with limited disk space, they may become less responsive or experience increased latencies, affecting the user experience of applications hosted on the cluster. Persistent disk pressure across multiple nodes can lead to a cascading effect, where the performance of the entire cluster is compromised.
Cluster stability is also at risk during disk pressure situations. Essential system components like etcd, the Kubernetes API server, and kubelet might be affected by disk space shortages, leading to cluster-wide issues such as failures in service discovery, networking problems, and delays in executing control commands.
Node disk pressure impacts Kubernetes operations by limiting the cluster’s ability to scale and recover from failures. For instance, in a scenario where a cluster needs to scale up rapidly due to increased demand, disk pressure can prevent new pods from being scheduled, leading to service degradation or outages.
Automated recovery processes, such as pod rescheduling after a node failure, may be hindered if alternative nodes are also experiencing disk pressure.
Itiel Shwartz
Co-Founder & CTO
In my experience, here are tips that can help you better manage and resolve Kubernetes disk pressure:
Use tools like Logrotate to manage and rotate logs, preventing excessive disk usage.
Store application logs and data on network-attached storage (NAS) or other external solutions instead of local node storage.
Set resource quotas at the namespace level to limit the number of pods and their resource consumption.
Utilize PersistentVolumes and PersistentVolumeClaims for consistent storage management across nodes.
Set up Prometheus to monitor disk usage and alert on high usage to proactively manage disk space.
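For the last tip, here is a sketch of a Prometheus alerting rule built on node-exporter’s filesystem metrics (the 80% threshold and label filters are assumptions to adapt to your environment):

groups:
- name: node-disk
  rules:
  - alert: NodeDiskUsageHigh
    # Fires when a filesystem has been more than 80% full for 10 minutes
    expr: (1 - node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) > 0.8
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Disk usage above 80% on {{ $labels.instance }}"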
Here are some of the main conditions that may result in node disk pressure.
Storing application logs and data on a node’s local disk, rather than on network-attached storage (NAS), a shared file system, or another external storage solution, can lead to disk pressure. As applications run, they generate logs and data that can quickly accumulate, consuming significant disk space. Without proper log rotation or data management practices, the disk space on the node can be exhausted, triggering disk pressure conditions.
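Beyond moving data to external storage, the kubelet itself can cap container log growth. A minimal KubeletConfiguration sketch (the values are illustrative):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Rotate each container log once it reaches 10Mi, keeping at most 5 files
containerLogMaxSize: 10Mi
containerLogMaxFiles: 5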
Running too many pods on a single node, especially if they generate a high volume of data, can cause disk pressure. Each pod can produce logs, temporary files, and persistent data, contributing to the overall disk usage on the node. As the number of pods increases, the cumulative disk space required can exceed the available capacity, leading to disk pressure.
Administrators should implement pod resource limits and use node affinity and anti-affinity rules to distribute workloads evenly across the cluster’s nodes.
Without properly configured requests and limits for storage resources, pods can consume more disk space than anticipated. This oversight can lead to a rapid exhaustion of disk resources.
Establishing clear resource limits and monitoring pod storage consumption are essential practices to prevent disk pressure. Kubernetes provides mechanisms like ResourceQuotas and LimitRanges to help administrators control resource usage across namespaces and pods, reducing the risk of disk pressure due to misconfiguration.
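As a sketch, a ResourceQuota that caps pod count and ephemeral (node-local) storage in a namespace might look like this (the namespace name and values are placeholders):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: team-a
spec:
  hard:
    pods: "20"
    # Total node-local scratch space the namespace may request and consume
    requests.ephemeral-storage: 10Gi
    limits.ephemeral-storage: 20Gi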
Storage requests that are set too low may not reflect the actual disk space requirements of an application, leading to under-provisioning. As the application operates, it may consume more disk space than reserved, contributing to disk pressure on the node.
Accurate configuration of storage requests is crucial for preventing disk pressure. Kubernetes offers PersistentVolumes and PersistentVolumeClaims as part of its storage orchestration, allowing for precise allocation and management of storage resources.
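For example, a PersistentVolumeClaim can reserve a right-sized volume for an application, keeping its data off the node’s local disk (the claim name, size, and storage class here are assumptions):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: standard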
The troubleshooting process for disk pressure involves the following steps.
Administrators can use the kubectl get nodes command to check the status of nodes in the cluster. Nodes experiencing disk pressure will have a condition type DiskPressure with a status True.
Additionally, examining the output of kubectl describe node provides detailed insights into node conditions, events, and allocated resources, helping identify potential causes of disk pressure.
The kubectl describe pod command offers information on resource consumption, including storage. For a more detailed analysis, administrators can use logging and monitoring tools to track disk usage over time. Tools like Grafana can visualize disk usage metrics collected by Prometheus, identifying pods with high disk consumption.
Running commands like du -sh inside a pod’s containers can also provide immediate insights into disk usage within the container.
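For example (the pod name and paths are placeholders):

kubectl exec <pod-name> -- du -sh /var/log /tmp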
Resolving node disk pressure involves both immediate actions and long-term strategies.
Short-term solutions
Immediately, administrators can delete unused images and containers using commands like docker system prune or Kubernetes’s own garbage collection mechanisms. Evicting non-critical pods manually or adjusting pod resource limits can free up disk space temporarily.
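For example, run on the node itself, and with care, since these commands delete data (the containerd variant assumes a recent crictl):

# Docker runtime: remove stopped containers, unused images, and networks
docker system prune -a
# containerd runtime: remove images not used by any running container
crictl rmi --prune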
Long-term solutions
For long-term prevention, implementing storage management practices is crucial. This includes configuring appropriate resource limits, using external storage solutions for logs and data, and enforcing pod anti-affinity rules to prevent overloading nodes.
Tools and policies for automated log rotation and efficient image management further help maintain optimal disk usage across the cluster.
You can use this command to set pod resource limits:
kubectl set resources deployment my-deployment --limits=memory=200Mi,cpu=1
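The command above targets memory and CPU; for disk specifically, containers can also declare ephemeral-storage requests and limits in the pod spec, and the kubelet evicts pods that exceed their limit. A minimal container-spec sketch (the values are illustrative):

containers:
- name: nginx-container
  image: nginx
  resources:
    requests:
      ephemeral-storage: "1Gi"
    limits:
      ephemeral-storage: "2Gi"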
Here is a Deployment manifest that defines pod anti-affinity, ensuring replicas are spread across different nodes and reducing the likelihood of any single node facing disk pressure due to pod density:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: nginx-container
        image: nginx
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: "app"
                operator: In
                values:
                - my-app
            topologyKey: "kubernetes.io/hostname"
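Note that requiredDuringSchedulingIgnoredDuringExecution is a hard rule: with more replicas than eligible nodes, the surplus replicas remain Pending. Switching to preferredDuringSchedulingIgnoredDuringExecution turns the spread into a soft preference that the scheduler satisfies when it can.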
Kubernetes troubleshooting relies on the ability to quickly contextualize the problem with what’s happening in the rest of the cluster. More often than not, you will be conducting your investigation during fires in production. The major challenge is correlating service-level incidents with other events happening in the underlying infrastructure.
Komodor can help with its ‘Node Status’ view, built to pinpoint correlations between service or deployment issues and changes in the underlying node infrastructure. Beyond node error remediation, Komodor can help troubleshoot a variety of Kubernetes errors and issues, acting as a single source of truth (SSOT) for all of your K8s troubleshooting needs.
In general, Komodor is the go-to solution for managing Kubernetes at scale. Komodor’s platform continuously monitors, analyzes, and visualizes data across your Kubernetes infrastructure, providing clear, actionable insights that make it significantly easier to maintain reliability, troubleshoot in real-time, and optimize costs.
Designed for complex, multi-cluster, and hybrid environments, Komodor bridges the Kubernetes knowledge gap, empowering both infrastructure and application teams to move beyond firefighting. By facilitating collaboration, we help improve operational efficiency, reduce Mean Time to Recovery (MTTR), and accelerate development velocity.
If you are interested in checking out Komodor, use this link to sign up for a Free Trial.