The life of a developer these days is more complicated than ever, as they are increasingly required to expand their knowledge across the stack, understand abstract concepts, and own their code end-to-end.
A major (and very frustrating) part of a developer’s day is dedicated to fixing what they’ve built – scouring logs and code lines in search of a bug. This search becomes even harder in a distributed Kubernetes environment, where the number of daily changes can be in the hundreds.
Kubernetes makes it really easy to deploy microservices, but when something inevitably breaks, the developer tasked with fixing the issue is left stranded with zero context, not even knowing where to begin.
Fixing issues can be divided into two approaches – troubleshooting and debugging. Although they are similar concepts and are often used interchangeably by mistake, they are not the same, and each requires its own methodology and toolkit.
In the following paragraphs, I will try to dispel the confusion between troubleshooting and debugging, and share some best practices for incorporating both methods into your workflow.
Troubleshooting is the strategic process of finding the root cause of issues in a system at a macro level. It involves understanding many components and how they interact with each other, finding out which cascading failures are leading to the issue, and analyzing both the symptoms shown in monitoring tools and the feedback from end users.
Troubleshooting usually includes debugging as part of its process, stretches over more than one session and impacts multiple stakeholders.
The troubleshooting process is likely to uncover many bugs that can be surgically isolated and fixed (more on that in a bit). The real goal, however, is to identify deep-rooted problems in the system by inspecting the infrastructure, pipelines, permissions, third-party apps and services, architecture, and even human processes and culture.
Debugging is the more tactical approach aimed at fixing local issues or exceptions, more commonly referred to as ‘bugs’.
A bug can be anything that causes the program to behave in a different manner than expected. It can be a syntax issue or a problem with the logic employed in the code. Even a single typo can be considered a bug.
Unlike troubleshooting, debugging can be accomplished within a single session, where a developer identifies and isolates the issue and works out a solution. This can uncover deeper issues, and – as a result – improve the overall system’s resilience. However, this is not the goal of the process.
As mentioned above, debugging is a subset of troubleshooting. While debugging focuses on small, local instances that can be identified and fixed in one session, troubleshooting is a holistic process that takes into account all of the components in a system, even the team’s processes, and the way they affect each other.
The difference is akin to a doctor prescribing you a pill for a recurring headache, versus a doctor asking about your diet, mental state, and lifestyle, and conducting a full-body scan in order to understand all the elements contributing to your current condition.
This doesn’t mean that they are two separate and unrelated processes. The opposite is true – debugging lives within troubleshooting, and it’s the natural continuation of it. Troubleshooting uncovers the deeper root cause of an issue and debugging steps in to fix the thing that broke.
So now we know the right order in which to approach the problem, but what are some of the best practices? Reduction is a core tenet of troubleshooting, and the tips below follow that principle. Treat them as steps in a process: throw a big net into the water and gradually narrow its bounds until you’ve caught your ‘bug’.
When an issue arises, whether you received an alert or an end user is experiencing difficulties, start with a bottom-up approach in Kubernetes by listing all pods in the cluster. Check whether anything is reported as an error, not ready, or crashing; this gives you the thread to pull on and move forward.
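The first pass over the pod list can be sketched in Python. This is a minimal sketch, not a Komodor or kubectl feature: it assumes you have captured the JSON printed by kubectl get pods --all-namespaces -o json, and it flags pods whose phase is Failed or Unknown, containers waiting with one of a hand-picked set of reasons (the SUSPICIOUS_REASONS set is an illustrative assumption, not an exhaustive list), and running containers that are not ready. The sample data is made up.

```python
import json

# Waiting reasons that usually signal a broken pod. This set is an
# illustrative assumption, not an exhaustive list.
SUSPICIOUS_REASONS = {
    "CrashLoopBackOff",
    "ErrImagePull",
    "ImagePullBackOff",
    "CreateContainerConfigError",
    "InvalidImageName",
}


def find_unhealthy_pods(pod_list_json):
    """Return (namespace, pod, problem) tuples for pods that look unhealthy.

    pod_list_json is the text printed by
    `kubectl get pods --all-namespaces -o json`.
    """
    problems = []
    for pod in json.loads(pod_list_json).get("items", []):
        meta = pod.get("metadata", {})
        ns, name = meta.get("namespace", "default"), meta.get("name", "?")
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        # A Failed or Unknown phase is enough to flag the whole pod.
        if phase in ("Failed", "Unknown"):
            problems.append((ns, name, f"phase={phase}"))
            continue
        for cs in status.get("containerStatuses", []):
            waiting = cs.get("state", {}).get("waiting") or {}
            if waiting.get("reason") in SUSPICIOUS_REASONS:
                # e.g. the image cannot be pulled or the container keeps crashing
                problems.append((ns, name, waiting["reason"]))
            elif phase == "Running" and not cs.get("ready", False):
                problems.append((ns, name, f"container {cs.get('name')} not ready"))
    return problems


# Made-up sample: one healthy pod and one that cannot pull its image.
sample = json.dumps({"items": [
    {"metadata": {"namespace": "shop", "name": "web-1"},
     "status": {"phase": "Running", "containerStatuses": [
         {"name": "web", "ready": True, "state": {"running": {}}}]}},
    {"metadata": {"namespace": "shop", "name": "web-2"},
     "status": {"phase": "Pending", "containerStatuses": [
         {"name": "web", "ready": False,
          "state": {"waiting": {"reason": "ImagePullBackOff"}}}]}},
]})
print(find_unhealthy_pods(sample))  # [('shop', 'web-2', 'ImagePullBackOff')]
```

Each flagged tuple is a candidate thread to pull on; the next step is to describe that specific pod.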
Describe the pod (using kubectl) to get more information on its specification, the configuration that was set up, and the events that occurred (in most cases, the investigation stops there).
For example: Failed to pull image "localhost:53329/nginx:latest". In this case, somebody made a typo and referenced a Docker image that is unreachable from the cloud. Describing the pod and carefully examining each line would reveal the root cause and make for a quick debugging session.
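The events listed by kubectl describe pod can also be read as JSON with kubectl get events -n <namespace> -o json. The sketch below works under that assumption; warning_events_for and uses_local_registry are hypothetical helpers, not part of any Kubernetes library. The first keeps the Warning events about one pod, and the second flags image references whose registry host is localhost, which a cloud node typically cannot reach, as in the example above. The sample event is made up around the message quoted in the text.

```python
import json
import re

# Captures the image reference in messages such as
#   Failed to pull image "localhost:53329/nginx:latest"
PULL_FAILURE = re.compile(r'Failed to pull image "([^"]+)"')

# Registry hosts that usually only resolve on the developer's own machine.
LOCAL_REGISTRY = re.compile(r"^(localhost|127\.0\.0\.1)(:\d+)?/")


def warning_events_for(events_json, pod_name):
    """Return 'reason: message' strings for Warning events about one pod.

    events_json is assumed to be the output of
    `kubectl get events -n <namespace> -o json`.
    """
    lines = []
    for event in json.loads(events_json).get("items", []):
        obj = event.get("involvedObject", {})
        if (event.get("type") == "Warning"
                and obj.get("kind") == "Pod"
                and obj.get("name") == pod_name):
            lines.append(f"{event.get('reason')}: {event.get('message')}")
    return lines


def uses_local_registry(image):
    """True if an image reference points at a registry on localhost."""
    return bool(LOCAL_REGISTRY.match(image))


# Made-up sample built around the message quoted in the text.
events = json.dumps({"items": [{
    "type": "Warning",
    "reason": "Failed",
    "involvedObject": {"kind": "Pod", "name": "web-2"},
    "message": 'Failed to pull image "localhost:53329/nginx:latest"',
}]})

for line in warning_events_for(events, "web-2"):
    match = PULL_FAILURE.search(line)
    if match and uses_local_registry(match.group(1)):
        print("image from an unreachable local registry:", match.group(1))
```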
If describing the pod doesn’t reveal the problem, it’s time to check your ConfigMaps, Ingresses, Secrets, volumes, and nodes, or to drill down even further by reading your application logs. The root cause could be either a Kubernetes issue or an application issue.
The configuration or Secrets used by your application might not be aligned with what the app actually needs. It’s always a good idea to check that your Secrets come from, and are used in, the proper environment.
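A quick Secret sanity check might look like the following sketch. It assumes that each environment maps to its own namespace and that you know which keys the application expects; check_secret and the key names are hypothetical. It takes the parsed JSON of kubectl get secret <name> -n <namespace> -o json and reports a namespace mismatch, missing keys, and keys whose base64-decoded value is empty.

```python
import base64


def check_secret(secret, pod_namespace, required_keys):
    """Return a list of problems with a Secret as seen by one pod.

    secret: parsed JSON of `kubectl get secret <name> -n <namespace> -o json`.
    pod_namespace: namespace the pod runs in (assumed to map to an environment).
    required_keys: keys the application expects (hypothetical, app-specific).
    """
    problems = []
    ns = secret.get("metadata", {}).get("namespace")
    if ns != pod_namespace:
        problems.append(f"secret is in namespace {ns!r}, pod runs in {pod_namespace!r}")
    data = secret.get("data") or {}
    missing = sorted(set(required_keys) - set(data))
    if missing:
        problems.append("missing keys: " + ", ".join(missing))
    for key in sorted(set(required_keys) & set(data)):
        # Values under "data" are base64-encoded; an empty decode means no value.
        if not base64.b64decode(data[key]):
            problems.append(f"key {key!r} is empty")
    return problems


# Hypothetical Secret copied from staging into a production deployment.
secret = {
    "metadata": {"name": "db-creds", "namespace": "staging"},
    "data": {"DB_USER": base64.b64encode(b"app").decode(), "DB_PASSWORD": ""},
}
for problem in check_secret(secret, "production", {"DB_USER", "DB_PASSWORD", "DB_HOST"}):
    print(problem)
```

The required keys are the part only the application owner knows, which is one reason configuration issues are easy for an outside troubleshooter to miss.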
Application logs are often the last piece of the puzzle when debugging in Kubernetes, since reading them usually requires intimate knowledge of the application, which the troubleshooter might not always have. However, they are also the most efficient source of answers: if a pod is in CrashLoopBackOff, the application logs will usually include the stack trace of the exception that caused the container to crash.
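For a container stuck in CrashLoopBackOff, kubectl logs <pod> --previous prints the logs of the previous, crashed container instance. The hypothetical last_traceback helper below pulls the final stack trace out of that text. It is a sketch that assumes a Python application, since it only recognizes Python's "Traceback (most recent call last):" header; other runtimes format stack traces differently. The sample log is made up.

```python
from typing import Optional

TRACEBACK_HEADER = "Traceback (most recent call last):"


def last_traceback(log_text: str) -> Optional[str]:
    """Return the last Python traceback in a container log, or None.

    log_text is assumed to be the output of `kubectl logs <pod> --previous`.
    Only Python-style tracebacks are recognized.
    """
    lines = log_text.splitlines()
    starts = [i for i, line in enumerate(lines) if line.startswith(TRACEBACK_HEADER)]
    if not starts:
        return None
    block = []
    for line in lines[starts[-1]:]:
        block.append(line)
        # Stack frames are indented; the first unindented line after the
        # header is the exception message, which ends the traceback.
        if len(block) > 1 and line and not line[0].isspace():
            break
    return "\n".join(block)


# Hypothetical log from a crashed container.
log = (
    "starting server\n"
    "Traceback (most recent call last):\n"
    '  File "app.py", line 3, in <module>\n'
    "    connect()\n"
    "ConnectionRefusedError: [Errno 111] Connection refused\n"
)
print(last_traceback(log))
```

Taking the last traceback rather than the first matters when exceptions are chained, because the final one is usually what terminated the process.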
Once you’ve determined the root cause of the issue, it’s time for debugging and fixing. By reproducing the issue, you’ll have control over how to fix it.
Note: reproducing is certainly the best practice, but it isn’t always possible. In that case, you need to resort to a trial-and-error (a.k.a. “black box”) approach.
Once you can reproduce the issue, start making changes and verify that they fix the problem. Whether you found the root cause or only a hint of what it may be, make your changes (or ‘fixes’) one by one. This is important: if you make multiple changes at once, it will be harder to understand which one actually fixed the issue.
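The one-change-at-a-time rule can be expressed as a small harness. In this sketch, which is hypothetical and not tied to any tool, a configuration is a plain dict, each candidate fix is a set of overrides applied on top of the baseline by itself, and reproduces is whatever check tells you the issue still occurs. Because every trial differs from the baseline by exactly one change, each fix the harness reports cleared the issue on its own.

```python
from typing import Callable, Dict, List

Config = Dict[str, str]


def isolate_fixes(baseline: Config,
                  candidates: Dict[str, Config],
                  reproduces: Callable[[Config], bool]) -> List[str]:
    """Try each candidate fix on its own; return the names that clear the issue.

    baseline: a configuration that reproduces the issue.
    candidates: fix name -> settings to override (one logical change each).
    reproduces: returns True while the issue still occurs for a configuration.
    """
    if not reproduces(baseline):
        raise ValueError("baseline does not reproduce the issue")
    working = []
    for name, change in candidates.items():
        trial = {**baseline, **change}  # baseline plus exactly one change
        if not reproduces(trial):
            working.append(name)
    return working


# Hypothetical reproduction: the failure occurs whenever the image
# is pulled from a registry on localhost.
def image_pull_fails(config: Config) -> bool:
    return config["image"].startswith("localhost")


baseline = {"image": "localhost:53329/nginx:latest", "replicas": "3"}
candidates = {
    "fix-image": {"image": "nginx:latest"},
    "scale-down": {"replicas": "1"},
}
print(isolate_fixes(baseline, candidates, image_pull_fails))  # ['fix-image']
```

Applying both candidates together would also clear the issue, but it would hide the fact that scaling down was irrelevant.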
To recap, troubleshooting is the broad process of auditing a system at a macro level and understanding its intricacies, the very way in which the cogs of the machine interact. Debugging, on the other hand, is the process of identifying and fixing exceptions locally, in isolation.
Traditionally, developers spent more time debugging than troubleshooting. With the advent of the ‘shift-left’ movement, troubleshooting responsibilities are increasingly entering the purview of developers at all levels, and are no longer solely in the hands of the sysadmins and architects of the world.
Because this is a major pain point for novice developers, as well as for the more senior DevOps engineers who end up doing the troubleshooting, Komodor aims to bridge the knowledge gap by simplifying Kubernetes and providing developers with all the context they need to troubleshoot issues efficiently and independently.