Troubleshooting in Kubernetes: The Shift-Left Approach

Kubernetes has become the de facto container management solution of the last decade, and we have no doubt it will stay that way in the coming years. It provides a solid abstraction between the infrastructure layer and applications, so that developers can quickly develop, deploy, and operate their applications.

Kubernetes is designed as a set of APIs that work together. You describe the applications you want to run, and Kubernetes runs them for you. It scales your applications, restarts them if they are stuck, and directs user requests to the healthy instances.

However, this high level of automation can also act like a tiny spark that starts a large fire in your Kubernetes cluster. For instance, a single misconfigured field in a ConfigMap could be rolled out to hundreds of instances and take down your application. Therefore, deploying and operating Kubernetes applications requires more awareness and care.
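One way to keep a misconfigured field from spreading is to check configuration values before they are rolled out. The following is a minimal sketch of such a pre-deploy check, not a definitive implementation. The keys (LOG_LEVEL, MAX_CONNECTIONS, UPSTREAM_URL) and their rules are hypothetical and only illustrate the idea; a real application would define its own.

```python
from typing import Callable, Dict, List


def _is_positive_int(value: str) -> bool:
    # ConfigMap values are strings, so numbers must be parsed from text.
    return value.isdecimal() and int(value) > 0


# Hypothetical keys and rules, assumed for illustration only.
RULES: Dict[str, Callable[[str], bool]] = {
    "LOG_LEVEL": lambda v: v in {"debug", "info", "warning", "error"},
    "MAX_CONNECTIONS": _is_positive_int,
    "UPSTREAM_URL": lambda v: v.startswith(("http://", "https://")),
}


def validate_config(data: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the config passed."""
    problems = []
    for key, check in RULES.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not check(data[key]):
            problems.append(f"invalid value for {key}: {data[key]!r}")
    return problems


if __name__ == "__main__":
    candidate = {
        "LOG_LEVEL": "verbose",  # not one of the allowed levels
        "MAX_CONNECTIONS": "200",
        "UPSTREAM_URL": "https://api.internal",
    }
    for problem in validate_config(candidate):
        print(problem)
```

Running a check like this in a CI step, before the configuration reaches the cluster, moves the failure from hundreds of running instances to a single failed pipeline run.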

In addition, modern cloud-native applications are reasonably complex, so you’ll need to troubleshoot and debug them from time to time. Unfortunately, Kubernetes’ default setup is not always the best platform for this. In this blog post, we will discuss a new paradigm for making Kubernetes easier to troubleshoot: the shift-left approach.

The Kubernetes Lifecycle

Let’s start by discussing the lifecycle and critical milestones of a Kubernetes application. General lifecycle management in software applications is a comprehensive concept. It includes requirements management, architecture, programming, testing, maintenance, change management, continuous integration, release management, and, last but not least, project management. This can feel like a complete list of all the teams in a large enterprise! Luckily, the Kubernetes application lifecycle is highly simplified and mostly organized according to three phases: Day 0, Day 1, and Day 2.  

Day 0: Design

In this stage, you need to make decisions related to your Kubernetes platform and application requirements. Here are some important things to consider:

  • Kubernetes platforms, such as public cloud, private cloud, or on-premises options
  • Infrastructure requirements, such as networking, storage, or monitoring
  • Cloud service integrations and security features
  • Integration with CI/CD or GitOps

Day 1: Deploy

Next, you need to consider how to deploy your application to your Kubernetes clusters. The critical points to consider in this phase include:

  • Application-level deployment strategies, such as blue/green or A/B 
  • Multi-cloud and multi-cluster deployment strategies with resource and cost optimization
  • Staged application deployments with their environment requirements: test, staging, and production
  • Security, trackability, and visibility of deployments

Day 2: Operate

Most people joining the Kubernetes movement focus on Day 0 and Day 1, and forget about the most difficult part of Kubernetes applications: operation. In the Day 2 phase, everything has already been designed and deployed to the cluster. Your job is to maintain the infrastructure and keep the systems alive and working. Here are the vital problems you usually need to deal with during this phase: 

  • SLA level and how to achieve it
  • Troubleshooting guides and playbooks 
  • Overall visibility of applications and their status; namely, monitoring and alerting
  • Tracking and acting on resource usage and utilization 
  • Access control to the Kubernetes clusters and applications
  • Backup and restore of Kubernetes clusters, configuration, and application data, such as database disks
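To make the SLA item concrete, it helps to translate an availability percentage into the amount of downtime it actually allows. The sketch below does this arithmetic; the function names and the 30-day default period are our own assumptions, not part of any Kubernetes API.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime an availability target allows over a period."""
    if not 0 < sla_percent <= 100:
        raise ValueError("sla_percent must be in the range (0, 100]")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


def remaining_budget_minutes(
    sla_percent: float, downtime_so_far_minutes: float, period_days: int = 30
) -> float:
    """Downtime still available before the target is missed (negative if missed)."""
    return allowed_downtime_minutes(sla_percent, period_days) - downtime_so_far_minutes


if __name__ == "__main__":
    # A 99.9% target over 30 days allows roughly 43.2 minutes of downtime.
    print(round(allowed_downtime_minutes(99.9), 1))
    # After a 30-minute incident, roughly 13.2 minutes remain.
    print(round(remaining_budget_minutes(99.9, 30), 1))
```

Seeing the target as a budget of minutes makes it easier to decide how much monitoring, alerting, and on-call coverage a given SLA level really requires.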

Lifecycle management in Kubernetes makes you focus on the actual problems and helps you move to the production stage with a more robust application, cluster, and – let’s not forget – mindset. In the following section, we will discuss the shift-left paradigm and show how it can be applied to the Kubernetes lifecycle with some relevant best practices.

What is Shift-Left in DevOps?

The shift-left approach focuses on working on problems that may occur in the later stages of the software development lifecycle while you are still in the earlier stages. Put simply, it encourages you to shift the mindset and priorities required in each step to the previous one. For instance, you need to design your application’s architecture considering the deployment characteristics of your cloud provider. Similarly, you need to deploy your applications to be easy to operate and troubleshoot. In short, the shift-left approach makes the software development lifecycle a more coherent set of processes. 

To apply the shift-left approach to Kubernetes, you’ll need to test your applications on Day 0, considering that they will be deployed to clusters on Day 1. Similarly, you need to deploy your applications on Day 1, without forgetting that you will be operating and troubleshooting them on Day 2. 

It is estimated that today, the ratio of DevOps engineers to developers is between 1:10 and 1:12. Therefore, it is inevitable that developers will deep-dive into operational tasks and troubleshoot applications running in the clusters. Whether the underlying problems are related to the applications or the infrastructure, the shift-left paradigm becomes a reality when you start using Kubernetes in production.

Like all paradigm changes in software development, the shift-left approach has no silver-bullet tool that applies its rules for you. Therefore, people and best practices are essential.

Kubernetes Best Practices: “Left-Shifted”

Now that you understand the stages of the software lifecycle for Kubernetes, let’s discuss how to apply the shift-left approach through a list of best practices. 

Day 0: Design

In this stage, you should consider how to deploy and operate applications running in Kubernetes clusters: 

  • Consider cloud-specific integration points
  • Minimize the number of dependency libraries
  • Use testable container images
  • Design troubleshooting APIs as first-class citizens
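As an illustration of treating troubleshooting APIs as first-class citizens, the sketch below adds a health endpoint and a small debug endpoint to an application using only the Python standard library. The paths (/healthz, /debug/info), the APP_VERSION environment variable, and the port are assumptions made for this example. HOSTNAME is typically set to the pod name inside a container, but treat that as an assumption as well.

```python
import json
import os
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START_TIME = time.time()


def debug_snapshot(env=os.environ) -> dict:
    """Collect non-sensitive runtime facts that help during troubleshooting."""
    return {
        "uptime_seconds": round(time.time() - START_TIME, 1),
        "version": env.get("APP_VERSION", "unknown"),  # assumed env var
        "pod_name": env.get("HOSTNAME", "unknown"),    # usually the pod name
    }


class DebugHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            self._send(200, {"status": "ok"})
        elif self.path == "/debug/info":
            self._send(200, debug_snapshot())
        else:
            self._send(404, {"error": "not found"})

    def _send(self, code: int, payload: dict) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Start the server; blocks until the process is stopped."""
    HTTPServer(("0.0.0.0", port), DebugHandler).serve_forever()
```

Calling serve() starts the server. An endpoint like /healthz can then back a liveness or readiness probe, while /debug/info gives whoever is troubleshooting a quick answer to "which version is this pod running, and since when?" Keep secrets and other sensitive values out of such endpoints.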

Day 1: Deploy

In the deployment stage, create application environments that are easy to operate and troubleshoot:

  • Deploy with staged environments: testing, staging, and production
  • Enable logging and tracing
  • Integrate troubleshooting tools into container images
  • Use metadata, like labels and annotations
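Labels pay off on Day 2 because they let you narrow a large cluster down to the workloads you care about. The sketch below mimics equality-based label selection in plain Python to show the idea. The workload names and the "team" and "environment" labels are made up for this example, while app.kubernetes.io/name is one of the labels recommended by the Kubernetes documentation.

```python
from typing import Dict, List


def matches(labels: Dict[str, str], selector: Dict[str, str]) -> bool:
    """True if every key/value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(workloads: List[dict], selector: Dict[str, str]) -> List[str]:
    """Return the names of workloads whose labels satisfy the selector."""
    return [w["name"] for w in workloads if matches(w["labels"], selector)]


# Hypothetical inventory, for illustration only.
WORKLOADS = [
    {"name": "checkout-7d9f", "labels": {
        "app.kubernetes.io/name": "checkout", "team": "payments", "environment": "production"}},
    {"name": "checkout-a1b2", "labels": {
        "app.kubernetes.io/name": "checkout", "team": "payments", "environment": "staging"}},
    {"name": "search-5c3e", "labels": {
        "app.kubernetes.io/name": "search", "team": "discovery", "environment": "production"}},
]

if __name__ == "__main__":
    print(select(WORKLOADS, {"team": "payments", "environment": "production"}))
```

With consistent labels in place, the same kind of filter is available on the command line through the label selector flag of kubectl (for example, kubectl get pods -l team=payments), which is often the first step when an alert fires.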

Day 2: Operate

Operation is the last stage of the lifecycle, but it is not the end of life for software development. In this stage, you should create an environment that is easy to operate and troubleshoot, so that future applications can be developed and deployed:

  • Use namespaces to isolate applications
  • Onboard the whole team and ensure everyone has access to Kubernetes clusters
  • Create alerts, notifications, and support levels 
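Support levels are easier to agree on when they are written down as explicit rules. The following sketch maps an alert to a support level; the thresholds (1% and 5% error rate), the environment names, and the level names are assumptions chosen for illustration and should be replaced with values that match your own SLA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    name: str
    environment: str   # e.g. "testing", "staging", "production"
    error_rate: float  # fraction of failed requests, 0.0 to 1.0


def support_level(alert: Alert) -> str:
    """Decide who hears about an alert, and how urgently."""
    if alert.environment != "production":
        return "ticket"          # non-production issues wait for working hours
    if alert.error_rate >= 0.05:
        return "page-on-call"    # severe: wake someone up
    if alert.error_rate >= 0.01:
        return "notify-channel"  # degraded: post to the team channel
    return "ticket"


if __name__ == "__main__":
    print(support_level(Alert("checkout-5xx", "production", 0.08)))
    print(support_level(Alert("checkout-5xx", "staging", 0.08)))
```

Keeping rules like these in version control means the whole team can review and change them in the same way as application code.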

Conclusion

We hope these best practices help you improve your Kubernetes lifecycle management. While these tips can (and will) help minimize the chances of things breaking down, eventually, something else can go wrong – simply because it can.

This is the reason why we created Komodor, a tool that helps dev and ops teams stop wasting their precious time looking for needles in (hay)stacks every time things go south. To learn more about how Komodor can make it easier to empower your teams to shift left and independently troubleshoot Kubernetes-related issues, sign up for our free trial.
