A Kubernetes Job is a workload controller object that performs specific tasks on a cluster. It differs from most controller objects such as Deployments and ReplicaSets, which need to constantly reconcile the current state of the cluster with a desired configuration.
A Job has a much more limited function: it runs pods until they complete a specified task, and then terminates them.
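For comparison, here is a minimal Job manifest, a sketch modeled on the standard Kubernetes example (the name, image, and command are only illustrative). It runs a single pod to completion and then stops creating pods:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi                      # illustrative name
spec:
  backoffLimit: 4               # retry a failing pod up to four times
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never      # a Job's pod template must use Never or OnFailure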
A CronJob is the same as a regular Job, only it creates jobs on a schedule (with a syntax similar to the Linux cron utility). CronJobs are used to perform regularly scheduled tasks such as backups and report generation. You can define the tasks to run at specific times or repeat indefinitely (for example, daily or weekly).
This article is a part of a series on Kubernetes Troubleshooting.
Traditional cron tasks and Kubernetes CronJobs both allow you to schedule tasks at specific intervals. However, they operate in different environments and have distinct advantages and drawbacks.
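To make the comparison concrete, the same daily-midnight schedule can be expressed in both systems (the backup script path is a placeholder):

0 0 * * * /usr/local/bin/backup.sh    # traditional crontab entry, executed on a single host
schedule: "0 0 * * *"                 # the same five-field expression in a CronJob spec; the work runs as a pod on the cluster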
Itiel Shwartz
Co-Founder & CTO
In my experience, here are tips that can help you better manage Kubernetes CronJobs:
Use monitoring tools to track CronJob executions and failures.
Define concurrency policies to control overlapping job executions.
Allocate appropriate resources to CronJobs to avoid resource contention.
Configure retry policies for failed jobs to ensure task completion.
Enable detailed logging for CronJobs to aid in troubleshooting.
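The sketch below shows where several of these tips map onto CronJob fields; the name, image, schedule, and values are assumptions chosen for illustration rather than a recommended configuration:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: report-generator             # hypothetical name
spec:
  schedule: "*/15 * * * *"
  concurrencyPolicy: Forbid          # control overlapping executions
  jobTemplate:
    spec:
      backoffLimit: 3                # retry a failed run up to three times
      template:
        spec:
          containers:
          - name: report
            image: report-image:1.0  # placeholder image
            resources:
              requests:
                cpu: 100m
                memory: 128Mi
              limits:
                cpu: 500m
                memory: 256Mi        # bound resource usage to avoid contention
          restartPolicy: OnFailure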
Kubernetes CronJobs function by creating and managing Jobs according to a defined schedule. At each scheduled time, the CronJob controller creates a Job object, which in turn creates one or more pods to run the task. The following fields are central to this behavior (shown in context in the sketch below):
spec.schedule: a cron-format string that defines when and how often a new Job is created.
successfulJobsHistoryLimit: how many completed Jobs are retained for inspection (three by default).
failedJobsHistoryLimit: how many failed Jobs are retained (one by default).
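A minimal sketch showing these fields in context (the name, schedule, and command are placeholders):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: history-demo               # hypothetical name
spec:
  schedule: "30 2 * * *"           # run at 02:30 every day
  successfulJobsHistoryLimit: 3    # keep the last three successful Jobs (the default)
  failedJobsHistoryLimit: 1        # keep only the most recent failed Job (the default)
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: demo
            image: busybox:1.28
            command: ["sh", "-c", "date"]
          restartPolicy: OnFailure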
Here are a few reasons CronJobs can be highly useful:
Automating recurring maintenance tasks such as backups, clean-ups, and report generation.
Running scheduled work inside the cluster instead of maintaining a separate cron host.
Keeping a history of successful and failed runs that can be inspected for auditing and debugging.
Related content: read our guide to fixing the Kubernetes node not ready error.
Creating a CronJob is very similar to creating a regular Job. We’ll need to define a YAML manifest file that includes the Job name, which containers to run, and commands to execute on the containers.
To create a Kubernetes CronJob:
1. Create a YAML file in a text editor.
nano [mycronjob].yaml
2. The CronJob YAML configuration should look something like this. Pay attention to the spec.schedule field, which defines when and how often the Job should run. We explain the cron schedule syntax in the following section.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "0 0 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox:1.28
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo Hello World
          restartPolicy: OnFailure
A few important points about this code:
spec.schedule defines when the Job runs, using standard cron syntax; here, "0 0 * * *" means every day at midnight.
spec.jobTemplate.spec.template.spec.containers defines the container to run (busybox) and the command to execute inside it.
restartPolicy must be either Never or OnFailure for a Job's pod template. With OnFailure, a failed container is restarted in place; with Never, the whole pod fails and the Job creates a replacement pod.
3. Create your CronJob in the cluster using this command:
kubectl apply -f [filename].yaml
4. Run the following command to monitor task execution:
kubectl get cronjob --watch
The cron schedule syntax used by the spec.schedule field has five space-separated fields, each representing one time unit: minute (0-59), hour (0-23), day of the month (1-31), month (1-12), and day of the week (0-6, where 0 is Sunday). An asterisk (*) matches every value of a field. For example, the "0 0 * * *" schedule used above runs every day at midnight.
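A few common schedule expressions, for illustration:

*/5 * * * *     # every five minutes
0 9 * * 1       # at 09:00 every Monday
0 0 1 * *       # at midnight on the first day of every month
30 2 * * *      # every day at 02:30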
One CronJob can serve as the model for various jobs, but you may need to adjust it. Here are some considerations when defining a CronJob.
CronJobs have embedded concurrency controls (a major difference from Unix cron) that let you disable concurrent execution, although Kubernetes enables concurrency by default. With concurrency enabled, a scheduled CronJob run will start even if the last run is incomplete. Concurrency is not desirable for jobs that require sequential execution.
You can control concurrency by configuring the concurrencyPolicy field on the CronJob object. You can set one of three values:
Allow (the default): concurrent runs are permitted.
Forbid: a new run is skipped if the previous run has not finished yet.
Replace: a new run cancels and replaces the run that is still in progress.
Setting the policy to Forbid therefore creates CronJobs that only permit a single run at any time.
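For example, assuming the hello CronJob created earlier in this article, you can tighten the policy on the live object with a strategic merge patch instead of editing the full manifest:

kubectl patch cronjob hello -p '{"spec":{"concurrencyPolicy":"Forbid"}}'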
A starting deadline determines whether a scheduled CronJob run is still allowed to start. This concept is specific to Kubernetes: it defines how long after the scheduled time a run remains eligible to begin. It is useful for jobs with concurrency disabled, where runs cannot always start exactly on schedule.
The startingDeadlineSeconds field controls this value. For example, a deadline of 15 seconds allows only a limited delay: a job scheduled for 10:00:00 can still start if the previous run finishes at 10:00:14, but not if it finishes at 10:00:15 or later.
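A sketch of a CronJob that combines a disabled-concurrency policy with a starting deadline (the name, image, and command are placeholders):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: deadline-demo              # hypothetical name
spec:
  schedule: "0 10 * * *"           # scheduled for 10:00 every day
  concurrencyPolicy: Forbid        # runs must not overlap
  startingDeadlineSeconds: 15      # a run may start up to 15 seconds late; otherwise it is skipped
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: deadline-demo
            image: busybox:1.28
            command: ["sh", "-c", "date"]
          restartPolicy: OnFailure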
Two other useful values are successfulJobsHistoryLimit and failedJobsHistoryLimit. They control how many completed and failed Jobs are retained (by default, three successful and one failed). You can raise these values to keep the history for longer, which is useful for debugging.
Kubernetes lets you monitor CronJobs with standard mechanisms such as kubectl. The get command shows a CronJob's definition and details of its runs, and the Jobs created by a CronJob carry the CronJob's name with an appended timestamp suffix. After identifying an individual Job, you can use kubectl to retrieve its container logs:
$ kubectl logs job/example-cron-1648239040
This error arises when a CronJob doesn't fire as scheduled. Triggering the Job manually shows that it works, yet the pod for the scheduled run never appears.
A CronJob is scheduled to fire every 60 seconds on a MicroK8s instance, but the scheduled runs never happen. The user tries to trigger the Job manually with the following command:
k create job --from=cronjob/demo-cron-job demo-cron-job
While the Job runs after this command, it doesn’t run as scheduled. Here is the manifest of the API object in YAML format:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: demo-cron-job
  namespace: {{ .Values.global.namespace }}
  labels:
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/release-name: {{ .Release.Name }}
    app.kubernetes.io/release-namespace: {{ .Release.Namespace }}
spec:
  schedule: "* * * * *"
  concurrencyPolicy: Replace
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: demo-cron-job
            image: demoImage
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - /usr/bin/curl -k http://restfulservices/api/demo-job
          restartPolicy: OnFailure
A possible resolution is to restart the entire namespace and redeploy. Before doing that, however, you should check the following details:
kubectl describe cronjob demo-cron-job -n tango (shows the CronJob's events, its last schedule time, and any warnings from the controller)
kubectl get jobs -n tango (shows whether the CronJob is creating Job objects at all)
This error arises when a CronJob stops scheduling the specified Job. It commonly occurs when some part of the job fails consistently several times in a row.
The user scheduled a CronJob that functioned for some time before it stopped scheduling new jobs. The Job included a step that pulled a container image, and that pull failed. Their manifest is shown below:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  labels:
    app.kubernetes.io/instance: demo-cron-job
    app.kubernetes.io/managed-by: Tiller
    app.kubernetes.io/name: cron
    helm.sh/chart: cron-0.1.0
  name: demo-cron-job
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      template:
        spec:
          containers:
          - args:
            - -c
            - npm run script
            command:
            - /bin/sh
            env:
            image:
            imagePullPolicy: Always
            name: cron
            resources: {}
            securityContext:
              runAsUser: 1000
            terminationMessagePath: /dev/demo-termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
  schedule: 0/30 * * * *
  successfulJobsHistoryLimit: 3
  suspend: false
status: {}
Here, spec.restartPolicy is set to Never, so the entire pod fails whenever a container in the pod fails. However, the manifest doesn't include the .spec.backoffLimit field, which specifies how many times the Job is retried before it is considered failed, so the default value of 6 applies. The Job therefore tries to pull the container image six times before being marked as failed, after which no further pods are created for it.
Here are some possible resolutions:
Set spec.restartPolicy to OnFailure so that a failing container is restarted in place instead of failing the whole pod and consuming the Job's retry budget (see the sketch after this list).
Set imagePullPolicy to IfNotPresent so the node reuses a locally cached image instead of pulling it on every attempt.
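A sketch of how these changes might look in the jobTemplate section of the manifest above; the image value is a placeholder (it was omitted in the original manifest), and the explicit backoffLimit is an optional addition rather than part of the user's configuration:

  jobTemplate:
    spec:
      backoffLimit: 3                      # optional: make the retry budget explicit instead of relying on the default of 6
      template:
        spec:
          containers:
          - name: cron
            image: <your-image>            # placeholder; use the application's actual image
            imagePullPolicy: IfNotPresent  # reuse a locally cached image when available
            command:
            - /bin/sh
            args:
            - -c
            - npm run script
          restartPolicy: OnFailure         # restart the failing container in place instead of failing the pod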
This error arises when the CronJob involves communicating with an API endpoint. If the endpoint doesn’t respond successfully, the job shows an error status.
The user has a CronJob that hits a REST API endpoint to pull an image of the application in question. The manifest of the Job is as follows:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: demo-cronjob
  labels:
    app: {{ .Release.Name }}
    chart: {{ .Chart.Name }}-{{ .Chart.Version }}
    release: {{ .Release.Name }}
spec:
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 2
  failedJobsHistoryLimit: 2
  startingDeadlineSeconds: 1800
  jobTemplate:
    spec:
      template:
        metadata:
          name: demo-cronjob
          labels:
            app: demo
        spec:
          restartPolicy: OnFailure
          containers:
          - name: demo
            image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
            command: ["/bin/sh", "-c", "curl http://localhost:8080/demo"]
            readinessProbe:
              httpGet:
                path: "/demojob"
                port: 8081
              initialDelaySeconds: 300
              periodSeconds: 60
              timeoutSeconds: 30
              failureThreshold: 3
            livenessProbe:
              httpGet:
                path: "/demojob"
                port: 8081
              initialDelaySeconds: 300
              periodSeconds: 60
              timeoutSeconds: 30
              failureThreshold: 3
            resources:
              requests:
                cpu: 200m
                memory: 4Gi
              limits:
                cpu: 1
                memory: 8Gi
  schedule: "*/40 * * * *"
The user then faces the following error:
curl: (7) Failed to connect to localhost port 8080: Connection refused
Here, the issue is that the user has provided a command and arguments that override the container image's own entrypoint and arguments. Because the command replaces the image's default entrypoint, the application never starts, and the curl to localhost:8080 has nothing to connect to. A possible resolution is to use a shell script that first starts the REST application and then calls its endpoint, so the Job can communicate with the service it expects.
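A minimal sketch of that approach, assuming a hypothetical start script at /app/start.sh that launches the REST service on port 8080 (the script path, wait time, and endpoint are placeholders). The container's command and args would change to something like:

            command: ["/bin/sh", "-c"]
            args:
            - |
              /app/start.sh &                      # start the REST application in the background (hypothetical script)
              sleep 30                             # crude wait for the service to come up; polling the port is more robust
              curl -f http://localhost:8080/demo   # then call the endpoint the Job needs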
Kubernetes troubleshooting is complex and involves multiple components; you might experience errors that are difficult to diagnose and fix. Without the right tools and expertise in place, the troubleshooting process can become stressful, ineffective and time-consuming. Some best practices can help minimize the chances of things breaking down, but eventually something will go wrong – simply because it can – especially across hybrid cloud environments.
This is where Komodor comes in – Komodor is the Continuous Kubernetes Reliability Platform, designed to democratize K8s expertise across the organization and enable engineering teams to leverage its full value.
Komodor’s platform empowers developers to confidently monitor and troubleshoot their workloads while allowing cluster operators to enforce standardization and optimize performance. Specifically when working in a hybrid environment, Komodor reduces the complexity by providing a unified view of all your services and clusters.
By leveraging Komodor, companies of all sizes significantly improve reliability, productivity, and velocity. Or, to put it simply – Komodor helps you spend less time and resources on managing Kubernetes, and more time on innovating at scale.
If you are interested in checking out Komodor, use this link to sign up for a Free Trial.