Jobs & CronJobs
Jobs run to completion and exit. Use a Job for one-shot batch processing (data transforms, migrations, report generation) and a CronJob for scheduled recurring work (nightly reports, periodic cleanup). Both shapes fit naturally into the GitOps workflow — commit the manifest, let ArgoCD sync it, and the cluster handles scheduling.
Preconditions
Before using this recipe, confirm:
- Your tenant is provisioned and you have completed Your first deployment end-to-end.
- Your workload repo is set up per Your repo, your workloads — ArgoCD is pointed at your repo and syncing.
- You know your tenant name (the <your-tenant> prefix used in namespace names).
- Your PVCs are provisioned in the target namespace. PVC creation is covered in Persistent volumes in practice.
Use case
Use a Job when your workload:
- Runs to completion and exits (does not serve traffic).
- Needs to process data from a PVC, generate output to a PVC, or perform a one-time operation.
- Has a deadline and you want it scheduled promptly (uses uber-user-preempt-medium, which is willing to preempt opportunistic peers).
Use a CronJob when your workload:
- Runs on a recurring schedule (nightly, hourly, weekly).
- Is retry-tolerant — if preempted tonight, it re-runs tomorrow.
- Writes output to an in-cluster destination (another PVC, a Service endpoint in the same tenant).
If your workload runs continuously and serves traffic, see Long-running service. If you need a short-lived interactive environment, see Dev pod.
All manifests go in your workload repo under a directory that your ArgoCD Application’s spec.source.path points at (e.g. manifests/batch/). Replace every <your-tenant> placeholder with your real tenant name.
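Because a forgotten placeholder only surfaces once ArgoCD tries to sync, a small pre-commit check can help. The following is a minimal sketch, not part of the platform tooling: the function names are hypothetical, and it assumes manifests use the .yaml extension and live under a directory such as the manifests/batch/ example above.

```python
from pathlib import Path

# The literal placeholder used throughout this recipe's manifests.
PLACEHOLDER = "<your-tenant>"


def find_placeholders(text: str) -> list[int]:
    """Return the 1-based line numbers that still contain the placeholder."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if PLACEHOLDER in line
    ]


def scan_directory(root: Path) -> dict[str, list[int]]:
    """Map each .yaml file under root to the lines with leftover placeholders."""
    hits: dict[str, list[int]] = {}
    for path in sorted(root.rglob("*.yaml")):
        lines = find_placeholders(path.read_text(encoding="utf-8"))
        if lines:
            hits[str(path)] = lines
    return hits
```

Calling scan_directory(Path("manifests/batch")) returns an empty dict when every placeholder has been replaced; anything else lists the files and lines that still need editing.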
Job — one-shot data processing
The minimal variant runs a single Python container that reads from an input PVC and writes results to an output PVC. It uses the uber-user-preempt-medium priority class: deadline-bound batch work that is willing to preempt opportunistic peers.
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: process-2026-q2
      namespace: <your-tenant>-batch
    spec:
      backoffLimit: 2
      template:
        metadata:
          labels:
            job: process-2026-q2
        spec:
          restartPolicy: OnFailure
          priorityClassName: uber-user-preempt-medium
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
            seccompProfile:
              type: RuntimeDefault
          containers:
            - name: worker
              image: python:3.12-slim
              command: ['python', '/app/process.py']
              securityContext:
                allowPrivilegeEscalation: false
                readOnlyRootFilesystem: true
                capabilities:
                  drop: [ALL]
              resources:
                requests:
                  cpu: 500m
                  memory: 1Gi
                limits:
                  cpu: 2
                  memory: 4Gi
              volumeMounts:
                - name: input
                  mountPath: /data/input
                  readOnly: true
                - name: output
                  mountPath: /data/output
                - name: tmp
                  mountPath: /tmp
                - name: app
                  mountPath: /app
                  readOnly: true
          volumes:
            - name: input
              persistentVolumeClaim:
                claimName: input-data
            - name: output
              persistentVolumeClaim:
                claimName: output-data
            - name: tmp
              emptyDir: {}
            - name: app
              configMap:
                name: process-script

The recommended variant adds an activeDeadlineSeconds timeout, explicit ttlSecondsAfterFinished cleanup, and a tighter backoffLimit. It also sets restartPolicy: Never, so a failed attempt is retried in a new Pod rather than by restarting the container in place. Use this for production batch pipelines where you want failed Jobs to surface quickly and completed Jobs to clean up automatically.
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: process-2026-q2
      namespace: <your-tenant>-batch
    spec:
      backoffLimit: 1
      activeDeadlineSeconds: 3600
      ttlSecondsAfterFinished: 86400
      template:
        metadata:
          labels:
            job: process-2026-q2
        spec:
          restartPolicy: Never
          priorityClassName: uber-user-preempt-medium
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
            seccompProfile:
              type: RuntimeDefault
          containers:
            - name: worker
              image: python:3.12-slim
              command: ['python', '/app/process.py']
              securityContext:
                allowPrivilegeEscalation: false
                readOnlyRootFilesystem: true
                capabilities:
                  drop: [ALL]
              resources:
                requests:
                  cpu: 500m
                  memory: 1Gi
                limits:
                  cpu: 2
                  memory: 4Gi
              volumeMounts:
                - name: input
                  mountPath: /data/input
                  readOnly: true
                - name: output
                  mountPath: /data/output
                - name: tmp
                  mountPath: /tmp
                - name: app
                  mountPath: /app
                  readOnly: true
          volumes:
            - name: input
              persistentVolumeClaim:
                claimName: input-data
            - name: output
              persistentVolumeClaim:
                claimName: output-data
            - name: tmp
              emptyDir: {}
            - name: app
              configMap:
                name: process-script

CronJob — nightly report
This CronJob runs a Python script at 03:00 UTC daily that reads from a data PVC and writes a report to an output PVC in the same tenant namespace. The cluster runs on UTC: all schedule values are interpreted as UTC, not your local timezone.
It uses the uber-user-significant priority class (opportunistic scheduling). If the cluster is under pressure and a higher-priority pod preempts this CronJob’s pod, the work simply re-runs at the next scheduled time.
    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: nightly-report
      namespace: <your-tenant>-batch
    spec:
      schedule: "0 3 * * *"
      timeZone: "UTC"
      concurrencyPolicy: Forbid
      successfulJobsHistoryLimit: 3
      failedJobsHistoryLimit: 1
      jobTemplate:
        spec:
          backoffLimit: 1
          activeDeadlineSeconds: 1800
          template:
            spec:
              restartPolicy: OnFailure
              priorityClassName: uber-user-significant
              securityContext:
                runAsNonRoot: true
                runAsUser: 1000
                seccompProfile:
                  type: RuntimeDefault
              containers:
                - name: reporter
                  image: python:3.12-slim
                  command: ['python', '/app/report.py']
                  securityContext:
                    allowPrivilegeEscalation: false
                    readOnlyRootFilesystem: true
                    capabilities:
                      drop: [ALL]
                  resources:
                    requests:
                      cpu: 250m
                      memory: 512Mi
                    limits:
                      cpu: 1
                      memory: 2Gi
                  volumeMounts:
                    - name: data
                      mountPath: /data/input
                      readOnly: true
                    - name: reports
                      mountPath: /data/reports
                    - name: tmp
                      mountPath: /tmp
                    - name: app
                      mountPath: /app
                      readOnly: true
              volumes:
                - name: data
                  persistentVolumeClaim:
                    claimName: tenant-data
                - name: reports
                  persistentVolumeClaim:
                    claimName: report-output
                - name: tmp
                  emptyDir: {}
                - name: app
                  configMap:
                    name: report-script

Dry-run
Before committing, validate your manifests locally:
    kubectl apply --dry-run=client -f manifests/batch/job.yaml
    kubectl apply --dry-run=client -f manifests/batch/cronjob.yaml

Both should render without errors. This catches YAML syntax issues and missing required fields before ArgoCD tries to apply them.
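A client-side dry-run does not pass through admission control, so it cannot tell you whether the tenant-prefix or securityContext rules will reject the manifest. The sketch below is one way to approximate those checks locally. It is an illustration only: the function names are hypothetical, the checks mirror the fields visible in this recipe's manifests rather than the actual policy definitions, and it expects a manifest that has already been parsed into a dict (for example with PyYAML's yaml.safe_load).

```python
def pod_spec(manifest: dict) -> dict:
    """Return the pod spec of a Job or CronJob manifest parsed into a dict."""
    spec = manifest["spec"]
    if manifest["kind"] == "CronJob":
        # A CronJob nests the Job spec one level deeper, under jobTemplate.
        spec = spec["jobTemplate"]["spec"]
    return spec["template"]["spec"]


def check_manifest(manifest: dict, tenant: str) -> list[str]:
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = []

    namespace = manifest.get("metadata", {}).get("namespace", "")
    if not namespace.startswith(f"{tenant}-"):
        problems.append(f"namespace {namespace!r} does not start with {tenant}-")

    spec = pod_spec(manifest)
    pod_sc = spec.get("securityContext", {})
    if pod_sc.get("runAsNonRoot") is not True:
        problems.append("pod securityContext.runAsNonRoot is not true")
    if pod_sc.get("seccompProfile", {}).get("type") != "RuntimeDefault":
        problems.append("pod seccompProfile.type is not RuntimeDefault")

    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext", {})
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation is not false")
        if "ALL" not in sc.get("capabilities", {}).get("drop", []):
            problems.append(f"{name}: capabilities.drop does not include ALL")

    return problems
```

A passing result here is not a guarantee of admission; the cluster's policies remain the source of truth.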
Verify
After pushing to your repo and letting ArgoCD sync:
For the Job:
    kubectl get jobs -n <your-tenant>-batch
    kubectl get pods -n <your-tenant>-batch -l job=process-2026-q2
    kubectl logs -n <your-tenant>-batch -l job=process-2026-q2

Expected:
- The Job shows 1/1 under COMPLETIONS when finished.
- The Pod is Completed (not Running or Error).
- Logs show your processing output.
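If you want to script this check rather than read the table by eye, the Job's status can be classified from its JSON form. This is a minimal sketch with hypothetical function names; it relies only on the standard Job status fields (conditions and active) as returned by kubectl get job process-2026-q2 -n <your-tenant>-batch -o json.

```python
import json


def job_state(job: dict) -> str:
    """Classify a Job object into complete, failed, active, or no pods yet."""
    status = job.get("status", {})
    for condition in status.get("conditions", []):
        if condition.get("status") != "True":
            continue
        if condition.get("type") == "Complete":
            return "complete"
        if condition.get("type") == "Failed":
            # The reason is set by the Job controller, e.g. DeadlineExceeded.
            return f"failed: {condition.get('reason', 'unknown reason')}"
    # `active` counts pods that have not terminated, including Pending ones.
    if status.get("active", 0) > 0:
        return "active"
    return "no pods yet"


def job_state_from_json(text: str) -> str:
    """Same as job_state, but takes the raw JSON text printed by kubectl."""
    return job_state(json.loads(text))
```

An "active" result that never changes usually means the Pod is still Pending, which the troubleshooting section covers.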
For the CronJob:
    kubectl get cronjobs -n <your-tenant>-batch
    kubectl get jobs -n <your-tenant>-batch
    kubectl logs -n <your-tenant>-batch -l job-name=nightly-report-<timestamp>

Expected:
- The CronJob shows SCHEDULE as 0 3 * * * and ACTIVE as 0 (between runs) or 1 (during a run).
- Child Jobs appear after the first scheduled trigger, showing 1/1 COMPLETIONS on success.
- Logs show your report output.
If something goes wrong
- Pod rejected by Kyverno with PSS admission error. You modified the securityContext block. Revert to the verbatim YAML above. See Known limitations for the full PSS rule.
- Namespace rejected with forceTenantPrefix. The namespace name does not start with your tenant name. All namespaces must be <your-tenant>-<suffix>. See Known limitations.
- Job stuck Pending (ResourcePool exhaustion). Your tenant’s resource quota is shared across all your namespaces. Other workloads in <your-tenant>-prod or <your-tenant>-dev may be consuming the quota, leaving nothing for <your-tenant>-batch. Check allocation with kubectl describe resourcepool (it is cluster-scoped, so -n is ignored); for one namespace’s usage use kubectl describe resourcequota -n <your-tenant>-batch. See Resource pools and quotas for details.
- Job stuck Pending (no preemptible pods found). uber-user-preempt-medium can only preempt uber-user-significant pods. If no uber-user-significant pods are running cluster-wide, there is nothing to preempt and the Job waits for capacity to free up naturally. See Priority classes for the preemption matrix.
- CronJob child Job fails with connection timed out to external endpoint. Egress is default-deny on Kestrel: reaching any external endpoint requires a tenant-scoped NetworkPolicy with an explicit egress rule for that destination. Confirm such a rule exists and covers the endpoint; without one, outbound traffic to anything beyond DNS and intra-tenant services is blocked. See NetworkPolicy in practice.
- CronJob never triggers. Check that the schedule is valid cron syntax and remember it is UTC. kubectl describe cronjob -n <your-tenant>-batch nightly-report shows Last Schedule Time and any scheduling errors.
- ArgoCD shows OutOfSync but no errors. ArgoCD polls on a 3-minute interval. Click Sync in the ArgoCD UI for an immediate reconcile.
- Need to make a temporary change outside GitOps? Use the escape hatch workflow, but remember that ArgoCD reverts any mutation on the next sync.
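For the "CronJob never triggers" case, a common cause is a schedule written in local time rather than UTC. The sketch below shows one way to sanity-check a daily schedule and see when it fires on your own clock. It is deliberately limited and hypothetical: the function name is made up, it only handles schedules with a fixed minute and hour (such as the 0 3 * * * used in this recipe), and it takes a fixed UTC offset instead of a named timezone, so it ignores daylight saving changes.

```python
from datetime import datetime, timedelta, timezone


def local_fire_time(schedule: str, utc_offset_hours: float) -> str:
    """Return the local HH:MM at which a daily UTC cron schedule fires."""
    fields = schedule.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}")
    minute, hour = fields[0], fields[1]
    if not (minute.isdigit() and hour.isdigit()):
        raise ValueError("only fixed minute and hour values are handled here")
    if not (0 <= int(minute) <= 59 and 0 <= int(hour) <= 23):
        raise ValueError("minute must be 0-59 and hour must be 0-23")

    # With a fixed offset the date is irrelevant; only the time of day matters.
    fires_at = datetime(2026, 1, 1, int(hour), int(minute), tzinfo=timezone.utc)
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return fires_at.astimezone(local_zone).strftime("%H:%M")
```

For example, local_fire_time("0 3 * * *", -5) returns "22:00": a job scheduled for 03:00 UTC runs at 22:00 the previous evening for someone at UTC-5.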