Jobs & CronJobs

Jobs run to completion and exit. Use a Job for one-shot batch processing (data transforms, migrations, report generation) and a CronJob for scheduled recurring work (nightly reports, periodic cleanup). Both shapes fit naturally into the GitOps workflow — commit the manifest, let ArgoCD sync it, and the cluster handles scheduling.

Preconditions

Before using this recipe, confirm:

Your tenant is provisioned and you have completed Your first deployment end-to-end.
Your workload repo is set up per Your repo, your workloads — ArgoCD is pointed at your repo and syncing.
You know your tenant name (the <your-tenant> prefix used in namespace names).
Your PVCs are provisioned in the target namespace. PVC creation is covered in Persistent volumes in practice.

Use case

Use a Job when your workload:

Runs to completion and exits (does not serve traffic).
Needs to process data from a PVC, generate output to a PVC, or perform a one-time operation.
Has a deadline and needs to be scheduled promptly (it uses uber-user-preempt-medium, which may preempt opportunistic peers).

Use a CronJob when your workload:

Runs on a recurring schedule (nightly, hourly, weekly).
Is retry-tolerant — if preempted tonight, it re-runs tomorrow.
Writes output to an in-cluster destination (another PVC, a Service endpoint in the same tenant).

If your workload runs continuously and serves traffic, see Long-running service. If you need a short-lived interactive environment, see Dev pod.

YAML

All manifests go in your workload repo under a directory that your ArgoCD Application’s spec.source.path points at (e.g. manifests/batch/). Replace every <your-tenant> placeholder with your real tenant name.

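The minimal variant runs a single Python container that reads from an input PVC and writes results to an output PVC. It uses the uber-user-preempt-medium priority class, intended for deadline-bound batch work that may preempt opportunistic peers. The script itself comes from a ConfigMap named process-script, mounted read-only at /app, so that ConfigMap must exist in the same namespace and contain a process.py key.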

apiVersion: batch/v1
kind: Job
metadata:
  name: process-2026-q2
  namespace: <your-tenant>-batch
spec:
  backoffLimit: 2
  template:
    metadata:
      labels:
        job: process-2026-q2
    spec:
      restartPolicy: OnFailure
      priorityClassName: uber-user-preempt-medium
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: worker
          image: python:3.12-slim
          command: ['python', '/app/process.py']
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: [ALL]
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: 2
              memory: 4Gi
          volumeMounts:
            - name: input
              mountPath: /data/input
              readOnly: true
            - name: output
              mountPath: /data/output
            - name: tmp
              mountPath: /tmp
            - name: app
              mountPath: /app
              readOnly: true
      volumes:
        - name: input
          persistentVolumeClaim:
            claimName: input-data
        - name: output
          persistentVolumeClaim:
            claimName: output-data
        - name: tmp
          emptyDir: {}
        - name: app
          configMap:
            name: process-script
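The manifest runs /app/process.py out of the process-script ConfigMap but does not show the script itself. The sketch below is a hypothetical stand-in, not part of this recipe: it assumes the input PVC holds *.txt files, writes a single summary.tsv (an invented file name), and honors hypothetical INPUT_DIR/OUTPUT_DIR overrides for local testing. What it illustrates is a shape that fits the securityContext above: with readOnlyRootFilesystem: true the script writes only to the output mount, and it publishes its result with an atomic rename so a preempted or retried Pod never leaves a truncated file.

```python
"""Hypothetical /app/process.py for the Job above (illustrative only).

Assumes *.txt inputs and a single summary.tsv output; both are invented.
Writes only to the output mount, as readOnlyRootFilesystem: true requires.
"""
import os
import sys
import tempfile
from pathlib import Path

# Defaults match the Job's volumeMounts; the env overrides are hypothetical.
INPUT_DIR = Path(os.environ.get("INPUT_DIR", "/data/input"))
OUTPUT_DIR = Path(os.environ.get("OUTPUT_DIR", "/data/output"))


def count_lines(input_dir: Path) -> dict[str, int]:
    """Stand-in for real processing: line count per *.txt file."""
    counts: dict[str, int] = {}
    for path in sorted(input_dir.glob("*.txt")):
        with path.open(encoding="utf-8") as fh:
            counts[path.name] = sum(1 for _ in fh)
    return counts


def write_atomically(target: Path, text: str) -> None:
    """Write to a temp file next to `target`, then rename it into place.

    os.replace is atomic within one filesystem, so a Pod killed mid-write
    (preemption, activeDeadlineSeconds) leaves the old result or none,
    never a truncated one.
    """
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
        os.replace(tmp_name, target)
    except BaseException:
        os.unlink(tmp_name)
        raise


def run(input_dir: Path, output_dir: Path) -> Path:
    counts = count_lines(input_dir)
    target = output_dir / "summary.tsv"
    write_atomically(target, "".join(f"{n}\t{c}\n" for n, c in counts.items()))
    return target


if __name__ == "__main__":
    if OUTPUT_DIR.is_dir():
        # stdout is what `kubectl logs` shows.
        print(f"wrote {run(INPUT_DIR, OUTPUT_DIR)}")
    else:
        # Only reachable off-cluster: the Job always mounts /data/output.
        print(f"{OUTPUT_DIR} is not mounted; nothing to do", file=sys.stderr)
```

Inside the Pod the defaults match the volumeMounts, so no configuration is needed; locally, something like INPUT_DIR=./in OUTPUT_DIR=./out python process.py exercises the same code path.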

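The recommended variant adds a one-hour activeDeadlineSeconds timeout, ttlSecondsAfterFinished cleanup 24 hours after the Job finishes, and a tighter backoffLimit of 1. It also switches restartPolicy to Never: with OnFailure, the Pod is terminated once the backoff limit is reached, which makes failures harder to debug, whereas Never leaves each failed Pod (and its logs) in place until the Job is cleaned up. Use this variant for production batch pipelines where you want failed Jobs to surface quickly and completed Jobs to clean up automatically.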

apiVersion: batch/v1
kind: Job
metadata:
  name: process-2026-q2
  namespace: <your-tenant>-batch
spec:
  backoffLimit: 1
  activeDeadlineSeconds: 3600
  ttlSecondsAfterFinished: 86400
  template:
    metadata:
      labels:
        job: process-2026-q2
    spec:
      restartPolicy: Never
      priorityClassName: uber-user-preempt-medium
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: worker
          image: python:3.12-slim
          command: ['python', '/app/process.py']
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: [ALL]
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: 2
              memory: 4Gi
          volumeMounts:
            - name: input
              mountPath: /data/input
              readOnly: true
            - name: output
              mountPath: /data/output
            - name: tmp
              mountPath: /tmp
            - name: app
              mountPath: /app
              readOnly: true
      volumes:
        - name: input
          persistentVolumeClaim:
            claimName: input-data
        - name: output
          persistentVolumeClaim:
            claimName: output-data
        - name: tmp
          emptyDir: {}
        - name: app
          configMap:
            name: process-script
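To see how backoffLimit and activeDeadlineSeconds interact in the recommended variant, here is a rough sketch of the arithmetic. It assumes the upstream Job controller behavior for restartPolicy: Never (each failed Pod is replaced after an exponential back-off of 10s, 20s, 40s, and so on, capped at six minutes) and that activeDeadlineSeconds bounds the whole Job, back-off waits included. It ignores scheduling and image-pull time, so treat the result as an upper bound on usable runtime.

```python
"""Rough retry/deadline budget for a Job with restartPolicy: Never.

Assumes the upstream Job controller back-off (10s, doubling per failed
Pod, capped at 6 minutes) and that activeDeadlineSeconds covers the whole
Job, back-off waits included. Scheduling and image pulls are ignored.
"""
from dataclasses import dataclass

BASE_BACKOFF_S = 10   # first retry delay
MAX_BACKOFF_S = 360   # cap: six minutes


@dataclass(frozen=True)
class RetryBudget:
    max_attempts: int                  # at most backoffLimit + 1 Pods
    backoff_delays_s: tuple[int, ...]  # wait before each retry
    runtime_budget_s: int              # deadline minus back-off waits


def retry_budget(backoff_limit: int, active_deadline_s: int) -> RetryBudget:
    delays = tuple(
        min(BASE_BACKOFF_S * 2**i, MAX_BACKOFF_S) for i in range(backoff_limit)
    )
    return RetryBudget(
        max_attempts=backoff_limit + 1,
        backoff_delays_s=delays,
        runtime_budget_s=max(active_deadline_s - sum(delays), 0),
    )


if __name__ == "__main__":
    # Values from the recommended variant: backoffLimit 1, 3600s deadline.
    print(retry_budget(backoff_limit=1, active_deadline_s=3600))
    # ttlSecondsAfterFinished: 86400 keeps the finished Job for 24 hours.
    print(86400 // 3600, "hours of retention after the Job finishes")
```

With backoffLimit: 1 the Job gets at most two Pods sharing roughly 3590 seconds; the CronJob below (backoffLimit: 1, activeDeadlineSeconds: 1800) works out to two attempts sharing roughly 1790 seconds.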

CronJob — nightly report

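This CronJob runs a Python script daily at 03:00 UTC; the script reads from a data PVC and writes a report to an output PVC in the same tenant namespace. The manifest pins timeZone: "UTC" explicitly. Without that field, Kubernetes interprets the schedule in the kube-controller-manager's local time zone, which on this cluster is also UTC, so schedule values are never in your local time zone.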

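It uses the uber-user-significant priority class, which schedules opportunistically. If the cluster is under pressure and a higher-priority pod preempts this CronJob’s pod, the Job controller may start a replacement Pod within the Job’s backoffLimit and activeDeadlineSeconds; if the run still does not complete, it simply re-runs at the next scheduled time.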

apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
  namespace: <your-tenant>-batch
spec:
  schedule: "0 3 * * *"
  timeZone: "UTC"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      backoffLimit: 1
      activeDeadlineSeconds: 1800
      template:
        spec:
          restartPolicy: OnFailure
          priorityClassName: uber-user-significant
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
            seccompProfile:
              type: RuntimeDefault
          containers:
            - name: reporter
              image: python:3.12-slim
              command: ['python', '/app/report.py']
              securityContext:
                allowPrivilegeEscalation: false
                readOnlyRootFilesystem: true
                capabilities:
                  drop: [ALL]
              resources:
                requests:
                  cpu: 250m
                  memory: 512Mi
                limits:
                  cpu: 1
                  memory: 2Gi
              volumeMounts:
                - name: data
                  mountPath: /data/input
                  readOnly: true
                - name: reports
                  mountPath: /data/reports
                - name: tmp
                  mountPath: /tmp
                - name: app
                  mountPath: /app
                  readOnly: true
          volumes:
            - name: data
              persistentVolumeClaim:
                claimName: tenant-data
            - name: reports
              persistentVolumeClaim:
                claimName: report-output
            - name: tmp
              emptyDir: {}
            - name: app
              configMap:
                name: report-script
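Since the schedule above is evaluated in UTC, it can help to translate the next trigger into your own time zone before relying on it. A minimal standard-library sketch, hard-coded to the daily "0 3 * * *" case (it is not a general cron parser); the UTC-07:00 offset in the example is an arbitrary stand-in for your local zone:

```python
"""Next trigger of a daily "0 3 * * *" CronJob schedule evaluated in UTC.

Handles only the fixed daily HH:MM case; it is not a general cron parser.
"""
from datetime import datetime, time, timedelta, timezone


def next_daily_run(now: datetime, at: time = time(3, 0)) -> datetime:
    """Return the next UTC datetime at `at` strictly after `now`."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now_utc = now.astimezone(timezone.utc)
    candidate = datetime.combine(now_utc.date(), at, tzinfo=timezone.utc)
    if candidate <= now_utc:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    run = next_daily_run(datetime.now(timezone.utc))
    # Illustrative fixed offset; a named zone would use zoneinfo.ZoneInfo.
    local = run.astimezone(timezone(timedelta(hours=-7)))
    print(f"next run: {run.isoformat()} (UTC) = {local.isoformat()} (UTC-07:00)")
```

For a named zone with daylight-saving rules, pass zoneinfo.ZoneInfo("Europe/Berlin") (or similar) to astimezone instead of a fixed offset; that requires time zone data on the machine running the script.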

Dry-run

Before committing, validate your manifests locally:

kubectl apply --dry-run=client -f manifests/batch/job.yaml
kubectl apply --dry-run=client -f manifests/batch/cronjob.yaml

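Both commands should complete without errors. A client-side dry run catches YAML syntax issues and missing required fields before ArgoCD tries to apply them, but it does not run cluster admission policies such as the Kyverno PSS checks. To exercise those as well, use kubectl apply --dry-run=server, which sends the request through admission without persisting it, if your RBAC allows it.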

Verify

After pushing to your repo and letting ArgoCD sync:

For the Job:

kubectl get jobs -n <your-tenant>-batch
kubectl get pods -n <your-tenant>-batch -l job=process-2026-q2
kubectl logs -n <your-tenant>-batch -l job=process-2026-q2

Expected:

The Job shows 1/1 under COMPLETIONS when finished.
The Pod is Completed (not Running or Error).
Logs show your processing output.
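Rather than reading the COMPLETIONS column by eye, kubectl wait --for=condition=complete job/process-2026-q2 -n <your-tenant>-batch --timeout=1h blocks until the Job succeeds, but it simply waits out the timeout if the Job fails instead. To tell success from failure in a script, a sketch that classifies the Job from the Complete and Failed condition types in the batch/v1 Job's status.conditions could look like this (the file name job_state.py is hypothetical):

```python
"""Classify a Job from `kubectl get job NAME -n NS -o json` output.

Usage (hypothetical file name):
  kubectl get job NAME -n NS -o json | python job_state.py /dev/stdin
"""
import json
import sys


def job_state(job: dict) -> str:
    """Return "Complete", "Failed", or "InProgress" for a batch/v1 Job."""
    conditions = (job.get("status") or {}).get("conditions") or []
    for cond in conditions:
        # Only the terminal condition types count; others (e.g. Suspended)
        # leave the Job in progress.
        if cond.get("status") == "True" and cond.get("type") in ("Complete", "Failed"):
            return cond["type"]
    return "InProgress"


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8") as fh:
        state = job_state(json.load(fh))
    print(state)
    # Non-zero exit lets CI fail on anything but a successful Job.
    sys.exit(0 if state == "Complete" else 1)
```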

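For the CronJob (replace <timestamp> in the last command with the numeric suffix of a child Job from the kubectl get jobs output):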

kubectl get cronjobs -n <your-tenant>-batch
kubectl get jobs -n <your-tenant>-batch
kubectl logs -n <your-tenant>-batch -l job-name=nightly-report-<timestamp>

Expected:

The CronJob shows SCHEDULE as 0 3 * * * and ACTIVE as 0 (between runs) or 1 (during a run).
Child Jobs appear after the first scheduled trigger, showing 1/1 COMPLETIONS on success.
Logs show your report output.
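The <timestamp> suffix on child Job names is not wall-clock text. In the upstream CronJob controller, a child Job is named <cronjob-name>-<scheduled time in minutes since the Unix epoch>, so you can predict which Job a given run produces. A minimal sketch of that naming rule:

```python
"""Predict a CronJob child Job name (upstream rule: <name>-<epoch minutes>)."""
from datetime import datetime, timezone


def child_job_name(cronjob_name: str, scheduled: datetime) -> str:
    """CronJob name + '-' + the run's scheduled Unix time in whole minutes."""
    if scheduled.tzinfo is None:
        raise ValueError("scheduled must be timezone-aware")
    return f"{cronjob_name}-{int(scheduled.timestamp()) // 60}"


if __name__ == "__main__":
    run = datetime(2026, 6, 1, 3, 0, tzinfo=timezone.utc)
    name = child_job_name("nightly-report", run)
    print(name)  # nightly-report-29671380
    print(f"kubectl logs -n <your-tenant>-batch -l job-name={name}")
```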

If something goes wrong

Pod rejected by Kyverno with PSS admission error. The securityContext block (pod-level or container-level) was most likely modified. Restore it verbatim from the YAML above. See Known limitations for the full PSS rule.
Namespace rejected with forceTenantPrefix. The namespace name does not start with your tenant name. Every namespace must be named <your-tenant>-<suffix>. See Known limitations.
Job creates no Pods: ResourcePool exhaustion. Your tenant’s resource quota is shared across all your namespaces, so workloads in <your-tenant>-prod or <your-tenant>-dev may consume the quota and leave nothing for <your-tenant>-batch. Quota is enforced at admission, so instead of a Pending Pod you typically see FailedCreate events mentioning exceeded quota in kubectl describe job -n <your-tenant>-batch process-2026-q2. Check allocation with kubectl describe resourcepool (it is cluster-scoped, so -n is ignored); for one namespace’s usage, use kubectl describe resourcequota -n <your-tenant>-batch. See Resource pools and quotas for details.
Job stuck Pending — no preemptible pods found. uber-user-preempt-medium can only preempt uber-user-significant pods. If no significant pods are running cluster-wide, there is nothing to preempt and the Job waits for capacity to free up naturally. See Priority classes for the preemption matrix.
CronJob child Job fails with connection timed out to external endpoint. Egress is default-deny on Kestrel — reaching any external endpoint requires a tenant-scoped NetworkPolicy with an explicit egress rule for that destination. Confirm such a rule exists and covers the endpoint; without one, outbound traffic to anything beyond DNS and intra-tenant services is blocked. See NetworkPolicy in practice.
CronJob never triggers. Check that the schedule is valid cron syntax and remember it is UTC. kubectl describe cronjob -n <your-tenant>-batch nightly-report shows Last Schedule Time and any scheduling errors.
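ArgoCD shows OutOfSync but no errors. ArgoCD polls your repo on a 3-minute interval by default, and an Application without automated sync stays OutOfSync until someone syncs it. Click Sync in the ArgoCD UI for an immediate reconcile.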
Need to make a temporary change outside GitOps? Use the escape hatch workflow — but remember that ArgoCD reverts any mutation on the next sync.
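For the ResourcePool case above, the arithmetic is simple but easy to overlook: the pool is one budget shared by all your namespaces, so the batch Job only fits in what the others leave over. A minimal sketch with hypothetical numbers, assuming the pool caps requests.cpu and using Kubernetes CPU quantity notation (500m is half a core); the same check applies to memory, and to limits if the pool caps them:

```python
"""Does a Pod's CPU request fit in a tenant-wide pool? (hypothetical numbers)"""


def cpu_millicores(quantity: str) -> int:
    """Parse a plain Kubernetes CPU quantity such as "2", "1.5", or "500m"."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def fits(pool: str, used_by_namespace: dict[str, str], request: str) -> bool:
    """True if `request` fits in the pool after every namespace's usage."""
    used = sum(cpu_millicores(q) for q in used_by_namespace.values())
    return cpu_millicores(request) <= cpu_millicores(pool) - used


if __name__ == "__main__":
    # Hypothetical: an 8-core pool mostly consumed by prod and dev.
    usage = {"<your-tenant>-prod": "6", "<your-tenant>-dev": "1500m"}
    print(fits("8", usage, "500m"))  # True: the Job's 500m request just fits
    print(fits("8", usage, "1"))     # False: a 1-core request would be rejected
```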