Troubleshooting index
A flat index of error messages and symptoms you might hit on Kestrel, organized for searching. Each heading is a verbatim error string or a recognizable symptom, so you can copy text from your terminal output and find it on this page with Ctrl+F (or Cmd+F).
If you are in the first 60 seconds of debugging and want a guided flow instead, start with the Triage page — it has four branches that narrow the problem quickly. This page is the deeper reference for when you already know which error you are looking at.
For question-oriented answers (“why does X happen?”), see the FAQ.
kubelogin and identity
error: You must be logged in to the server (Unauthorized)
Cause: your OIDC token is missing, expired, or was issued by the wrong identity provider.
Fix: clear the token cache and trigger a fresh login:
    kubectl oidc-login clean
    kubectl get ns
If the fresh login still fails, confirm you are using the int128/kubelogin plugin (not the Microsoft Azure AD one) and that your kubeconfig has the correct oidc-issuer-url and oidc-client-id. See Install kubelogin for the full setup.
State does not match
Cause: stale OIDC state in the local cache, usually from an aborted login or a duplicate browser tab that followed an old callback link.
Fix: clear the cache and retry in a fresh browser window:
    kubectl oidc-login clean
    kubectl get ns
If it persists, try an incognito/private browser window so the Keycloak session cookie is clean. See kubelogin troubleshooting — state does not match.
kubectl get ns hangs — no browser opens
Cause: you are running inside WSL2, tmux, or a remote SSH session with no local GUI browser for kubelogin to open.
Fix: add --skip-open-browser to your kubeconfig exec.args, copy the printed URL to a local browser, and forward the callback port if the session is remote:
    ssh -L 8000:localhost:8000 <user>@<remote-host>
Full steps at kubelogin troubleshooting — browser does not open.
kubectl get ns returns zero namespaces (empty result, no error)
Cause: your OIDC token is valid, but its groups claim does not contain a Cloud RAP group that maps to a Capsule Tenant. The most common reason is that your PI has not added you to the Cloud RAP in CCDB; Cloud RAPs do not auto-add sponsored users.
Fix: ask your PI to confirm you are listed on their Cloud RAP at ccdb.alliancecan.ca, wait for LDAP propagation (usually immediate, occasionally a few minutes), then clear the cache and retry:
    kubectl oidc-login clean
    kubectl get ns
See Requesting access for the end-to-end flow.
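To see what the groups claim actually contains, you can decode the payload of your ID token locally. The helper below is a minimal sketch, not part of Kestrel's tooling: the function name jwt_payload is hypothetical, it assumes a base64 that accepts -d (GNU coreutils and recent macOS), and it only decodes the token without verifying its signature.

```shell
# Decode the payload (second dot-separated segment) of a JWT and print the JSON.
# Hypothetical helper for inspection only; it does NOT verify the signature.
jwt_payload() {
  local seg
  # JWTs use base64url: map '_' -> '/' and '-' -> '+' to get standard base64.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops the '=' padding; restore it so base64 -d accepts the input.
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Example usage (assumes jq is installed and that the flags match your kubeconfig exec.args):
#   token=$(kubectl oidc-login get-token \
#             --oidc-issuer-url=<keycloak-issuer-url> \
#             --oidc-client-id=<your-keycloak-client-id> | jq -r '.status.token')
#   jwt_payload "$token"
```

If the printed JSON has no groups entry for your Cloud RAP, the problem is upstream of kubectl (CCDB membership or propagation), not your kubeconfig.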
listen tcp 127.0.0.1:8000: bind: address already in use
Cause: another process has already bound localhost:8000, such as a dev server, a previous kubelogin session, or a Docker port publish.
Fix: override the listen address in your kubeconfig exec.args:
    args:
      - get-token
      - --oidc-issuer-url=<keycloak-issuer-url>
      - --oidc-client-id=<your-keycloak-client-id>
      - --listen-address=localhost:18000
Or set KUBELOGIN_LISTEN_ADDRESS=localhost:18000 in your shell before running kubectl. See kubelogin troubleshooting — port 8000 already in use.
ArgoCD sync errors
admission webhook "validate.kyverno.svc-fail" denied the request
Cause: your Pod spec violates the Kyverno-enforced Pod Security Standards (restricted profile). Common missing fields: securityContext.runAsNonRoot, securityContext.seccompProfile.type, or securityContext.capabilities.drop.
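Fix: add the required security context to each container in your Pod spec (capabilities is only valid in a container-level securityContext; runAsNonRoot and seccompProfile may also be set at Pod level):
    securityContext:
      runAsNonRoot: true
      seccompProfile:
        type: RuntimeDefault
      capabilities:
        drop: [ALL]
See Known limitations — Pod Security restricted for the full checklist.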
namespaces "<name>" is forbidden: unable to create new content in namespace
Cause: the namespace name does not start with your tenant prefix (<tenant>-*). Capsule’s forceTenantPrefix admission webhook rejects namespace names that do not match.
Fix: rename the namespace to include your tenant prefix. For example, if your tenant is def-profname, the namespace must be def-profname-<suffix>. See Known limitations — namespace prefix and Namespaces.
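The prefix rule can be checked locally before you commit a manifest. This is a minimal sketch of the rule as described above; the function name has_tenant_prefix is hypothetical, and Capsule's webhook remains the authority on what is accepted.

```shell
# Return 0 if the namespace name is "<tenant>-<suffix>" with a non-empty suffix.
# Hypothetical local check; it mirrors the documented rule, not Capsule's code.
has_tenant_prefix() {
  local tenant="$1" namespace="$2"
  case "$namespace" in
    "$tenant"-?*) return 0 ;;  # tenant, a hyphen, then at least one character
    *) return 1 ;;
  esac
}

# Example usage:
#   has_tenant_prefix def-profname def-profname-web && echo "ok"
```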
priorityclasses "<name>" is forbidden
Cause: the priority class you specified is not in the Capsule allowlist for your tenant.
Fix: use one of the allowlisted priority classes. See Priority classes for the list and Known limitations — priority class allowlist for the rationale.
Application shows Unknown or ComparisonError
Cause: ArgoCD cannot parse or diff the manifests. This usually means invalid YAML, a missing CRD on the cluster, or a Helm template rendering error.
Fix: validate your manifests locally with kubectl apply --dry-run=client -f <manifest>.yaml. If using Helm, run helm template locally and check the output. See ArgoCD on Kestrel for the recommended workflow.
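When a repository holds several manifests, the client-side dry run can be looped over all of them so the failing file is named explicitly. A minimal sketch; the function name validate_manifests is hypothetical, and it only wraps the kubectl command already mentioned above.

```shell
# Run a client-side dry run on each manifest and list the ones that fail.
# Hypothetical helper; kubectl's own error messages still go to stderr.
validate_manifests() {
  local failed=0 f
  for f in "$@"; do
    if ! kubectl apply --dry-run=client -f "$f" >/dev/null; then
      echo "invalid: $f"
      failed=1
    fi
  done
  return "$failed"
}

# Example usage:
#   validate_manifests manifests/*.yaml
# For Helm charts, render first and pipe the output into the same check:
#   helm template <release> <chart> | kubectl apply --dry-run=client -f -
```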
Ingress and TLS
Ingress returns 404 with a valid TLS certificate
Cause: the TLS handshake succeeded (cert-manager issued the cert), but Traefik cannot route to your backend. The most common cause is a Service selector that does not match Pod labels.
Fix: compare spec.selector on your Service with spec.template.metadata.labels on your Deployment — they must match exactly. Also confirm the Service port matches the container port. See Ingress on Kestrel for the full recipe.
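A quick way to confirm a selector mismatch is to ask whether the Service has any endpoint addresses: a Service whose selector matches no ready Pods has none. The sketch below assumes the classic Endpoints resource is readable in your namespace; the function name service_has_endpoints is hypothetical.

```shell
# Print "ok" if the Service has at least one endpoint address, else "no-endpoints".
# Hypothetical helper; an empty result points at a selector/label mismatch
# or at Pods that are not yet Ready.
service_has_endpoints() {
  local ns="$1" svc="$2" ips
  ips=$(kubectl get endpoints "$svc" -n "$ns" \
          -o jsonpath='{.subsets[*].addresses[*].ip}')
  if [ -n "$ips" ]; then
    echo "ok"
  else
    echo "no-endpoints"
  fi
}

# Example usage:
#   service_has_endpoints <your-tenant>-<namespace> <service-name>
# To see both sides of the comparison:
#   kubectl get svc <service-name> -n <your-tenant>-<namespace> -o jsonpath='{.spec.selector}'
#   kubectl get pods -n <your-tenant>-<namespace> --show-labels
```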
Ingress returns 404 — no TLS certificate yet
Cause: cert-manager has not finished issuing the certificate. The first issuance takes about 30 seconds; if it takes longer, the ACME challenge may be failing.
Fix: check certificate status:
    kubectl get certificate -n <your-tenant>-<namespace>
    kubectl describe certificate -n <your-tenant>-<namespace> <cert-name>
Look at the Events section for ACME challenge errors. See TLS on Kestrel for the full flow.
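If you want to wait for issuance instead of re-running the commands by hand, the Certificate's Ready condition can be polled. A minimal sketch; the function name wait_for_certificate and its default of 12 attempts at 5-second intervals are assumptions chosen for illustration, not Kestrel recommendations.

```shell
# Poll a cert-manager Certificate until its Ready condition is "True".
# Hypothetical helper. Arguments: namespace, certificate name,
# attempts (default 12), delay in seconds between attempts (default 5).
wait_for_certificate() {
  local ns="$1" cert="$2" attempts="${3:-12}" delay="${4:-5}" i=0 status
  while [ "$i" -lt "$attempts" ]; do
    status=$(kubectl get certificate "$cert" -n "$ns" \
               -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
    if [ "$status" = "True" ]; then
      echo "ready"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not-ready"
  return 1
}

# Example usage:
#   wait_for_certificate <your-tenant>-<namespace> <cert-name>
```

If this prints not-ready, go back to kubectl describe certificate and read the Events section.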
connection refused when visiting the Ingress hostname
Cause: the Ingress resource is missing ingressClassName: traefik, so Traefik does not pick it up.
Fix: add the ingressClassName field to your Ingress spec (it is a spec field, not an annotation):
    spec:
      ingressClassName: traefik
See Ingress on Kestrel and Triage — Ingress 404.
services "<name>" is forbidden: service type LoadBalancer is not allowed
Cause: Capsule restricts Service types on Kestrel. LoadBalancer is not available.
Fix: use ClusterIP with an Ingress instead. See Service types and Known limitations — LoadBalancer blocked.
Storage and quotas
PVC stuck in Pending
Cause (quota): the tenant’s ResourcePool storage quota is exhausted. Check with kubectl describe resourcepool (it is cluster-scoped, so the -n flag is ignored).
Cause (wrong access mode): you requested ReadWriteMany (RWX) on a Cinder storage class. Kestrel provides two OpenStack Cinder storage classes: csi-cinder-sc-delete (the default; reclaim policy Delete) and csi-cinder-sc-retain (reclaim policy Retain). Both are block volumes and support ReadWriteOnce only. ReadWriteMany (shared) volumes are not offered through a storage class — open a ticket with RCS to have a per-tenant NFS volume provisioned.
Cause (missing class): the storageClassName in your PVC does not match any available class on the cluster.
Fix: check PVC events for the specific reason:
    kubectl describe pvc -n <your-tenant>-<namespace> <pvc-name>
See Storage classes and Persistent volumes.
Insufficient cpu or Insufficient memory — Pod fails to schedule
Cause: the tenant’s ResourcePool quota for CPU or memory is exhausted.
Fix: check current usage:
    kubectl describe resourcepool
Common culprits: completed Jobs still holding resources, pending Pods, orphaned PVCs, and requests overprovisioned relative to actual usage. See Viewing your allocation for diagnosis and Requesting quota changes if you need a larger tier.
exceeded quota in ArgoCD sync
Cause: the resource request in your manifest exceeds the remaining ResourcePool quota for your tenant.
Fix: either reduce the resource requests in your manifest or request a tier upgrade. See Resource pools and quotas for the tier numbers and Requesting quota changes for the upgrade process.