Troubleshooting index

A flat, searchable index of error messages and symptoms you might hit on Kestrel. Each heading is a verbatim error string or a recognizable symptom, so you can copy the message from your terminal output and find it on this page with Ctrl+F (or Cmd+F).

If you are in the first 60 seconds of debugging and want a guided flow instead, start with the Triage page — it has four branches that narrow the problem quickly. This page is the deeper reference for when you already know which error you are looking at.

For question-oriented answers (“why does X happen?”), see the FAQ.

kubelogin and identity

`error: You must be logged in to the server (Unauthorized)`

Cause: your OIDC token is missing, expired, or was issued by the wrong identity provider.

Fix: clear the token cache and trigger a fresh login:

kubectl oidc-login clean
kubectl get ns

If the fresh login still fails, confirm you are using the int128/kubelogin plugin (not the Microsoft Azure AD one) and that your kubeconfig has the correct oidc-issuer-url and oidc-client-id. See Install kubelogin for the full setup.

`State does not match`

Cause: stale OIDC state in the local cache, usually from an aborted login or a duplicate browser tab that followed an old callback link.

Fix: clear the cache and retry in a fresh browser window:

kubectl oidc-login clean
kubectl get ns

If it persists, try an incognito/private browser window so the Keycloak session cookie is clean. See kubelogin troubleshooting — state does not match.

`kubectl get ns` hangs — no browser opens

Cause: you are running inside WSL2, tmux, or a remote SSH session with no local GUI browser for kubelogin to open.

Fix: add --skip-open-browser to your kubeconfig exec.args, copy the printed URL to a local browser, and forward the callback port if the session is remote:

ssh -L 8000:localhost:8000 <user>@<remote-host>

Full steps at kubelogin troubleshooting — browser does not open.

`kubectl get ns` returns zero namespaces (empty result, no error)

Cause: your OIDC token is valid, but its groups claim does not contain a Cloud RAP group that maps to a Capsule Tenant. The most common reason is that your PI has not added you to the Cloud RAP in CCDB — Cloud RAPs do not auto-add sponsored users.

Fix: ask your PI to confirm you are listed on their Cloud RAP at ccdb.alliancecan.ca, wait for LDAP propagation (usually immediate, occasionally a few minutes), then clear the cache and retry:

kubectl oidc-login clean
kubectl get ns

See Requesting access for the end-to-end flow.
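
To confirm that the groups claim is the problem, you can decode the ID token's payload locally. The sketch below is a minimal helper, assuming a shell with `base64`, `cut`, and `tr` available. `jwt_payload` is a hypothetical name, and how you obtain the raw ID token depends on your kubelogin version and setup.

```shell
# Hypothetical helper: print the JSON payload (claims) of a JWT.
# A JWT is three base64url segments joined by dots: header.payload.signature.
jwt_payload() {
  local seg
  # Take the second segment and map the base64url alphabet back to base64.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url omits padding; restore it so base64 -d accepts the input.
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}
```

Call it as `jwt_payload "$ID_TOKEN"` and look for your Cloud RAP group in the `groups` array. If the group is missing, the fix is on the CCDB side, not in your kubeconfig.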

`listen tcp 127.0.0.1:8000: bind: address already in use`

Cause: another process already bound localhost:8000 — a dev server, a previous kubelogin session, or a Docker port publish.

Fix: override the listen address in your kubeconfig exec.args:

args:
  - get-token
  - --oidc-issuer-url=<keycloak-issuer-url>
  - --oidc-client-id=<your-keycloak-client-id>
  - --listen-address=localhost:18000

Or set KUBELOGIN_LISTEN_ADDRESS=localhost:18000 in your shell before running kubectl. See kubelogin troubleshooting — port 8000 already in use.

ArgoCD sync errors

`admission webhook "validate.kyverno.svc-fail" denied the request`

Cause: your Pod spec violates the Kyverno-enforced Pod Security Standards (restricted profile). Common missing fields: securityContext.runAsNonRoot, securityContext.seccompProfile.type, or securityContext.capabilities.drop.

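Fix: add the required security context to every container in your Pod spec. capabilities and allowPrivilegeEscalation are container-level fields, so the block belongs under spec.containers[].securityContext (and under initContainers, if you have any):

securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  seccompProfile:
    type: RuntimeDefault
  capabilities:
    drop: [ALL]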

See Known limitations — Pod Security restricted for the full checklist.

`namespaces "<name>" is forbidden: unable to create new content in namespace`

Cause: the namespace name does not start with your tenant prefix (<tenant>-*). Capsule’s forceTenantPrefix admission webhook rejects namespace names that do not match.

Fix: rename the namespace to include your tenant prefix. For example, if your tenant is def-profname, the namespace must be def-profname-<suffix>. See Known limitations — namespace prefix and Namespaces.
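
The prefix rule can be checked before you commit a manifest. This is a minimal sketch in plain shell, assuming you know your tenant name. `has_tenant_prefix` is a hypothetical helper, not part of Capsule or kubectl.

```shell
# Hypothetical helper: succeed only if the namespace name has the form
# <tenant>-<suffix> with a non-empty suffix.
has_tenant_prefix() {
  local tenant="$1" ns="$2"
  case "$ns" in
    "$tenant"-?*) return 0 ;;   # tenant, a hyphen, then at least one character
    *)            return 1 ;;
  esac
}
```

For example, `has_tenant_prefix def-profname def-profname-web` succeeds, while `has_tenant_prefix def-profname web` fails.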

`priorityclasses "<name>" is forbidden`

Cause: the priority class you specified is not in the Capsule allowlist for your tenant.

Fix: use one of the allowlisted priority classes. See Priority classes for the list and Known limitations — priority class allowlist for the rationale.

Application shows `Unknown` or `ComparisonError`

Cause: ArgoCD cannot parse or diff the manifests. This usually means invalid YAML, a missing CRD on the cluster, or a Helm template rendering error.

Fix: validate your manifests locally with kubectl apply --dry-run=client -f <manifest>.yaml. If using Helm, run helm template locally and check the output. See ArgoCD on Kestrel for the recommended workflow.

Ingress and TLS

Ingress returns 404 with a valid TLS certificate

Cause: the TLS handshake succeeded (cert-manager issued the cert), but Traefik cannot route to your backend. The most common cause is a Service selector that does not match Pod labels.

Fix: compare spec.selector on your Service with spec.template.metadata.labels on your Deployment — they must match exactly. Also confirm the Service port matches the container port. See Ingress on Kestrel for the full recipe.
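
One way to run that comparison from the command line is to feed the Service's own selector back into a Pod query. This is a sketch, assuming you have kubectl access to the namespace. `pods_for_service` is a hypothetical helper name. If it lists no Pods, the selector and the Pod labels disagree.

```shell
# Hypothetical helper: list the Pods that a Service's selector matches.
pods_for_service() {
  local ns="$1" svc="$2" selector
  # Render spec.selector as "key=value,key=value," via kubectl's go-template output.
  selector=$(kubectl get service "$svc" -n "$ns" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}')
  selector=${selector%,}   # drop the trailing comma
  if [ -z "$selector" ]; then
    echo "service $svc has no selector" >&2
    return 1
  fi
  # Query Pods with exactly that label selector and show their labels.
  kubectl get pods -n "$ns" -l "$selector" --show-labels
}
```

Usage: `pods_for_service <your-tenant>-<namespace> <service-name>`.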

Ingress returns 404 — no TLS certificate yet

Cause: cert-manager has not finished issuing the certificate. First issuance takes about 30 seconds; if it takes longer, the ACME challenge may be failing.

Fix: check certificate status:

kubectl get certificate -n <your-tenant>-<namespace>
kubectl describe certificate -n <your-tenant>-<namespace> <cert-name>

Look at the Events section for ACME challenge errors. See TLS on Kestrel for the full flow.
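
If you would rather block until issuance finishes than poll by hand, `kubectl wait` can watch the Certificate's Ready condition. This is a sketch, assuming the cert-manager Certificate resource shown above. `wait_for_cert` and the 120-second default timeout are illustrative choices, not Kestrel recommendations.

```shell
# Hypothetical helper: wait until a cert-manager Certificate reports Ready,
# and print its description (including Events) if the wait fails.
wait_for_cert() {
  local ns="$1" cert="$2" timeout="${3:-120s}"
  if kubectl wait --for=condition=Ready "certificate/$cert" \
       -n "$ns" --timeout="$timeout"; then
    echo "certificate $cert is ready"
  else
    # On timeout, the Events section usually names the failing ACME step.
    kubectl describe certificate "$cert" -n "$ns"
    return 1
  fi
}
```

Usage: `wait_for_cert <your-tenant>-<namespace> <cert-name>`, optionally with a third argument such as `300s` to wait longer.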

`connection refused` when visiting the Ingress hostname

Cause: the Ingress resource is missing ingressClassName: traefik, so Traefik does not pick it up.

Fix: set the ingressClassName field in your Ingress spec (it is a spec field, not an annotation):

spec:
  ingressClassName: traefik

See Ingress on Kestrel and Triage — Ingress 404.

`services "<name>" is forbidden: service type LoadBalancer is not allowed`

Cause: Capsule restricts Service types on Kestrel. LoadBalancer is not available.

Fix: use ClusterIP with an Ingress instead. See Service types and Known limitations — LoadBalancer blocked.

Storage and quotas

PVC stuck in `Pending`

Cause (quota): the tenant’s ResourcePool storage quota is exhausted. Check with kubectl describe resourcepool (it is cluster-scoped — the -n flag is ignored).

Cause (wrong access mode): you requested ReadWriteMany (RWX) on a Cinder storage class. Kestrel provides two OpenStack Cinder storage classes: csi-cinder-sc-delete (the default; reclaim policy Delete) and csi-cinder-sc-retain (reclaim policy Retain). Both are block volumes and support ReadWriteOnce only. ReadWriteMany (shared) volumes are not offered through a storage class — open a ticket with RCS to have a per-tenant NFS volume provisioned.

Cause (missing class): the storageClassName in your PVC does not match any available class on the cluster.

Fix: check PVC events for the specific reason:

kubectl describe pvc -n <your-tenant>-<namespace> <pvc-name>

See Storage classes and Persistent volumes.

`Insufficient cpu` or `Insufficient memory` — Pod fails to schedule

Cause: the tenant’s ResourcePool quota for CPU or memory is exhausted.

Fix: check current usage:

kubectl describe resourcepool

Common culprits: completed Jobs still holding resources, pending Pods, orphaned PVCs, overprovisioned requests vs actual usage. See Viewing your allocation for diagnosis and Requesting quota changes if you need a larger tier.

`exceeded quota` in ArgoCD sync

Cause: the resource request in your manifest exceeds the remaining ResourcePool quota for your tenant.

Fix: either reduce the resource requests in your manifest or request a tier upgrade. See Resource pools and quotas for the tier numbers and Requesting quota changes for the upgrade process.
