Moving data
Getting data onto and off of the cluster is one of the first things you will need to do. This page covers the supported patterns, their tradeoffs, and the anti-patterns that cause trouble.
Patterns
kubectl cp (small transfers)
kubectl cp copies files between your local machine and a running Pod. It works through the Kubernetes API server using tar over the exec channel, so no direct network path to the Pod is needed. The container image must include a tar binary for this to work.
    # Local → Pod
    kubectl cp ./local-file.csv <your-tenant>-prod/my-pod:/data/local-file.csv

    # Pod → Local
    kubectl cp <your-tenant>-prod/my-pod:/data/output.csv ./output.csv

This works well for small files (under ~500 MB). For anything larger, use a PVC-backed Job or rsync sidecar instead; see the anti-patterns section below for why.
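The size guideline above can be turned into a quick pre-flight check. The following is a minimal sketch, not a supported tool: the function name pick_transfer_method and the variable LIMIT_BYTES are hypothetical, and the 500 MB cutoff is only the rough figure quoted on this page.

```shell
# Hypothetical helper: suggest a transfer method based on a file's size.
# LIMIT_BYTES mirrors the ~500 MB guideline; it is not a hard limit.
LIMIT_BYTES=$((500 * 1024 * 1024))

pick_transfer_method() {
  # wc -c reports the size in bytes; tr strips padding some platforms add.
  size=$(wc -c < "$1" | tr -d ' ')
  if [ "$size" -le "$LIMIT_BYTES" ]; then
    echo "kubectl-cp"
  else
    echo "job-or-rsync"
  fi
}

# Usage:
#   pick_transfer_method ./local-file.csv
```

The function only prints a suggestion, so it is safe to run before deciding which pattern to use.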
PVC-backed Jobs (bulk data processing)
A Job that mounts a PVC can pull data from an external source (object store, HTTP endpoint, database dump), process it, and write results back to the PVC. This is the standard pattern for bulk data movement because the transfer runs inside the cluster network, survives interruptions via Job retry semantics, and does not depend on your local machine staying connected.
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: data-import
      namespace: <your-tenant>-prod
    spec:
      template:
        spec:
          containers:
            - name: import
              image: curlimages/curl:latest
              command:
                - sh
                - -c
                - "curl -fSL https://example.com/dataset.tar.gz | tar xz -C /data"
              volumeMounts:
                - name: data
                  mountPath: /data
          volumes:
            - name: data
              persistentVolumeClaim:
                claimName: my-data
          restartPolicy: OnFailure

See Jobs & CronJobs for the full recipe with PSS-compliant security context, active deadlines, and TTL cleanup.
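Once the manifest is saved to a file, submitting it and waiting for the result might look like the sketch below. The wrapper name run_import_job, the manifest file name, and the 30-minute timeout are assumptions for illustration; the Job name data-import comes from the manifest above.

```shell
# Hypothetical wrapper: submit the Job, wait for it to finish, then show its logs.
# Arguments: namespace, path to the saved manifest.
run_import_job() {
  ns=$1
  manifest=$2
  kubectl apply -n "$ns" -f "$manifest" || return 1
  # Wait for the Complete condition. If the Job fails instead, this waits
  # until the timeout expires (30m here, an assumed value).
  if kubectl wait -n "$ns" --for=condition=complete job/data-import --timeout=30m; then
    kubectl logs -n "$ns" job/data-import
  else
    # Show events and retry counts to help diagnose the failure.
    kubectl describe -n "$ns" job/data-import
    return 1
  fi
}

# Usage (namespace and file name are placeholders):
#   run_import_job <your-tenant>-prod data-import.yaml
```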
NFS mount from outside (if your tenant has one)
If RCS has provisioned a per-tenant NFS volume for you (the same static PV used for shared RWX storage; see Persistent volumes in practice), you can mount that NFS export directly from a workstation or transfer node that has access to it. This gives you native filesystem access to the same data your Pods see through the NFS-backed PVC, with no Kubernetes API involved.
    sudo mount -t nfs rsfshare.uvic.ca:/mlp /mnt/tenant-data

This option is not available by default and depends on network access to the NFS server. Ask RCS whether an external NFS mount is available for your use case.
rsync sidecar (resumable transfers)
For large or unreliable transfers, run an rsync server as a sidecar container alongside your workload. The sidecar mounts the same PVC and exposes rsync over a ClusterIP Service. You then kubectl port-forward to the Service and rsync from your local machine:
    kubectl port-forward -n <your-tenant>-prod svc/rsync-sidecar 8873:873 &
    rsync -avz --progress ./large-dataset/ rsync://localhost:8873/data/

rsync handles resume on interruption, delta transfers (only changed blocks), and progress reporting, none of which kubectl cp can do.
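Because rsync can pick up where it left off, an interrupted transfer can simply be re-run. A minimal sketch of a retry loop follows; the helper name retry and the RETRY_DELAY variable are hypothetical, and the example adds rsync's --partial flag so that partially transferred files are kept between attempts.

```shell
# Hypothetical helper: re-run a command until it succeeds or attempts run out.
# Arguments: maximum number of attempts, then the command and its arguments.
retry() {
  max=$1; shift
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    # Pause between attempts; defaults to 5 seconds (an assumed value).
    sleep "${RETRY_DELAY:-5}"
  done
}

# Usage (assumes the port-forward is still running in the background):
#   retry 5 rsync -avz --partial --progress ./large-dataset/ rsync://localhost:8873/data/
```

This loop does not restart the port-forward itself; if that process exits, it has to be started again before the next attempt can succeed.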
Anti-patterns
Large kubectl cp transfers
kubectl cp streams data through the Kubernetes API server as a tar pipe over an exec channel. There is no resume, no progress indication, no delta transfer, and no compression. A network hiccup drops the entire transfer, and you start over. For anything over ~500 MB, use a PVC-backed Job or rsync sidecar instead.
Storing data in the container filesystem
The container filesystem is ephemeral. Anything written outside a mounted PVC is lost when the Pod restarts, reschedules, or gets evicted. This is not a storage option; it is temporary scratch space.
If your workload writes output to a local path, make sure that path is backed by a PVC volumeMount. Otherwise the data survives only as long as the specific container instance does.
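One way to catch this early is to compare the output path against the mount paths declared in the Pod spec. The sketch below assumes a hypothetical helper named is_under_mount; the Pod name, namespace, and paths in the usage comment are placeholders. A match is necessary but not sufficient, since a mount path can also belong to a non-persistent volume such as an emptyDir.

```shell
# Hypothetical helper: succeed if a path equals or sits below any given mount path.
# Arguments: the path to check, then one or more mount paths.
is_under_mount() {
  target=$1; shift
  for m in "$@"; do
    # Drop a trailing slash so /data and /data/ are treated the same.
    m=${m%/}
    case "$target" in
      "$m"|"$m"/*) return 0 ;;
    esac
  done
  return 1
}

# Usage (names are placeholders):
#   mounts=$(kubectl get pod my-pod -n <your-tenant>-prod \
#     -o jsonpath='{.spec.containers[*].volumeMounts[*].mountPath}')
#   is_under_mount /data/output.csv $mounts || echo "warning: path is not on a mounted volume"
```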
Downloading large datasets at Pod startup
Pulling a multi-gigabyte dataset in an initContainer on every Pod start is slow and wasteful. The download runs every time the Pod reschedules, consumes network bandwidth, and delays readiness. Instead, download the dataset once into a PVC (via a Job) and mount that PVC into your workload.
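To keep the one-time download from repeating when the Job is retried or re-run, the Job's script can check for a marker file first. This is a minimal sketch under stated assumptions: the function name fetch_once and the marker file name .import-complete are hypothetical, and the URL is the same placeholder used in the Job example on this page.

```shell
# Hypothetical Job script body: skip the download if an earlier run finished.
# Arguments: dataset URL, destination directory on the PVC.
fetch_once() {
  url=$1
  dest=$2
  marker="$dest/.import-complete"
  if [ -f "$marker" ]; then
    echo "dataset already present in $dest, skipping download"
    return 0
  fi
  # Create the marker only after the download-and-extract pipeline exits successfully.
  curl -fSL "$url" | tar xz -C "$dest" && touch "$marker"
}

# Usage inside the Job container:
#   fetch_once https://example.com/dataset.tar.gz /data
```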