
Moving data

Getting data onto and off the cluster is one of the first tasks you will face. This page covers the supported patterns, their tradeoffs, and the anti-patterns that cause trouble.

kubectl cp copies files between your local machine and a running Pod. It streams a tar archive through the Kubernetes API server over the exec channel, so no direct network path to the Pod is needed. The container image must include a tar binary for this to work.

# Local → Pod
kubectl cp ./local-file.csv <your-tenant>-prod/my-pod:/data/local-file.csv
# Pod → Local
kubectl cp <your-tenant>-prod/my-pod:/data/output.csv ./output.csv

This works well for small files (under ~500 MB). For anything larger, use a PVC-backed Job or rsync sidecar instead — see the anti-patterns section below for why.
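
The size guidance above can be enforced with a small wrapper. This is a minimal sketch, not part of the platform tooling: the function names `fits_kubectl_cp` and `guarded_cp` are hypothetical, and the threshold simply restates the ~500 MB rule of thumb in bytes.

```shell
# Threshold taken from the ~500 MB guidance, expressed in bytes.
MAX_CP_BYTES=$((500 * 1024 * 1024))

# fits_kubectl_cp FILE: succeeds if FILE exists and is under the threshold.
fits_kubectl_cp() {
    [ -f "$1" ] || return 2
    # Arithmetic expansion strips the padding some wc implementations emit.
    size=$(($(wc -c < "$1")))
    [ "$size" -lt "$MAX_CP_BYTES" ]
}

# guarded_cp FILE DEST: copy only if FILE is small enough for kubectl cp.
# DEST uses the <namespace>/<pod>:<path> form shown earlier on this page.
guarded_cp() {
    if fits_kubectl_cp "$1"; then
        kubectl cp "$1" "$2"
    else
        echo "refusing: $1 is missing or too large for kubectl cp; use a Job or rsync" >&2
        return 1
    fi
}
```

For example, `guarded_cp ./local-file.csv <your-tenant>-prod/my-pod:/data/local-file.csv` would copy the file only when it is under the limit.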

A Job that mounts a PVC can pull data from an external source (object store, HTTP endpoint, database dump), process it, and write results back to the PVC. This is the standard pattern for bulk data movement because the transfer runs inside the cluster network, survives interruptions via Job retry semantics, and does not depend on your local machine staying connected.

apiVersion: batch/v1
kind: Job
metadata:
  name: data-import
  namespace: <your-tenant>-prod
spec:
  template:
    spec:
      containers:
        - name: import
          image: curlimages/curl:latest
          command:
            - sh
            - -c
            - "curl -fSL https://example.com/dataset.tar.gz | tar xz -C /data"
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: my-data
      restartPolicy: OnFailure

See Jobs & CronJobs for the full recipe with PSS-compliant security context, active deadlines, and TTL cleanup.
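
Submitting the Job and waiting for it can be scripted. The sketch below assumes the manifest is saved to a local file and that the Job is named `data-import` as in the manifest on this page. The function name `run_import_job` and the 30 minute timeout are illustrative choices, not platform defaults.

```shell
# run_import_job MANIFEST NAMESPACE: submit the Job, block until it completes,
# then print its logs. The Job name matches metadata.name in the manifest.
run_import_job() {
    manifest=$1
    namespace=$2
    kubectl apply -n "$namespace" -f "$manifest" || return 1
    # condition=complete is only met on success, so a failing Job
    # surfaces here as a timeout rather than an immediate error.
    if kubectl wait -n "$namespace" --for=condition=complete --timeout=30m job/data-import; then
        kubectl logs -n "$namespace" job/data-import
    else
        echo "job did not complete within the timeout; describing it" >&2
        kubectl describe -n "$namespace" job/data-import
        return 1
    fi
}
```

Because the transfer runs inside the cluster, this script can be interrupted locally without affecting the Job itself. Rerunning `kubectl wait` picks up where you left off.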

NFS mount from outside (if your tenant has one)


If RCS has provisioned a per-tenant NFS volume for you (the same static PV used for shared RWX storage — see Persistent volumes in practice), you can mount that NFS export directly from a workstation or transfer node that has access to it. This gives you native filesystem access to the same data your Pods see through the NFS-backed PVC — no Kubernetes API involved.

sudo mkdir -p /mnt/tenant-data
sudo mount -t nfs rsfshare.uvic.ca:/mlp /mnt/tenant-data

This option is not available by default and depends on network access to the NFS server. Ask RCS whether an external NFS mount is available for your use case.
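
If the export is mounted, copying into it is an ordinary filesystem operation. The sketch below guards against one easy mistake: copying into the mount directory while nothing is mounted there, which would silently fill the local disk instead. It assumes a Linux host with `mountpoint` (from util-linux) and `rsync` installed. The function name `copy_to_share` is hypothetical.

```shell
# copy_to_share SRC_DIR MOUNT_DIR: copy the contents of SRC_DIR into the
# mounted export, refusing to run if MOUNT_DIR is not an active mount point.
copy_to_share() {
    if ! mountpoint -q "$2"; then
        echo "$2 is not a mount point; refusing to copy onto local disk" >&2
        return 1
    fi
    # Trailing slashes make rsync copy the directory contents, not the directory.
    rsync -a --partial "${1%/}/" "${2%/}/"
}
```

For example, `copy_to_share ./large-dataset /mnt/tenant-data` would place the files where Pods see them through the NFS-backed PVC.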

For large or unreliable transfers, run an rsync server as a sidecar container alongside your workload. The sidecar mounts the same PVC and exposes rsync over a ClusterIP Service. You then kubectl port-forward to the Service and rsync from your local machine:

kubectl port-forward -n <your-tenant>-prod svc/rsync-sidecar 8873:873 &
rsync -avz --progress ./large-dataset/ rsync://localhost:8873/data/

rsync handles resume on interruption, delta transfers (only changed blocks), and progress reporting — all things kubectl cp cannot do.
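
Resuming still requires rerunning rsync after a failure. A minimal retry wrapper, sketched below, automates that. The `retry` helper is hypothetical, and the attempt count and delay are arbitrary. Note that it reruns only the given command: if the port-forward itself dies, that needs restarting separately.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND...: rerun COMMAND until it succeeds
# or MAX_ATTEMPTS is reached.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Usage (not run here). --partial keeps partly transferred files so the
# next attempt can build on them instead of starting from zero:
# retry 5 10 rsync -avz --partial --progress ./large-dataset/ rsync://localhost:8873/data/
```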

Anti-patterns

kubectl cp streams data through the Kubernetes API server as a tar pipe over an exec channel. There is no resume, no progress indication, no delta transfer, and no compression. A network hiccup drops the entire transfer, and you start over. For anything over ~500 MB, use a PVC-backed Job or rsync sidecar instead.

The container filesystem is ephemeral. Anything written outside a mounted PVC is lost when the Pod restarts, reschedules, or gets evicted. This is not a storage option — it is a temporary scratch space.

If your workload writes output to a local path, make sure that path is backed by a PVC volumeMount. Otherwise the data survives only as long as the specific container instance does.
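
One way to check this is to compare the output path against the Pod's mount paths. The sketch below is illustrative: `is_under_mount` is a hypothetical helper. A match only shows that the path sits under some volume mount, and that volume could still be an emptyDir or a Secret, so confirm in the Pod's `.spec.volumes` that it is backed by a PVC.

```shell
# is_under_mount PATH MOUNT...: succeeds if PATH equals or lies beneath
# one of the MOUNT paths.
is_under_mount() {
    target=$1
    shift
    for mount in "$@"; do
        mount=${mount%/}    # tolerate a trailing slash on the mount path
        case "$target" in
            "$mount"|"$mount"/*) return 0 ;;
        esac
    done
    return 1
}

# Usage (not run here): list a Pod's mount paths, then test an output path.
# $mounts is left unquoted on purpose so each path becomes its own argument.
# mounts=$(kubectl get pod my-pod -n <your-tenant>-prod \
#     -o jsonpath='{.spec.containers[*].volumeMounts[*].mountPath}')
# is_under_mount /data/output.csv $mounts && echo "under a volume mount"
```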

Pulling a multi-gigabyte dataset in an initContainer on every Pod start is slow and wasteful. The download runs every time the Pod reschedules, consumes network bandwidth, and delays readiness. Instead, download the dataset once into a PVC (via a Job) and mount that PVC into your workload.
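
The download-once idea can be made explicit with a completion marker on the PVC, so that a retried or rerun Job skips work that already finished. This is a sketch under assumptions: the `fetch_once` helper and the `.download-complete` marker name are hypothetical, and the URL is the placeholder used earlier on this page.

```shell
# fetch_once DIR COMMAND...: run COMMAND only if DIR has no completion marker.
fetch_once() {
    dir=$1
    shift
    marker="$dir/.download-complete"    # hypothetical marker file name
    if [ -f "$marker" ]; then
        echo "dataset already present in $dir; skipping download"
        return 0
    fi
    "$@" || return 1                    # marker is written only after success
    touch "$marker"
}

# Usage inside the Job container (not run here):
# fetch_once /data sh -c 'curl -fSL https://example.com/dataset.tar.gz | tar xz -C /data'
```

Writing the marker only after the command succeeds means an interrupted download is retried rather than mistaken for a complete one.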