CCU:Persistent storage on the Kubernetes cluster

From Collective Computational Unit

The CephFS file system

As explained in the quick start tutorial, every user can mount certain local host paths inside their pods, which refer to a global distributed Ceph file system.

This file system is usually very fast, but only for the workloads it is designed for. Remember that it is distributed storage: metadata access (such as file attributes, or which server holds a specific file) goes through a database and can become a bottleneck. As a consequence, performance degrades dramatically when writing or accessing many small files, or when a single directory contains many small files (which forces that directory's metadata onto a single server).

TL;DR, and this is very important: when using CephFS, organize your dataset into a few large files (e.g. HDF5), not many small ones!
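If your tooling does not read HDF5, even bundling the files into a single archive already reduces the metadata load. A minimal sketch, using a temporary directory with a few sample files in place of a real dataset (all paths are illustrative):

```shell
# Bundle a directory of many small files into one archive before
# placing it on CephFS. A temporary directory stands in for your dataset.
d="$(mktemp -d)"
for i in 1 2 3; do echo "sample $i" > "$d/file$i.txt"; done
tar -cf "$d.tar" -C "$d" .      # one large file instead of many small ones
tar -tf "$d.tar"                # list the contents without extracting
```

The archive can then be extracted to fast local storage on the node before processing.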

If this is not possible for you, you need to resort to persistent volumes residing on local storage on a single node, which is orders of magnitude faster for small files; the trade-off is that you are bound to that particular node (or have to duplicate the data across several local file systems). A tutorial follows.

Local storage on the node

The path for local storage for each user is

  • /raid/local-data/<your-username>

You can mount it as a hostPath, but you have to make sure the directory is created if it does not exist by specifying "type: DirectoryOrCreate".

The data will remain persistent on the host, but note that it also only exists on this particular host. If you need to access it again, make sure the pod always ends up on the same node (see the example below). Otherwise, write your scripts so that they check whether the local data exists and, if it is not there yet, copy it over from somewhere on the internet.
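Such a check can be a few lines of shell at the start of your job script. The sketch below is illustrative: DATA_DIR and the commented-out rsync source are placeholders (in the access pod you would point DATA_DIR somewhere under /local; it defaults to /tmp here so the sketch runs anywhere):

```shell
# Idempotent "fetch if missing" pattern for node-local data.
DATA_DIR="${DATA_DIR:-/tmp/my-dataset}"

if [ ! -e "$DATA_DIR/.complete" ]; then
    echo "local copy missing or incomplete, fetching..."
    mkdir -p "$DATA_DIR"
    # e.g.: rsync -av your.username@external-server:/path/to/data/ "$DATA_DIR"/
    touch "$DATA_DIR/.complete"   # marker written only after a full copy
fi
echo "dataset ready at $DATA_DIR"
```

The marker file ensures that an interrupted copy is retried on the next run rather than mistaken for a complete dataset.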


Example

The following example creates an access pod on the compute node "tiamat" which mounts the local storage as well as all your personal directories in the ceph file system:

apiVersion: v1
kind: Pod
metadata:
  name: storage-access-pod-tiamat
spec:
  nodeSelector:
    kubernetes.io/hostname: tiamat

  containers:
  - name: ubuntu
    image: ubuntu:20.04
    command: ["sleep", "1d"]
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
      limits:
        cpu: 1
        memory: 1Gi
    volumeMounts:
      - mountPath: /abyss/home
        name: cephfs-home
        readOnly: false
      - mountPath: /abyss/shared
        name: cephfs-shared
        readOnly: false
      - mountPath: /abyss/datasets
        name: cephfs-datasets
        readOnly: true
      - mountPath: /local
        name: local-storage
        readOnly: false
  volumes:
    - name: cephfs-home
      hostPath:
        path: "/cephfs/abyss/home/<your-username>"
        type: Directory
    - name: cephfs-shared
      hostPath:
        path: "/cephfs/abyss/shared"
        type: Directory
    - name: cephfs-datasets
      hostPath:
        path: "/cephfs/abyss/datasets"
        type: Directory
    - name: local-storage
      hostPath:
        path: "/raid/local-data/<your-username>"
        type: DirectoryOrCreate

Reading/writing to the directories in the pod

After you have created the access pod with "kubectl apply -f <filename>.yaml", you have several options to get data to and from the container.

Copying data from within the container

You can get a root shell inside the container as usual (insert the correct pod name you used below):

> kubectl exec -it access-pod -- /bin/bash

Your pod has internet access. Thus, an option to get data to/from the pod, in particular into the persistent volume, is to use scp, which first might need to be installed inside the pod:

# apt-get update && apt-get install -y openssh-client rsync
# cd /my-pvc-mount-path
# scp -r your.username@external-server:/path/to/data/. ./

An even better variant is "rsync -av" instead of scp, as it only copies files that differ or are missing at the destination. By swapping source and destination, you can also copy data out of the container this way.

Copying data from your local machine

From the local machine which has kubectl access to the cluster, you can directly copy data to and from the container using kubectl cp, which has a syntax very similar to scp:

# to get data into the container, substitute the correct pod name obtained from kubectl get pods
> kubectl cp /path/to/data/. pvc-access-pod:/my-pvc-mount/path/data
# to get data from the container
> kubectl cp pvc-access-pod:/my-pvc-mount/path/. /path/to/output/

Read up on the Kubernetes "kubectl cp" documentation to see how it handles directories; it is a bit unusual and differs slightly from scp.

Note: kubectl cp internally uses tar and compression to speed up the network transfer. This means your access pod needs a certain amount of memory, particularly when transferring large files. If you run into exit code 137 (out of memory), increase the memory limits of the access pod or use scp from within the pod.
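Raising the limit means editing the resources section of the pod manifest above and re-creating the pod. A sketch with illustrative values (how much you actually need depends on your transfer sizes):

```yaml
    resources:
      requests:
        cpu: 100m
        memory: 500Mi
      limits:
        cpu: 1
        memory: 4Gi   # raised from 1Gi to give tar/compression more headroom
```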