== The CephFS file system ==
As explained in the [[CCU:GPU Cluster Quick Start|quick start tutorial]], every user can mount certain local host paths inside their pods, which refer to a global distributed Ceph file system. Reminder: the primary home directory is
<syntaxhighlight lang="bash">/cephfs/abyss/home/<your-username></syntaxhighlight>

This file system is usually quite fast, but only if it is used for workloads it is designed for. It is a distributed storage, where the filesystem metadata is stored in databases on different servers and the actual contents of the files on other ones. This means that metadata access (such as reading file attributes, or determining on which server to look for a specific file) can be a bottleneck. In effect, reading the metadata of a small file is orders of magnitude more expensive than reading the actual contents of the file itself, so performance breaks down dramatically when writing or accessing many small files. In particular, having many small files in a single directory (say >10k) makes even simple filesystem operations such as directory listings take ages, and automated backup jobs might run into problems.

'''TL;DR, and this is very important: when using CephFS, make sure to organize your dataset in few large files (e.g. HDF5), and not many small ones! If you really have to keep individual files, make sure they are stored in subdirectories which do not become too large.'''

For example, if you have a million images of the form abcdef.jpg in a single directory, you should distribute them over a directory tree a/b/c/def.jpg, so that only on the order of 1000 files end up in each directory. An interesting option if you have a dataset consisting of many small files is to keep it in a tar archive and mount that archive using [https://github.com/mxmlnkn/ratarmount ratarmount].
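The redistribution described above can be scripted in a few lines. The following is a minimal sketch; the helper name <code>shard_files</code> and the three-level split are illustrative choices matching the a/b/c/def.jpg example, not something prescribed by the cluster:

<syntaxhighlight lang="bash">
#!/bin/bash
# Sketch: spread files named like abcdef.jpg over a tree a/b/c/def.jpg so that
# no single directory accumulates too many entries.
shard_files() {
    local dir="$1" f base sub
    for f in "$dir"/*.jpg; do
        [ -e "$f" ] || continue
        base=$(basename "$f")
        # the first three characters of the name become nested directories
        sub="${base:0:1}/${base:1:1}/${base:2:1}"
        mkdir -p "$dir/$sub"
        mv "$f" "$dir/$sub/${base:3}"
    done
}
# e.g. shard_files /cephfs/abyss/home/<your-username>/images
</syntaxhighlight>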
If this is not possible for you, then you need to use the local SSD storage on a single node, which for small files is orders of magnitude faster, but you are bound to a particular node (or have to duplicate the data in different local filesystems). See below for details on local filesystems.

== CephFS capacity and backup strategy ==
The storage on the Ceph filesystem is quite expensive due to the redundancy built in (if any server reboots or is otherwise unavailable, the others can still serve all of the data). The contents of the home directories are also backed up daily onto a backup server with a database and file history - if you ever accidentally overwrite or otherwise lose an extremely important file, you can contact me and check if I have an old copy in a backup.
Currently, there is sufficient space left; however, I kindly ask you to not keep data you do not use anymore on the Ceph filesystem for too long. In particular, please delete old checkpoints of training runs you will never need again - I have seen people use several terabytes for their training histories. If you still need these, please move them onto your own computers. If you really want to keep old stuff lying around on the cluster filesystem, maybe because you are not sure whether you will need it again later on, then please put it into a folder which is not backed up. For this, every user can mount the Ceph directory
<syntaxhighlight lang="bash">/cephfs/abyss/archive/nobackup/<your-username></syntaxhighlight>
which can be used as an archive. Make sure that the directory is created if it does not exist, by specifying "type: DirectoryOrCreate".
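A sketch of the relevant pod-spec fragment for mounting this archive (the mount path and volume name are placeholders; only the hostPath path and type are taken from this article):

<syntaxhighlight lang="yaml">
# fragment of a pod spec
    volumeMounts:
    - mountPath: "/mnt/archive"
      name: archive-nobackup
  volumes:
  - name: archive-nobackup
    hostPath:
      path: "/cephfs/abyss/archive/nobackup/<your-username>"
      # create the directory on first use if it does not exist
      type: DirectoryOrCreate
</syntaxhighlight>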
== Local storage on the node ==
The path for local storage for each user is
* <syntaxhighlight lang="bash">/raid/local-data/<your-username></syntaxhighlight>
You can mount it as a hostPath, but you have to make sure that the directory is created if it does not exist, by specifying "type: DirectoryOrCreate".
The data will remain persistent on the host, but note that it also only exists on this particular host. If you need to access it again, you need to make sure the pod always ends up on the same specific node. See example below. Otherwise, write your scripts in such a way that they check for existence of the local data, and if it is not there yet, copy it over from somewhere on the internet.
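Such a check can look as follows. This is a minimal sketch; the helper name <code>sync_to_local</code> and the paths in the usage comment are examples, and it copies from CephFS rather than from the internet:

<syntaxhighlight lang="bash">
#!/bin/bash
# Sketch: copy a dataset to fast node-local storage only if it is
# not already present on this particular node.
sync_to_local() {
    local src="$1" dst="$2"
    if [ ! -d "$dst" ]; then
        mkdir -p "$dst"
        cp -r "$src/." "$dst/"
    fi
}
# e.g. sync_to_local /cephfs/abyss/home/<your-username>/mnist \
#                    /raid/local-data/<your-username>/mnist
</syntaxhighlight>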
'''In contrast to Ceph storage, local paths on the hosts are not backed up. You have been warned.'''
== Example ==
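A minimal pod spec along these lines might look as follows (the pod name, node name and mount path are placeholders; adapt them to your setup):

<syntaxhighlight lang="yaml">
apiVersion: v1
kind: Pod
metadata:
  name: your-username-local-data-pod
spec:
  # pin the pod to one specific node so the local data is found again
  nodeName: node01
  containers:
  - name: local-data-container
    image: ubuntu:18.04
    command: ['sleep', '6h']
    volumeMounts:
    - mountPath: "/mnt/local-data"
      name: local-data
  volumes:
  - name: local-data
    hostPath:
      path: "/raid/local-data/<your-username>"
      # create the directory on first use if it does not exist
      type: DirectoryOrCreate
</syntaxhighlight>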
== Reading/writing the contents of a persistent volume ==
You can access a PV which is bound to a PVC by mounting it into a container. For a demonstration, we use the simple container image "ubuntu:18.04", which runs a minimalistic Ubuntu, and keep it in a very long wait after container startup.
<syntaxhighlight lang="yaml">
# Test pod to mount a PV bound to a PVC into a container
# Before starting this pod, apply the PVC with kubectl apply -f pvc.yaml
apiVersion: v1
kind: Pod
metadata:
  name: your-username-pvc-access-pod
spec:
  containers:
  - name: pvc-access-container
    # we use a small ubuntu base to access the PVC
    image: ubuntu:18.04
    # make sure that we have some time until the container quits by itself
    command: ['sleep', '6h']
    # list of mount paths within the container which will be
    # bound to persistent volumes.
    volumeMounts:
    - mountPath: "/mnt/pvc-mnist"
      # name of the volume for this path (from the below list)
      name: pvc-mnist
  volumes:
  # User-defined name of the persistent volume within this configuration.
  # This can be different from the name of the PVC.
  - name: pvc-mnist
    persistentVolumeClaim:
      # name of the PVC this volume binds to
      claimName: your-username-tf-mnist-pvc
</syntaxhighlight>
After the PVC is applied, spin up the test pod with
<syntaxhighlight lang="bash">> kubectl apply -f pvc-access-pod.yaml</syntaxhighlight>
After you have created the access pod with "kubectl apply -f <filename>.yaml", you have several options to get data to and from the container.
=== 1. Copying data from within the container ===
You can get a root shell inside the container as usual (insert the correct pod name you used below):
<syntaxhighlight lang="bash">
> kubectl exec -it pvc-access-pod -- /bin/bash
</syntaxhighlight>
Your pod has internet access. Thus, an option to get data to/from the pod, in particular into the persistent volume, is to use scp, which might need to be installed inside the pod:
<syntaxhighlight lang="bash">
# inside the pod; the openssh-client package provides scp
# (server name and paths below are examples)
apt-get update && apt-get install -y openssh-client
scp -r your-user@your-server:/path/to/dataset /mnt/pvc-mnist/
</syntaxhighlight>
An even better variant would be "rsync -av" instead of scp, as this only copies files which are different or do not exist in the destination. By reversing source and destination, you can also copy data out of the container this way.
=== 2. Copying data from your local machine ===
From your local machine, which has kubectl access to the cluster, you can directly copy data to and from the container using kubectl cp, which has a very similar syntax to scp:
<syntaxhighlight lang="bash">
# copy a local directory into the persistent volume mounted in the pod
# (pod name and paths as in the example above; adapt to your setup)
> kubectl cp ./mnist-data your-username-pvc-access-pod:/mnt/pvc-mnist/
# reversing source and destination copies data out of the container
> kubectl cp your-username-pvc-access-pod:/mnt/pvc-mnist/results ./results
</syntaxhighlight>
