**** THIS IS OUTDATED INFORMATION, PLEASE REFER TO [[CCU:Perstistent storage on the Kubernetes cluster]] instead.
 
 
== Prerequisites ==
* Prerequisites from the [[Tutorials:Run_the_example_container_on_the_cluster|previous tutorial]].
* Sample code from the [[Tutorials:Run_the_example_container_on_the_cluster|previous tutorial]].
 
 
== Global dataset storage for large, static datasets ==
 
The first cluster node exports an NFS filesystem on a large NVMe RAID, which is reasonably fast and can be used as global dataset storage. It can be mounted into a pod as follows:
 
<syntaxhighlight lang="yaml">
apiVersion: v1
kind: Pod
metadata:
  name: your-username-test-global-storage
spec:
  containers:
    - name: your-username-test-global-storage

      # we use a small ubuntu base to access the NFS volume
      image: ubuntu:18.04
      # make sure that we have some time until the container quits by itself
      command: ['sleep', '6h']

      volumeMounts:
        # Path to mount the NFS volume to
        - mountPath: "/mnt/datasets"
          name: datasets-nfs
          # NFS is exported read-only
          readOnly: true

  volumes:
    # Volume which mounts the NFS server exported to the cluster by ccu-node1
    - name: datasets-nfs
      nfs:
        server: ccu-node1
        path: /raid/datasets
</syntaxhighlight>
 
Please see the page [[CCU:Global dataset storage|on global storage]] for a list of available datasets and the method to upload your own.
 
* Local persistent volumes
* Global persistent volumes
Note: the cluster will soon get large, fast global storage. At that point, local persistent volumes will be phased out and will probably no longer be available. Until then, local persistent volumes should be used to import training data and to store the results and log files of your training. Tensorboard monitoring should be done using service exports, as explained below, and should not make use of the special local PVs for monitoring the training. Host directories are meant for common training datasets stored permanently on the host. They are always read-only.
<syntaxhighlight lang="yaml">
accessModes:
  - ReadWriteOnce
  - ReadOnlyMany
# For me (Felix) it worked only with the following additional line:
volumeMode: Filesystem
</syntaxhighlight>
Since anyone can mount global persistent volumes in the same namespace, they can and should be used to share datasets. The name of a PVC which contains a useful dataset should start with "dataset-" and be descriptive, so that other users can find it easily. The root of the PVC should also contain a README with information about the dataset (at least its source and what exactly it is). Finally, it is good practice for users of the dataset other than its creator to mount the volume read-only (by specifying "readOnly: true" after the mountPath in the pod's yaml).
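As an illustration of such a read-only mount, a pod spec could look like the following sketch. The PVC name "dataset-example", the pod name, the volume name and the mount path are hypothetical placeholders, not objects that are known to exist on the cluster.

<syntaxhighlight lang="yaml">
apiVersion: v1
kind: Pod
metadata:
  name: your-username-read-dataset
spec:
  containers:
    - name: your-username-read-dataset
      image: ubuntu:18.04
      # keep the container alive for a while so the data can be inspected
      command: ['sleep', '6h']

      volumeMounts:
        # hypothetical mount path inside the container
        - mountPath: "/mnt/dataset-example"
          name: dataset-volume
          # we are not the creator of the dataset, so mount it read-only
          readOnly: true

  volumes:
    - name: dataset-volume
      persistentVolumeClaim:
        # hypothetical PVC name, following the "dataset-" naming convention
        claimName: dataset-example
        readOnly: true
</syntaxhighlight>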
A note on mounting: currently (this will change in the near future), Ceph volumes can be mounted either ReadWrite by a single pod only, or ReadOnly by multiple pods. Thus, the workflow for a static dataset is to create the PVC, then create a pod to write all the data to it, then delete this pod and mount the volume read-only from then on, so that it can be used in multiple pods.
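The first step of the static-dataset workflow, creating the PVC, could look roughly like the sketch below. The name "dataset-example" and the requested size are hypothetical, and the sketch assumes that the cluster's default storage class is the right one for dataset volumes; if it is not, a "storageClassName" entry would have to be added.

<syntaxhighlight lang="yaml">
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # hypothetical name, following the "dataset-" naming convention
  name: dataset-example
spec:
  accessModes:
    # written once by a single pod ...
    - ReadWriteOnce
    # ... and afterwards mounted read-only by many pods
    - ReadOnlyMany
  volumeMode: Filesystem
  resources:
    requests:
      # hypothetical size, adjust to the dataset
      storage: 10Gi
</syntaxhighlight>

After the claim is bound, a single pod mounts it read-write to copy the data in. Once that pod has been deleted, other pods can mount the claim with "readOnly: true".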
== Reading/writing the contents of a persistent volume ==
