</syntaxhighlight>
You now have a copy of your dataset on Lolth. Roughly every hour, the datasets on Lolth are synced to the directory "/raid/datasets/your.username" on an NFS server. This directory is exported, and you can mount it into any container running on the cluster. Note that every user has read access to the whole directory tree, so you can use this method to share data between users as well. As a side effect, you now also have two backups of your data on two different machines (however, they are in the same rack, so not really fire-proof).
You can also delete data from Lolth by ssh'ing into the machine and using rm to delete files in the "/raid/datasets/your.username" subdirectory. During the hourly sync, any data no longer present on Lolth will also be deleted from the global cluster storage.
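The important consequence is that the global copy mirrors Lolth: anything you delete on Lolth disappears from the cluster storage at the next sync. A minimal local simulation of this mirror behaviour, assuming the hourly job acts like a mirror sync (e.g. rsync --delete); all paths below are made up purely for the demonstration:
<syntaxhighlight lang="bash">
# Simulate the Lolth-side staging directory and the global NFS-side copy
# (both paths are hypothetical, for illustration only).
mkdir -p /tmp/lolth_side /tmp/nfs_side
echo "new data"   > /tmp/lolth_side/keep.bin
echo "stale data" > /tmp/nfs_side/stale.bin    # exists only in the global copy

# The hourly job is assumed to mirror the source directory:
rm -rf /tmp/nfs_side
cp -r /tmp/lolth_side /tmp/nfs_side

ls /tmp/nfs_side    # keep.bin survives; stale.bin is gone
</syntaxhighlight>
In other words, treat the directory on Lolth as the single source of truth for your datasets.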
== Accessing the global storage from within a container ==
<syntaxhighlight lang="bash">
# KITTI Dataset
└── /raid/datasets/general/kitti
    ├── training        <-- 7481 train data
    |   ├── image_2     <-- for visualization
To find the dataset, just type
<syntaxhighlight lang="bash">
> kubectl get pvc
</syntaxhighlight>
and you will get information like the following:
<syntaxhighlight lang="bash">
pvc name: dataset-kitti, capacity: 100Gi, access modes: RWO, storageclass: ceph-ssd
</syntaxhighlight>
To access the dataset, you can get a root shell inside the container as usual (insert the correct pod name you used below):
<syntaxhighlight lang="bash">
> kubectl exec -it user-name-pvc-access-pod -- /bin/bash
</syntaxhighlight>
Then change into the dataset directory:
<syntaxhighlight lang="bash">
root@user-name-pvc-access-pod:/# cd /mnt/dataset_kitti/
root@user-name-pvc-access-pod:/mnt/dataset_kitti#
</syntaxhighlight>
If you want to bind your project to this global dataset, please change your '''job-script-pvc.yaml''' like the following:
<syntaxhighlight lang="yaml">
...
# list of mount paths within the container which will be
# bound to persistent volumes.
volumeMounts:
  # /mnt/dataset_kitti is the path inside the container where the
  # persistent volume with the KITTI dataset will be mounted.
- mountPath: "/mnt/dataset_kitti"
  # name of the volume for this path (from the below list)
name: dataset-kitti-user
# login credentials to the docker registry.
# for convenience, a readonly credential is provided as a secret in each namespace.
imagePullSecrets:
- name: registry-ro-login
# containers will never restart
restartPolicy: Never
volumes:
# User-defined name of the persistent volume within this configuration.
# This can be different from the name of the PVC.
- name: dataset-kitti-user
persistentVolumeClaim:
# name of the PVC this volume binds to
claimName: dataset-kitti
...
</syntaxhighlight>
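Since the global datasets are shared between all users, it can be safer to mount the claim read-only; Kubernetes supports a <code>readOnly</code> flag on the <code>persistentVolumeClaim</code> volume source. A minimal sketch using the same names as above:
<syntaxhighlight lang="yaml">
volumes:
  - name: dataset-kitti-user
    persistentVolumeClaim:
      claimName: dataset-kitti
      # mount the shared dataset read-only so your job cannot modify it
      readOnly: true
</syntaxhighlight>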
'''nuScenes dataset'''
<syntaxhighlight lang="bash">
# nuScenes Dataset
└── /raid/datasets/general/nuscenes
</syntaxhighlight>