</syntaxhighlight>
You now have a copy of your dataset on Lolth. Roughly every hour, the datasets on Lolth are synced to the directory "/raid/datasets/your.username" on an NFS server. This directory is exported, and you can mount it into any container running on the cluster. Note that every user has read access to the whole directory tree, so you can use this method to share data between users as well. As a side effect, you now also have two backups of your data on two different machines (however, they are in the same rack, so not really fire-proof).
You can also delete data from Lolth by ssh'ing into the machine and using rm to delete files in the "/raid/datasets/your.username" subdirectory. During the hourly sync, any data no longer present on Lolth will also be deleted from the global cluster storage.
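The important consequence is that the global copy mirrors Lolth: anything you delete on Lolth disappears from the cluster storage at the next sync. A minimal local simulation of this mirror behaviour, assuming the hourly job acts like a mirror sync (e.g. rsync --delete); all paths below are made up purely for the demonstration:
<syntaxhighlight lang="bash">
# Simulate the Lolth-side staging directory and the global NFS-side copy
# (both paths are hypothetical, for illustration only).
mkdir -p /tmp/lolth_side /tmp/nfs_side
echo "new data"   > /tmp/lolth_side/keep.bin
echo "stale data" > /tmp/nfs_side/stale.bin    # exists only in the global copy

# The hourly job is assumed to mirror the source directory:
rm -rf /tmp/nfs_side
cp -r /tmp/lolth_side /tmp/nfs_side

ls /tmp/nfs_side    # keep.bin survives; stale.bin is gone
</syntaxhighlight>
In other words, treat the directory on Lolth as the single source of truth for your datasets.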
== Accessing the global storage from within a container ==
<syntaxhighlight lang="bash">
# KITTI Dataset
└── /raid/datasets/general/kitti
    ├── training        <-- 7481 train data
    |   ├── image_2     <-- for visualization
To find the dataset, just type
<syntaxhighlight lang="bash">
> kubectl get pvc
</syntaxhighlight>
and you will get information like the following:
<syntaxhighlight lang="bash">
pvc name: dataset-kitti, capacity: 100Gi, access modes: RWO, storageclass: ceph-ssd
</syntaxhighlight>
To access the dataset, you can get a root shell inside the container as usual (insert the correct pod name you used below):
<syntaxhighlight lang="bash">
> kubectl exec -it user-name-pvc-access-pod -- /bin/bash
</syntaxhighlight>
Then change into the dataset directory:
<syntaxhighlight lang="bash">
root@user-name-pvc-access-pod:/# cd /mnt/dataset_kitti/
root@user-name-pvc-access-pod:/mnt/dataset_kitti#
</syntaxhighlight>
If you want to bind your project to this global dataset, please change your '''job-script-pvc.yaml''' like the following:
<syntaxhighlight lang="yaml">
...
# list of mount paths within the container which will be
# bound to persistent volumes.
volumeMounts:
  # /mnt/dataset_kitti is the path inside the container where the
  # persistent volume with the KITTI dataset will be mounted.
- mountPath: "/mnt/dataset_kitti"
  # name of the volume for this path (from the below list)
name: dataset-kitti-user
# login credentials to the docker registry.
# for convenience, a readonly credential is provided as a secret in each namespace.
imagePullSecrets:
- name: registry-ro-login
# containers will never restart
restartPolicy: Never
volumes:
# User-defined name of the persistent volume within this configuration.
# This can be different from the name of the PVC.
- name: dataset-kitti-user
persistentVolumeClaim:
# name of the PVC this volume binds to
claimName: dataset-kitti
...
</syntaxhighlight>
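Since the global datasets are shared between all users, it can be safer to mount the claim read-only; Kubernetes supports a <code>readOnly</code> flag on the <code>persistentVolumeClaim</code> volume source. A minimal sketch using the same names as above:
<syntaxhighlight lang="yaml">
volumes:
  - name: dataset-kitti-user
    persistentVolumeClaim:
      claimName: dataset-kitti
      # mount the shared dataset read-only so your job cannot modify it
      readOnly: true
</syntaxhighlight>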
'''nuScenes dataset'''
<syntaxhighlight lang="bash">
# nuScenes Dataset
└── /raid/datasets/general/nuscenes
</syntaxhighlight>