Local storage on the node
Storage on the Ceph filesystem is quite expensive due to the built-in redundancy (if any server reboots or is otherwise unavailable, the others can still serve all of the data). The contents of the home directories are also backed up daily onto a backup server with a file history. If you ever accidentally overwrite or otherwise lose an extremely important file, you can contact me and I can check whether I have an old copy in a backup.
Currently, there is sufficient space left; however, I kindly ask you not to keep data you no longer use on the Ceph filesystem for too long. In particular, please delete old checkpoints of training runs you will never need again. I have seen people use hundreds of gigabytes to several terabytes for their training histories. If you still need these, please move them onto your own computers. If you really want to keep old stuff lying around on the cluster filesystem, maybe because you are not sure whether you will need it again later on, then please put it into a folder which is not backed up. For this, every user can mount the Ceph directory
* <syntaxhighlight lang="bash">/cephfs/abyss/archive/nobackup/<your-username></syntaxhighlight>
which can be used as an archive. Make sure that the directory is created if it does not exist, by specifying "type: DirectoryOrCreate".
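As a rough sketch, mounting the archive directory could look like the following pod spec fragment. It assumes the directory is mounted as a hostPath volume, which the "type: DirectoryOrCreate" setting suggests. The volume name <code>archive</code>, the container name, the image and the mount path <code>/archive</code> are placeholders chosen for illustration, not names required by the cluster.
<syntaxhighlight lang="yaml">
# Fragment of a pod spec, not a complete manifest.
spec:
  containers:
    - name: main                  # hypothetical container name
      image: busybox              # placeholder image
      volumeMounts:
        - name: archive           # must match the volume name below
          mountPath: /archive     # hypothetical path inside the container
  volumes:
    - name: archive
      hostPath:
        path: /cephfs/abyss/archive/nobackup/<your-username>
        type: DirectoryOrCreate   # create the directory if it does not exist
</syntaxhighlight>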
The path for local storage for each user is
* <syntaxhighlight lang="bash">/raid/local-data/<your-username></syntaxhighlight>
You can mount it as a hostPath, but you have to make sure that the directory is created if it does not exist, by specifying "type: DirectoryOrCreate".
The data will remain persistent on the host, but note that it also only exists on this particular host. If you need to access it again, you need to make sure the pod always ends up on the same specific node. See example below. Otherwise, write your scripts in such a way that they check for existence of the local data, and if it is not there yet, copy it over from somewhere on the internet.
'''In contrast to Ceph storage, local paths on the hosts are not backed up. You have been warned.'''
== Example ==
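The following is a minimal sketch of a pod that mounts the local storage path and is pinned to one node, so that it finds the same local data on every run. Several parts are assumptions made for illustration: the pod name, the <code>busybox</code> image, the mount path <code>/local-data</code>, the <code>dataset</code> folder and the download URL are placeholders, and <code><node-name></code> has to be replaced with the hostname of the node you want to use. The cluster may also require further fields, such as a namespace or resource limits, which are not shown here.
<syntaxhighlight lang="yaml">
apiVersion: v1
kind: Pod
metadata:
  name: local-data-example                 # hypothetical pod name
spec:
  restartPolicy: Never
  # Pin the pod to one node, since the local data only exists there.
  nodeSelector:
    kubernetes.io/hostname: <node-name>    # placeholder, fill in the node
  containers:
    - name: main
      image: busybox                       # assumption: any image with sh, wget and tar
      command: ["sh", "-c"]
      args:
        - |
          set -e
          # Fetch the data only if it is not on this node yet.
          if [ ! -d /local-data/dataset ]; then
            mkdir -p /local-data/dataset.tmp
            # Placeholder URL, replace with the real source of your data.
            wget -O /tmp/dataset.tar.gz https://example.com/dataset.tar.gz
            tar -xzf /tmp/dataset.tar.gz -C /local-data/dataset.tmp
            # Rename last, so an aborted download is not mistaken for complete data.
            mv /local-data/dataset.tmp /local-data/dataset
          fi
          ls /local-data/dataset
      volumeMounts:
        - name: local-data
          mountPath: /local-data           # hypothetical path inside the container
  volumes:
    - name: local-data
      hostPath:
        path: /raid/local-data/<your-username>
        type: DirectoryOrCreate            # create the directory if it does not exist
</syntaxhighlight>
The existence check in the container command covers the case where the pod does end up on a node without the data: the data is downloaded once and reused on later runs on that node.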
