Difference between revisions of "CCU:Global dataset storage"

Revision as of 07:52, 15 July 2020

Overview

The global dataset storage is intended for large, static datasets, in particular those that benefit multiple users (but feel free to use it for data that only you need as well). Write access is very slow, since it is tunneled over a slow filesystem for security and backup reasons (see below for technical details), so it will take a while until your datasets actually show up on the cluster. Read access, however, should be very fast: the NVMe RAID where the storage resides has a read speed of 1.9 GB/s, and nodes other than the DGX-2 access it over a 10 Gbit/s network. In some cases it might even surpass local storage.
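To put the read-speed figures above into perspective, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 GB = 8 Gbit), ignores protocol and filesystem overhead, and assumes that the DGX-2 reads the RAID without going through the 10 Gbit/s link; the function names and the 100 GB dataset size are made up for illustration. The results are best-case bounds, not measurements.

```python
from typing import Optional


def effective_read_rate_gb_s(storage_gb_s: float,
                             network_gbit_s: Optional[float]) -> float:
    """Upper bound on read throughput in GB/s.

    The slower of the storage and the network link limits the rate.
    Pass network_gbit_s=None for a node that is assumed to read the
    RAID directly, without a network hop.
    """
    if network_gbit_s is None:
        return storage_gb_s
    # 8 bits per byte: a 10 Gbit/s link carries at most 1.25 GB/s.
    return min(storage_gb_s, network_gbit_s / 8.0)


def min_read_seconds(dataset_gb: float, rate_gb_s: float) -> float:
    """Best-case time to read a dataset once at the given rate."""
    return dataset_gb / rate_gb_s


if __name__ == "__main__":
    dataset_gb = 100.0  # illustrative dataset size, not from the wiki
    over_network = effective_read_rate_gb_s(1.9, 10.0)
    direct = effective_read_rate_gb_s(1.9, None)
    print(f"over 10 Gbit/s link: {over_network:.2f} GB/s, "
          f"{min_read_seconds(dataset_gb, over_network):.0f} s minimum")
    print(f"direct RAID access:  {direct:.2f} GB/s, "
          f"{min_read_seconds(dataset_gb, direct):.0f} s minimum")
```

Under these assumptions, the network link rather than the RAID is the limiting factor on nodes other than the DGX-2.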

The global storage can easily be mounted in any container on any node as a read-only volume, whereas writing to it requires specific rsync commands on the master node. See below for detailed instructions. Every user has their own subdirectory within the global storage (readable by everyone, writable only by that user). In addition, there is a user-independent directory subtree with common machine learning datasets. If you believe a dataset in your own subdirectory is static and beneficial for many users, please contact me to move it to the common tree.
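Since the volume is read-only inside containers, a job can check at startup that its dataset directory is actually mounted before it begins work. The following is a minimal sketch of such a check; the mount point /mnt/global-storage is a placeholder chosen for illustration, and the real path depends on how the volume is mounted in your container.

```python
import os
from pathlib import Path

# Hypothetical mount point, for illustration only. Replace it with the
# path under which the global storage is mounted in your container.
GLOBAL_STORAGE = Path("/mnt/global-storage")


def check_dataset_dir(path: Path) -> dict:
    """Report whether a directory exists, is readable, and is writable."""
    return {
        "exists": path.is_dir(),
        "readable": os.access(path, os.R_OK | os.X_OK),
        "writable": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    status = check_dataset_dir(GLOBAL_STORAGE)
    print(status)
    if not status["exists"]:
        print("Dataset directory not found; is the volume mounted?")
    elif status["writable"]:
        # Inside a container the mount is expected to be read-only.
        print("Directory is writable; this may not be the global storage.")
```

A training script could call check_dataset_dir once at startup and stop early if the directory is missing, instead of failing later when the first file is opened.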

Writing your data to the global storage

Accessing the global storage from within a container

Please see this page for an introduction on how to use the datasets.

List of datasets in global storage

Everyone, please update this list if you have any useful datasets to share. Feel free to create additional pages on the Wiki if a dataset needs more description, or link to your project page in the respective column (see example below).

TODO.
