CCU:Perstistent storage on the Kubernetes cluster

73 bytes added, 4 years ago

== The CephFS file system ==

As explained in the [[CCU:GPU Cluster Quick Start|quick start tutorial]], every user can mount certain local host paths inside their pods, which refer to a global distributed Ceph file system.

This file system is usually very fast, but only when it is used for the workloads it is designed for. Remember that it is distributed storage: metadata access (such as file attributes, or which server holds a specific file) goes through a database and can become a bottleneck. As a result, performance breaks down dramatically when writing or accessing many small files, or when keeping many small files in a single directory (which forces the metadata to be stored on a single server).

'''TL;DR, and this is very important: when using CephFS, make sure to organize your dataset in a few large files (e.g. HDF5), not many small ones!'''
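As a rough illustration of this advice, the sketch below packs a directory of many small files into one large container file and reads a single sample back without unpacking. It is only an assumption-laden example: it uses the uncompressed tar format from the Python standard library so that it runs without extra packages, whereas the HDF5 format mentioned above would require a library such as h5py. The function names, file names and the sample count are hypothetical.

```python
import os
import tarfile
import tempfile


def pack_directory(src_dir: str, archive_path: str) -> int:
    """Pack every regular file under src_dir into one uncompressed tar file.

    Returns the number of files packed.
    """
    count = 0
    with tarfile.open(archive_path, "w") as tar:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                # Store paths relative to src_dir so the archive is relocatable.
                tar.add(full, arcname=os.path.relpath(full, src_dir))
                count += 1
    return count


def read_member(archive_path: str, member: str) -> bytes:
    """Read one sample back without unpacking the archive to disk."""
    with tarfile.open(archive_path, "r") as tar:
        extracted = tar.extractfile(member)
        if extracted is None:
            raise KeyError(member)
        return extracted.read()


def demo() -> tuple[int, bytes]:
    """Create 1000 tiny files in a temp dir, pack them, read one back."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "samples")
        os.makedirs(src)
        # Hypothetical dataset: 1000 tiny text files.
        for i in range(1000):
            with open(os.path.join(src, f"sample_{i:04d}.txt"), "wb") as f:
                f.write(f"payload {i}".encode())
        archive = os.path.join(tmp, "samples.tar")
        n = pack_directory(src, archive)
        return n, read_member(archive, "sample_0042.txt")
```

With this layout, only the single archive file would need to live on CephFS instead of a thousand separate entries. Note that a tar file has no index, so locating one member means scanning its headers; a format with indexed access, such as HDF5, is likely the better fit for random reads during training.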

If organizing your data this way is not possible for you, you need to resort to persistent volumes residing on local storage on a single node. For small files this is orders of magnitude faster, but you are bound to that particular node (or have to duplicate the data on the local file systems of several nodes). A tutorial follows.

== Local storage on the node ==

Bastian.goldluecke
