
CCU:GPU Cluster Quick Start

== Running actual workloads on the cluster ==
See [https://www.nvidia.com/en-us/gpu-cloud/containers/ NVIDIA's container catalog] for more base images (e.g. [https://ngc.nvidia.com/catalog/containers/nvidia:pytorch PyTorch]), or search the web for containers of your favourite application. '''Make sure you only run containers from trusted sources!'''
'''Please note (very important): Version 20.09 of the deep learning framework images on nvcr.io works on all hosts in the cluster. Newer images are available, but they require driver version >= 455, which is not yet installed on all machines. So please stick to 20.09 unless you target a specific host.''' I will soon provide a table with the driver versions of all hosts once they have been upgraded and moved to the new cluster. As a general rule, everything built for CUDA 11.0 and driver version >= 450 should work fine on the cluster. Older images on nvcr.io, for example those based on CUDA 10.2, also still work if your code requires an older CUDA version.
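As a rough illustration, the rule of thumb above can be written as a small Python check. This is only a sketch: the thresholds are copied from this page and may change as hosts are upgraded, and the helper names (<code>parse_release</code>, <code>driver_major</code>, <code>runs_on_host</code>) are made up for this example. The official support matrix remains authoritative.

<syntaxhighlight lang="python">
import re

# Thresholds copied from the rule of thumb on this page (assumption: they
# may change as hosts are upgraded; the official support matrix decides).
BASELINE_RELEASE = (20, 9)    # 20.09 images run on every host in the cluster
BASELINE_MIN_DRIVER = 450     # driver needed for 20.09 / CUDA 11.0
NEWER_MIN_DRIVER = 455        # driver needed by releases newer than 20.09


def parse_release(tag: str) -> tuple[int, int]:
    """Extract (year, month) from an NGC tag such as '20.09-py3'."""
    match = re.match(r"(\d{2})\.(\d{2})", tag)
    if match is None:
        raise ValueError(f"unrecognised tag: {tag!r}")
    return int(match.group(1)), int(match.group(2))


def driver_major(version: str) -> int:
    """Return the major part of a driver string such as '450.80.02'."""
    return int(version.strip().split(".")[0])


def runs_on_host(tag: str, driver_version: str) -> bool:
    """Conservatively check whether an image tag should work with a driver.

    Releases up to and including 20.09 are checked against the 450 baseline;
    per this page, older images (e.g. CUDA 10.2 based) also still work.
    """
    if parse_release(tag) <= BASELINE_RELEASE:
        return driver_major(driver_version) >= BASELINE_MIN_DRIVER
    return driver_major(driver_version) >= NEWER_MIN_DRIVER
</syntaxhighlight>

For example, with a 450-series driver <code>runs_on_host("20.09-py3", "450.80.02")</code> is true, while a newer tag such as <code>"20.10-py3"</code> only passes with a driver of at least 455 (the driver strings here are just examples).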
At the bottom of the GPU cluster status page, you can find the nvidia-smi output for each node, which shows its driver and CUDA versions. You can also open a shell in the container and verify the GPU capabilities there:
<syntaxhighlight>
$ nvidia-smi
+-------------------------------+----------------------+----------------------+
</syntaxhighlight>
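If you prefer a script to reading the full table, nvidia-smi can also print selected fields as CSV (<code>--query-gpu=name,driver_version --format=csv,noheader</code>). The following sketch parses that output; the function name <code>query_gpus</code> is hypothetical, and it assumes nvidia-smi is on the PATH inside your container.

<syntaxhighlight lang="python">
import subprocess
from typing import Dict, List, Optional


def query_gpus(output: Optional[str] = None) -> List[Dict[str, str]]:
    """Return the name and driver version of each visible GPU.

    If `output` is None, nvidia-smi is run (assumed to be on the PATH);
    otherwise the given CSV text is parsed, which is handy for testing.
    """
    if output is None:
        output = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,driver_version",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
    gpus = []
    for line in output.strip().splitlines():
        # Split on the last comma so the driver version is always the last field.
        name, driver = (field.strip() for field in line.rsplit(",", 1))
        gpus.append({"name": name, "driver_version": driver})
    return gpus
</syntaxhighlight>

For example, <code>query_gpus()[0]["driver_version"]</code> returns the driver version of the first GPU, which you can compare against the thresholds mentioned on this page.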
 
To check compatibility with specific NVIDIA containers, please refer to the [https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html official compatibility matrix]. Note that all nodes have datacenter drivers installed, which should provide broad compatibility. If in doubt, just try it out.
Combined with the volume mounts described above, this already gives you a working environment. For example, you could transfer some of your code and data to your home directory and run it interactively in the container as a quick test. Remember to adjust paths to data sets, or mount the directories at the locations your code expects.
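To fail early when a mount is missing or in the wrong place, your code can look up its data directory in one place and check it before starting. In this sketch, the environment variable <code>DATA_DIR</code> and the default path <code>/workspace/data</code> are hypothetical; use whatever location you actually mount your data to.

<syntaxhighlight lang="python">
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical default mount point; replace with the path you mount your data to.
DEFAULT_DATA_DIR = "/workspace/data"


def resolve_data_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the data directory, raising early if it does not exist.

    The (hypothetical) DATA_DIR environment variable overrides the default.
    """
    env = os.environ if env is None else env
    path = Path(env.get("DATA_DIR", DEFAULT_DATA_DIR))
    if not path.is_dir():
        raise FileNotFoundError(
            f"data directory {path} not found; check your volume mounts "
            "or set DATA_DIR"
        )
    return path
</syntaxhighlight>

Calling <code>resolve_data_dir()</code> at the start of your script turns a wrong mount into an immediate, readable error instead of a failure deep inside the training loop.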
