Cluster:Compute nodes

From Collective Computational Unit
Revision as of 18:16, 27 November 2021 by Bastian.goldluecke (talk | contribs)

Targeting a specific node

Targeting a specific node can be done in two different ways:

1. Selecting a node name.
2. Requiring a certain label on the node.

See the table below for node names and their associated labels.

Example 1: GPU-enabled pod which runs only on the node "belial":
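The example manifest is missing from this revision; the following is a minimal sketch. It pins the pod to the node "belial" via `nodeName` (which skips the scheduler entirely); requesting the node's `kubernetes.io/hostname` label via a `nodeSelector` would work as well. The pod name and container image are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test          # placeholder name
spec:
  nodeName: belial        # run only on this node
  containers:
  - name: cuda
    image: nvidia/cuda:11.4.2-base-ubuntu20.04   # any CUDA-capable image
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1   # request one GPU
```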


Example 2: GPU-enabled pod which requires compute capability of at least sm-60:
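The original snippet is also missing here; below is a reconstruction sketch. Since the compute-capability labels in the table are per-node keys (sm75, sm80), "at least sm-60" is expressed by ORing all capability labels at or above sm60 via node affinity (multiple `nodeSelectorTerms` are ORed by Kubernetes). The exact label scheme on the cluster may differ, so check the node labels before relying on this.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-sm60-test     # placeholder name
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:11.4.2-base-ubuntu20.04   # any CUDA-capable image
    resources:
      limits:
        nvidia.com/gpu: 1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:        # terms are ORed: any match suffices
        - matchExpressions:
          - key: nvidia-compute-capability-sm75
            operator: Exists
        - matchExpressions:
          - key: nvidia-compute-capability-sm80
            operator: Exists
```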






Acquiring GPUs with more than 20 GB

By default, Kubernetes schedules GPU pods only on the smallest class of GPU, with 20 GB of memory. This is achieved by assigning a "node taint" to nodes with higher-grade GPUs, which makes those nodes available only to pods that declare themselves "tolerant" of the taint.

If your task requires, for example, a GPU with at least 32 GB, you have to

1. make the pod tolerate the taint "gpumem-32" (see table below), and
2. make the pod require the node label "gpumem-32".

Example:
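The example manifest is missing from this revision; a minimal sketch follows. It combines a toleration for the "gpumem-32" taint with a `nodeSelector` on the "gpumem-32" label. The label value `"true"` and the use of `operator: Exists` (which matches the taint regardless of its value and effect) are assumptions, as is the container image; adapt them to the actual taint and label values on the cluster.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-32gb-test     # placeholder name
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:11.4.2-base-ubuntu20.04   # any CUDA-capable image
    resources:
      limits:
        nvidia.com/gpu: 1
  nodeSelector:
    gpumem-32: "true"     # assumed label value; verify on the cluster
  tolerations:
  - key: gpumem-32
    operator: Exists      # tolerates the taint whatever its value/effect
```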



List of compute nodes

The following nodes are currently part of the cluster. Note that the master node is CPU-only and not used for computations, as it hosts all CCU infrastructure (among a few other things).

| CCU name   | Access      | Platform        | GPUs                                                 | Labels                                                   | Taints    |
| Vecna      | exc-cb, inf | nVidia DGX-2    | 16 x V100 @ 32 GB                                    | gpumem-32, nvidia-v100, nvidia-compute-capability-sm80   | gpumem-32 |
| Glasya     | trr161      | Dual Xeon Rack  | 4 x Titan RTX @ 24 GB                                | gpumem-24, nvidia-rtx, nvidia-compute-capability-sm80    | gpumem-24 |
| Belial     | exc-cb      | Supermicro      | 8 x Quadro RTX 6000 @ 24 GB                          | gpumem-24, nvidia-rtx, nvidia-compute-capability-sm75    | gpumem-24 |
| Fierna     | exc-cb      | Supermicro      | 8 x Quadro RTX 6000 @ 24 GB                          | gpumem-24, nvidia-rtx, nvidia-compute-capability-sm75    | gpumem-24 |
| Zariel     | trr161      | nVidia DGX A100 | 8 x A100 @ 40 GB                                     | gpumem-40, nvidia-a100, nvidia-compute-capability-sm80   | gpumem-40 |
| Tiamat     | exc-cb      | Supermicro      | 4 x A100 @ 40 GB                                     | gpumem-40, nvidia-a100, nvidia-compute-capability-sm80   | gpumem-40 |
| Asmodeus   | cvia        | Supermicro      | 4 x A100 HGX 320 GB, subdivided into 16 GPUs @ 20 GB | gpumem-20, nvidia-a100, nvidia-compute-capability-sm80   | (none)    |
| Demogorgon | exc-cb      | Delta           | 8 x A40 @ 40 GB                                      | gpumem-40, nvidia-a40, nvidia-compute-capability-sm80    | gpumem-40 |


The CCU name is the internal name used in the Kubernetes cluster, as well as the configured hostname of the node. Nodes are not accessible from the outside world; you have to access the cluster with kubectl via the API server.

The column "Access" lists which Kubernetes user groups can access each node.

| Group  | Description                                                      |
| exc-cb | Centre for the Advanced Study of Collective Behaviour            |
| trr161 | SFB Transregio 161 "Quantitative Methods for Visual Computing"   |
| inf    | Department of Computer Science                                   |
| cvia   | Computer Vision and Image Analysis Group                         |