Cluster:Compute nodes
Latest revision as of 13:26, 15 June 2024
List of compute nodes
NOTE: Imp and Dretch do not have an InfiniBand connection, so Ceph filesystem access is slightly slower; using the local RAID for caching data is recommended. Both machines (Imp in particular) have much less powerful GPUs than the rest of the cluster, so these two systems are ideal for testing and experimenting.
The following GPU nodes are currently part of the cluster. There are more nodes which act as API servers or provide the Ceph filesystem and web services, but these are not available for standard users.
Note: the labels and taints in this table might be outdated; use "kubectl describe node <name>" for up-to-date information.
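For example, the current labels and taints can be inspected with kubectl, here using the node "imp" from the table above:

```
# show all labels of all nodes
kubectl get nodes --show-labels

# show labels, taints and allocatable resources of a single node
kubectl describe node imp
```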
| CCU name | Access | Platform | GPUs | Labels | Taints |
|---|---|---|---|---|---|
| imp | all | Dual Xeon Rack | 4 x Titan Xp @ 12 GB | gpumem=12, gpuarch=nvidia-titan, nvidia-compute-capability-sm70=true | |
| dretch | all | Dual Xeon Rack | 4 x Titan RTX @ 24 GB | gpumem=24, gpuarch=nvidia-titan, nvidia-compute-capability-sm70=true | |
| belial | exc-cb | Supermicro | 8 x Quadro RTX 6000 @ 24 GB | gpumem=24, gpuarch=nvidia-rtx, nvidia-compute-capability-sm75=true | gpumem=24:NoSchedule |
| fierna | exc-cb | Supermicro | 8 x Quadro RTX 6000 @ 24 GB | gpumem=24, gpuarch=nvidia-rtx, nvidia-compute-capability-sm75=true | gpumem=24:NoSchedule |
| vecna | exc-cb, inf | nVidia DGX-2 | 16 x V100 @ 32 GB | gpumem=32, gpuarch=nvidia-v100, nvidia-compute-capability-sm80=true | gpumem=32:NoSchedule |
| zariel | trr161 | nVidia DGX A100 | 8 x A100 @ 40 GB | gpumem=40, gpuarch=nvidia-a100, nvidia-compute-capability-sm80=true | gpumem=40:NoSchedule |
| tiamat | exc-cb | Supermicro | 4 x A100 @ 40 GB | gpumem=40, gpuarch=nvidia-a100, nvidia-compute-capability-sm80=true | gpumem=40:NoSchedule |
| asmodeus | all | Supermicro | 4 x A100 HGX 320 GB, subdivided in 8 GPUs @ 40 GB | gpumem=40, gpuarch=nvidia-a100, nvidia-compute-capability-sm80=true | gpumem=40:NoSchedule |
| demogorgon | exc-cb | Delta | 8 x A40 @ 48 GB | gpumem=48, gpuarch=nvidia-a40, nvidia-compute-capability-sm80=true | gpumem=48:NoSchedule |
| kiaransalee | seds | Delta | 8 x H100 HGX 640 GB | gpumem=80, gpuarch=nvidia-h100, nvidia-compute-capability-sm80=true | gpumem=80:NoSchedule |
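To illustrate how the "gpumem" labels partition the cluster, here is a minimal Python sketch with the node data transcribed from the table above (it does not query the cluster, and may become outdated just like the table):

```python
# Node labels transcribed from the table above (may be outdated).
nodes = {
    "imp":         {"gpumem": 12, "gpuarch": "nvidia-titan"},
    "dretch":      {"gpumem": 24, "gpuarch": "nvidia-titan"},
    "belial":      {"gpumem": 24, "gpuarch": "nvidia-rtx"},
    "fierna":      {"gpumem": 24, "gpuarch": "nvidia-rtx"},
    "vecna":       {"gpumem": 32, "gpuarch": "nvidia-v100"},
    "zariel":      {"gpumem": 40, "gpuarch": "nvidia-a100"},
    "tiamat":      {"gpumem": 40, "gpuarch": "nvidia-a100"},
    "asmodeus":    {"gpumem": 40, "gpuarch": "nvidia-a100"},
    "demogorgon":  {"gpumem": 48, "gpuarch": "nvidia-a40"},
    "kiaransalee": {"gpumem": 80, "gpuarch": "nvidia-h100"},
}

def nodes_with_gpumem_at_least(minimum):
    """Mimic a 'gpumem Gt (minimum - 1)' node-affinity match."""
    return sorted(n for n, labels in nodes.items() if labels["gpumem"] >= minimum)

print(nodes_with_gpumem_at_least(32))
# → ['asmodeus', 'demogorgon', 'kiaransalee', 'tiamat', 'vecna', 'zariel']
```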
The CCU name is the internal name used in the Kubernetes cluster, as well as the configured hostname of the node. Nodes are not accessible from the outside world; you have to access the cluster via kubectl through the API server.
The "Access" column lists which Kubernetes user groups are allowed to access each node. Please only target a specific node if you are allowed to.
| Group | Description |
|---|---|
| exc-cb | Centre for the Advanced Study of Collective Behaviour |
| trr161 | SFB Transregio 161 "Quantitative Methods for Visual Computing" |
| inf | Department of Computer Science |
| seds | Social and Economic Data Sciences |
| cvia | Computer Vision and Image Analysis Group |
Targeting a specific node
Targeting a specific node can be done in two different ways: either selecting a node name directly, or requiring certain labels on the node. See the table above for node names and associated labels. See the Kubernetes API documentation on how to assign pods to nodes, or refer to the following examples, which should be self-explanatory.
Selecting a node name
Example: GPU-enabled pod which runs only on the node "belial". Note that Belial is a more powerful system, so it is protected by a taint (see table above). Thus, you also have to tolerate the respective taint so that the pod can actually be scheduled on Belial; tolerations are explained below.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  nodeSelector:
    kubernetes.io/hostname: belial
  # belial carries the taint gpumem=24:NoSchedule (see table above);
  # without this toleration the pod would never be scheduled there.
  tolerations:
  - key: "gpumem"
    operator: "Equal"
    value: "24"
    effect: "NoSchedule"
  containers:
  - name: gpu-container
    image: nvcr.io/nvidia/tensorflow:20.09-tf2-py3
    command: ["sleep", "1d"]
    resources:
      requests:
        cpu: 1
        nvidia.com/gpu: 1
        memory: 10Gi
      limits:
        cpu: 1
        nvidia.com/gpu: 1
        memory: 10Gi
  # more specs (volumes etc.)

Requiring a certain label on the node
Example: GPU-enabled pod which requires compute capability of at least sm-75:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  nodeSelector:
    # label values in a nodeSelector must be strings, so "true" is quoted
    compute-capability-atleast-sm75: "true"
  # note: if a node has e.g. the label "compute-capability-sm80", it also has the
  # corresponding "atleast"-label for all lower or equal compute capabilities.
  # The same holds for "gpumem".
  containers:
  - name: gpu-container
    image: nvcr.io/nvidia/tensorflow:20.09-tf2-py3
    command: ["sleep", "1d"]
    resources:
      requests:
        cpu: 1
        nvidia.com/gpu: 1
        memory: 10Gi
      limits:
        cpu: 1
        nvidia.com/gpu: 1
        memory: 10Gi
  # more specs (volumes etc.)

Targeting more powerful GPUs
By default, Kubernetes schedules GPU pods only on the smallest class of GPU (NVIDIA Titan). This is achieved by assigning nodes with higher-grade GPUs a "node taint", which makes the node available only to pods that declare themselves "tolerant" of the taint.
So if your task, for example, requires a GPU with *exactly* 32 GB, you have to

- make the pod tolerate the taint "gpumem=32:NoSchedule" (see table above), and
- make the pod require the node label "gpumem" to be exactly 32.
See the Kubernetes API documentation on taints and tolerations for more details.
Example:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  nodeSelector:
    gpumem: "32"
  tolerations:
  - key: "gpumem"
    # Note: to be able to run on a GPU with any amount of memory,
    # replace the operator/value pair by just 'operator: "Exists"'.
    operator: "Equal"
    value: "32"
    effect: "NoSchedule"
  containers:
  - name: gpu-container
    image: nvcr.io/nvidia/tensorflow:20.09-tf2-py3
    command: ["sleep", "1d"]
    resources:
      requests:
        cpu: 1
        nvidia.com/gpu: 1
        memory: 10Gi
      limits:
        cpu: 1
        nvidia.com/gpu: 1
        memory: 10Gi
  # more specs (volumes etc.)
If you need a GPU with *at least* 32 GB, but would also be happy with more, you can simply tolerate any amount of GPU memory and require the node label "gpumem" to be greater than 31.

Note: typically you should *not* do this, and instead reserve a GPU which has just enough memory. However, if e.g. all 32 GB GPUs are already busy, you can move up to a 40 GB GPU.
Example:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  # the standard node selector is insufficient here;
  # this needs the more expressive "nodeAffinity".
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        # note: multiple nodeSelectorTerms are ORed; to require several
        # conditions at once, put all expressions into one term.
        nodeSelectorTerms:
        - matchExpressions:
          - key: gpumem
            operator: Gt
            # "values" is a list of strings, even for numeric comparisons
            values: ["31"]
        # note: this also works to specify a minimum compute capability
        - matchExpressions:
          - key: nvidia-compute-capability-sm
            operator: Gt
            values: ["79"]
  tolerations:
  - key: "gpumem"
    operator: "Exists"
    effect: "NoSchedule"
  # ... rest of the specs like before
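To try any of the manifests above, save it to a file and submit it with kubectl; the file name below is just an example:

```
kubectl apply -f gpu-pod.yaml
# the NODE column shows where the pod was scheduled
kubectl get pod gpu-pod -o wide
# if the pod stays "Pending", the Events section explains why
kubectl describe pod gpu-pod
```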