CCU:New GPU Cluster

=== Moving your workloads to the new cluster ===
You can now verify that you are able to start a GPU-enabled pod. Create a pod with the following spec to allocate one GPU for you somewhere on the cluster.
<syntaxhighlight lang="yaml">
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: gpu-container
    image: docker.io/nvidia/cuda:11.0-base
    command: ["sleep", "1d"]
    resources:
      requests:
        cpu: 1
        nvidia.com/gpu: 1
        memory: 100Mi
      limits:
        cpu: 1
        nvidia.com/gpu: 1
        memory: 1Gi
</syntaxhighlight>
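Assuming you saved the manifest above as <code>gpu-pod.yaml</code> (the file name is just an example), you can submit it and watch until the pod has been scheduled onto a GPU node:
<syntaxhighlight lang="bash">
# Create the pod from the manifest above
kubectl apply -f gpu-pod.yaml
# Watch until the pod reaches the Running state (Ctrl-C to stop watching)
kubectl get pod gpu-pod -w
</syntaxhighlight>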
You can again switch to a shell in the container and verify GPU capabilities:
 
<syntaxhighlight lang="bash">
> kubectl exec -it gpu-pod -- /bin/bash
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB      Off  | 00000000:C1:00.0 Off |                    0 |
| N/A   27C    P0    51W / 400W |      4MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
</syntaxhighlight>
 
 
Combined with the volume mounts above, this already gives you a working environment. For example, you could transfer some of your code and data to your home directory and run it interactively in the container as a quick test. Note that timeouts are in place and an interactive session does not last forever, so it is better to build a custom run script that is executed when the container in the pod starts. See the documentation for more details. TODO: link to respective doc.
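As a sketch of such a non-interactive setup, the pod below runs a script from a mounted home directory instead of <code>sleep</code>. The volume name, mount path, script path, and claim name are placeholders that depend on how your home directory is provisioned on the cluster:
<syntaxhighlight lang="yaml">
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job
spec:
  restartPolicy: Never
  containers:
  - name: gpu-container
    image: docker.io/nvidia/cuda:11.0-base
    # Execute your own run script on container start; adjust the path
    command: ["/bin/bash", "/home/myuser/run.sh"]
    resources:
      requests:
        nvidia.com/gpu: 1
      limits:
        nvidia.com/gpu: 1
    volumeMounts:
    - name: home
      mountPath: /home/myuser
  volumes:
  - name: home
    persistentVolumeClaim:
      claimName: home-myuser   # placeholder claim name
</syntaxhighlight>
With <code>restartPolicy: Never</code>, the pod finishes once the script exits instead of being restarted, so the GPU is released without relying on an interactive session.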
=== Cleaning up ===
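When you are done testing, delete the pod so the GPU is freed for other users (assuming the pod name from the example above):
<syntaxhighlight lang="bash">
kubectl delete pod gpu-pod
</syntaxhighlight>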
