CCU:New GPU Cluster
=== Moving your workloads to the new cluster ===
You can now verify that you can start a GPU-enabled pod. Create a pod with the following spec to allocate one GPU for you somewhere on the cluster. The container image ships with a ready-to-use installation of TensorFlow 2. Note that defining resource requests and limits is now mandatory.
<syntaxhighlight>
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: gpu-container
    image: nvcr.io/nvidia/tensorflow:20.12-tf2-py3
    command: ["sleep", "1d"]
    resources:
      requests:
        cpu: 1
        nvidia.com/gpu: 1
        memory: 2Gi
      limits:
        cpu: 1
        nvidia.com/gpu: 1
        memory: 2Gi
</syntaxhighlight>
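As a quick sketch of how to submit and check the pod (the filename <code>gpu-pod.yaml</code> is just an illustration; <code>nvidia-smi</code> is assumed to be available in the NVIDIA base image):
<syntaxhighlight>
> kubectl apply -f gpu-pod.yaml
> kubectl get pod gpu-pod
> kubectl exec gpu-pod -- nvidia-smi
</syntaxhighlight>
If the pod reaches the <code>Running</code> state and <code>nvidia-smi</code> lists a GPU, the allocation worked.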
Combined with the volume mounts above, you already have a working environment. For example, you could transfer some of your code and data to your home directory and run it interactively in the container as a quick test. Remember to adjust paths to data sets, or to mount the directories in the locations expected by your code.
<syntaxhighlight>
> kubectl exec -it gpu-pod -- /bin/bash
# cd /abyss/home/<your-code-repo>
# python ./main.py
</syntaxhighlight>
Note that timeouts are in place and an interactive session does not last forever, so it is better to build a custom run script that is executed when the container in the pod starts. See the Kubernetes documentation on pods and jobs for more details. TODO: link to respective doc.

If you do not have your code ready, you can run a quick test by installing a standard demonstration from [https://github.com/dragen1860/TensorFlow-2.x-Tutorials this tutorial] as follows:
<syntaxhighlight>
> kubectl exec -it gpu-pod -- /bin/bash
# cd /abyss/home
# git clone https://github.com/dragen1860/TensorFlow-2.x-Tutorials.git
# cd 
</syntaxhighlight>
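The run-script approach mentioned above can be sketched by pointing the container's <code>command</code> at a script in your home directory instead of <code>sleep</code> (the script name <code>run.sh</code> and its location are illustrative assumptions, not prescribed by the cluster):
<syntaxhighlight>
# Replace the container command in the pod spec; the container exits
# when the script finishes, so the pod does not hold the GPU idle.
containers:
- name: gpu-container
  image: nvcr.io/nvidia/tensorflow:20.12-tf2-py3
  command: ["/bin/bash", "/abyss/home/<your-code-repo>/run.sh"]
</syntaxhighlight>
This way the workload starts automatically when the pod is scheduled and is not cut off by interactive-session timeouts.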
=== Cleaning up ===