Tutorials:Run the example container on the cluster

Requirements

  • A working connection and login to the Kubernetes cluster.
  • A valid namespace selected with authorization to run pods.
  • A test container pushed to the CCU docker registry.


Set up a Kubernetes job script

Download the Kubernetes samples and look at the job script in example_1. Alternatively, create your own directory with a file named "job_script.yaml". Edit the contents accordingly and replace all placeholders with your data.
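
A minimal sketch of such a job script is shown below, assuming the job name tf-mnist and a single requested GPU; the image path is a placeholder for your image in the CCU registry, and the nvidia.com/gpu resource assumes the cluster exposes GPUs via the standard NVIDIA device plugin:

apiVersion: batch/v1
kind: Job
metadata:
  name: tf-mnist
spec:
  template:
    spec:
      containers:
      - name: tf-mnist
        # replace with the address of the CCU registry and your image tag
        image: <registry-address>/<your-namespace>/tf-mnist:latest
        resources:
          limits:
            nvidia.com/gpu: 1
      # a Job must use Never or OnFailure so finished pods are not restarted
      restartPolicy: Never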

When we start this job, Kubernetes will create a single container based on the image we previously uploaded to the registry, placing it on a suitable node that serves the selected namespace of the cluster.

> kubectl apply -f job_script.yaml
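
If the job is accepted, kubectl confirms the creation with a line roughly like this (the exact wording depends on the cluster version):

job.batch/tf-mnist created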

Checking in on the container

We first check whether our container is running:

> kubectl get pods
# somewhere in the output you should see a line like this:
NAME             READY   STATUS    RESTARTS   AGE
tf-mnist-xxxx   1/1     Running   0          7s

Now that you know the name of the pod, you can check in on the logs:

# replace xxxx with the code from get pods.
> kubectl logs tf-mnist-xxxx
# this should show the console output of your python program
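
If you would rather follow the output live while the program runs, the -f flag streams the log:

> kubectl logs -f tf-mnist-xxxx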

You can also get more information about the job, the node the pod was placed on, and so on:

> kubectl describe job tf-mnist
# replace xxxx with the code from get pods.
> kubectl describe pod tf-mnist-xxxx
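
To see whether the job as a whole has completed, list the jobs in the namespace:

> kubectl get jobs
# the tf-mnist job should report as complete once the program has finished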


You can also open a shell in the running container, just as with Docker:

> kubectl exec -it tf-mnist-xxxx -- /bin/bash
root@tf-mnist-xxxx:/workspace# nvidia-smi
Tue Jun 18 14:25:00 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM3...  On   | 00000000:E7:00.0 Off |                    0 |
| N/A   39C    P0    68W / 350W |  30924MiB / 32480MiB |      6%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
root@tf-mnist-xxxx:/workspace# ls /application/
nn.py  run.sh  tf-mnist.py
root@tf-mnist-xxxx:/workspace#
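
When you are done, leave the shell with exit. To clean up, delete the job; by default this also removes the pods it created:

> kubectl delete job tf-mnist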