Tutorials:Run the example container on the cluster

Requirements

  • A working connection and login to the Kubernetes cluster.
  • A valid namespace selected with authorization to run pods.
  • A test container pushed to the CCU docker registry.


Set up a Kubernetes job script

Download the Kubernetes samples and look at the job script in example_1. Alternatively, create your own directory with a file named "job_script.yaml". Edit the contents accordingly and replace all placeholders with your data.
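
A minimal sketch of such a job script is shown below, assuming the job name tf-mnist and a single requested GPU; the image path is a placeholder for your image in the CCU registry, and the nvidia.com/gpu resource assumes the cluster exposes GPUs via the standard NVIDIA device plugin:

apiVersion: batch/v1
kind: Job
metadata:
  name: tf-mnist
spec:
  template:
    spec:
      containers:
      - name: tf-mnist
        # replace with the address of the CCU registry and your image tag
        image: <registry-address>/<your-namespace>/tf-mnist:latest
        resources:
          limits:
            nvidia.com/gpu: 1
      # a Job must use Never or OnFailure so finished pods are not restarted
      restartPolicy: Never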

When we start this job, Kubernetes will create a single container based on the image we previously uploaded to the registry, placing it on a suitable node that serves the selected namespace of the cluster.

> kubectl apply -f job_script.yaml
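
If the job is accepted, kubectl confirms the creation with a line roughly like this (the exact wording depends on the cluster version):

job.batch/tf-mnist created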

Checking in on the container

We first check whether our container is running:

> kubectl get pods
# somewhere in the output you should see a line like this:
NAME             READY   STATUS    RESTARTS   AGE
tf-mnist-xxxx   1/1     Running   0          7s

Now that you know the name of the pod, you can check in on the logs:

# replace xxxx with the code from get pods.
> kubectl logs tf-mnist-xxxx
# this should show the console output of your python program
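
If you would rather follow the output live while the program runs, the -f flag streams the log:

> kubectl logs -f tf-mnist-xxxx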

You can also get more information about the job, the node the pod was placed on, and so on:

> kubectl describe job tf-mnist
# replace xxxx with the code from get pods.
> kubectl describe pod tf-mnist-xxxx
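
To see whether the job as a whole has completed, list the jobs in the namespace:

> kubectl get jobs
# the tf-mnist job should report as complete once the program has finished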


You can also open a shell in the running container, just as with Docker:

> kubectl exec -it tf-mnist-xxxx -- /bin/bash
root@tf-mnist-xxxx:/workspace# nvidia-smi
Tue Jun 18 14:25:00 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM3...  On   | 00000000:E7:00.0 Off |                    0 |
| N/A   39C    P0    68W / 350W |  30924MiB / 32480MiB |      6%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
root@tf-mnist-xxxx:/workspace# ls /application/
nn.py  run.sh  tf-mnist.py
root@tf-mnist-xxxx:/workspace#
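
When you are done, leave the shell with exit. To clean up, delete the job; by default this also removes the pods it created:

> kubectl delete job tf-mnist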