Tutorials:Run the example container on the cluster
Revision as of 14:25, 18 June 2019
Requirements
- A working connection and login to the Kubernetes cluster.
- A valid namespace selected with authorization to run pods.
- A test container pushed to the CCU docker registry.
Set up a Kubernetes job script
Download the Kubernetes samples and look at the job script in example_1. Alternatively, create your own directory and a file named "job_script.yaml". Edit the contents as follows, replacing all placeholders with your data:
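If you do not have the samples at hand, a minimal job manifest along these lines should work. Note that the registry host, image tag, and GPU resource request here are assumptions for illustration; adapt them to your own CCU registry path and namespace:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: tf-mnist              # job name; pods are named tf-mnist-xxxx
spec:
  backoffLimit: 0             # do not retry the job on failure
  template:
    spec:
      containers:
      - name: tf-mnist
        # placeholder: replace with the image you pushed to the registry
        image: registry.example.com/<your-namespace>/tf-mnist:latest
        resources:
          limits:
            nvidia.com/gpu: 1  # request one GPU for the container
      restartPolicy: Never     # required for jobs; let the job controller handle retries
```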
When we start this job, Kubernetes will create a single container from the image we previously uploaded to the registry, scheduled on a suitable node that serves the selected namespace of the cluster.
> kubectl apply -f job_script.yaml
Checking in on the container
We first check if our container is running.
> kubectl get pods
# somewhere in the output you should see a line like this:
NAME READY STATUS RESTARTS AGE
tf-mnist-xxxx 1/1 Running 0 7s
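The xxxx suffix is generated by Kubernetes, so in scripts it can be handy to extract the pod name with a small filter instead of copying it by hand. A sketch (the printf here merely stands in for live kubectl output):

```shell
# Hypothetical helper: pick the pod name out of a `kubectl get pods` listing.
# On a live cluster you would pipe the real command in instead of printf.
printf 'NAME            READY   STATUS    RESTARTS   AGE\ntf-mnist-xxxx   1/1     Running   0          7s\n' \
  | awk '/^tf-mnist/ {print $1}'
# prints: tf-mnist-xxxx
```

Against the cluster this becomes `POD=$(kubectl get pods | awk '/^tf-mnist/ {print $1}')`, which you can then reuse in the logs, describe, and exec commands.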
Now that you know the name of the pod, you can check in on the logs:
# replace xxxx with the code from get pods.
> kubectl logs tf-mnist-xxxx
# this should show the console output of your python program
or get some more information about the job, the node the pod was placed on, and so on:
> kubectl describe job tf-mnist
# replace xxxx with the code from get pods.
> kubectl describe pod tf-mnist-xxxx
You can also open a shell in the running container, just as with Docker:
> kubectl exec -it tf-mnist-xxxx -- /bin/bash
root@tf-mnist-xxxxx:/workspace# nvidia-smi
Tue Jun 18 14:25:00 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM3... On | 00000000:E7:00.0 Off | 0 |
| N/A 39C P0 68W / 350W | 30924MiB / 32480MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
root@tf-mnist-xxxxx:/workspace# ls /application/
nn.py run.sh tf-mnist.py
root@tf-mnist-xxxxx:/workspace#