Tutorials:Install the nVidia docker system
Revision as of 17:48, 18 May 2019
Overview
In order to run containers which make use of the GPUs on your system, you need to install a specific docker container infrastructure on your system. This guide walks you through the necessary steps.
Install the correct docker version
Unfortunately, the docker version available through the usual package manager is incompatible with the nVidia docker tools. Thus, we have to install a version directly from Docker's own repository. The following script (which needs sudo privileges to run) shows you how.
#!/bin/bash
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
bionic \
stable"
sudo apt-get update
sudo apt-get install -y docker-ce
Note that the string "bionic" is the output of the command "lsb_release -cs". If you have a version of Ubuntu other than 18.04, you can try to replace "bionic" with the output of this command, but it might not be supported. On a derivative Linux, this does not work, and you need to find out the correct Ubuntu lsb release by consulting that distribution's documentation. Do the same for similar occurrences in scripts further below.
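If you want to avoid hard-coding the codename, you can build the repository line from a variable first. A small sketch (the value "bionic" here is just an example; on a real Ubuntu system you would fill the variable from "lsb_release -cs" instead):

```shell
# Build the apt source line for a given Ubuntu codename.
# On a real Ubuntu system, use: codename=$(lsb_release -cs)
codename="bionic"
repo_line="deb [arch=amd64] https://download.docker.com/linux/ubuntu ${codename} stable"
echo "$repo_line"
```

This way, only one variable needs changing when you move the script to a different Ubuntu release.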
You also need a more recent version of a tool called "docker-compose"; more on this later. Install a recent version like this:
sudo curl -L "https://github.com/docker/compose/releases/download/1.24.0/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
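To verify the installation afterwards, you can check that the binary is found on your PATH and report its version. A hedged sketch (the message text is just a suggestion):

```shell
# Check whether docker-compose is on the PATH and report its version.
if command -v docker-compose >/dev/null 2>&1; then
    compose_version=$(docker-compose --version)
else
    compose_version="docker-compose not found on PATH"
fi
echo "$compose_version"
```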
Post install, you have to allow docker access for your account. Note that this comes with a set of privileges which are essentially equivalent to admin rights on your machine. For this reason, some guides recommend setting up a dedicated user account for this. In any case, assign the respective rights by adding your user to the docker group:
sudo usermod -aG docker $USER
After that, you need to at least log out and back in for the new group membership to take effect; a reboot also cannot hurt.
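To check whether the new group membership is already active in your current session, a small sketch:

```shell
# List the groups of the current session and look for "docker".
if id -nG | grep -qw docker; then
    echo "docker group is active in this session"
else
    echo "docker group not active yet - log out and back in, or run: newgrp docker"
fi
```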
Installing the nVidia docker extension
The default docker installation is not able to talk to the nVidia GPUs present in your system. Thus, you have to install an extension by nVidia which allows it to do so. Run the following script:
#!/bin/bash
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
sudo apt-key add -
# hard-coded distro ID so that it also works on Ubuntu flavors like Mint
# ubuntu16.04 is also available, maybe some other versions (see above github)
distribution=ubuntu18.04
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd
That's it: your system is now configured to run nVidia's base containers for GPU utilization.
To test it and actually run your first container, try out:
docker run --runtime=nvidia --rm nvidia/cuda:9.0-devel nvcc --version
This will pull the docker container with CUDA 9.0 and run the command "nvcc --version" inside it. You should see output similar to running nvcc directly on your host, but with the container's CUDA version displayed. You can also try to run nvidia-smi inside the container:
docker run --runtime=nvidia --rm nvidia/cuda:9.0-devel nvidia-smi
This should show a similar output as when you run it directly, i.e. show the same graphics card(s).
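You can also confirm that the daemon actually registered the nVidia runtime by querying docker info. A sketch (the exact output format may differ between docker versions, and the fallback message is just a suggestion):

```shell
# Ask the Docker daemon which runtimes it knows about; "nvidia" should be listed.
runtimes=$(docker info 2>/dev/null | grep -i 'runtime' || true)
if [ -n "$runtimes" ]; then
    result="$runtimes"
else
    result="could not query the docker daemon"
fi
echo "$result"
```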
Access the containers on the nVidia GPU cloud
nVidia provides many optimized container images for their GPU infrastructure for a variety of tasks (deep learning, high-performance computing, etc.). You should choose these images as the source container images for your applications. To be able to do so, you first need an account at the nVidia GPU cloud. Once you are signed in, open the Configuration section in the left-hand menu and follow the steps to generate an API key for the registry. It will be a long string of characters which looks like this:
QWEzamZyNWhhaWZuN2J2aW5hNjBzdmk5N206NzMwMTU5MWMtNzE0My00N2FmLTk4ZTktY2EzZmQyYzgzZDUz
Copy it and place it in a file somewhere safe so that you remember it. You can generate a new one anytime, but then the old one will become invalid.
You can now tell docker to login to the nVidia GPU cloud container registry so that you can pull container images from there. For this, use the shell command
docker login -u '$oauthtoken' --password-stdin nvcr.io <<< ' your API key here between the quotes '
That's it. I suggest you put the above command in a script in $HOME/bin so you can quickly rerun it after a reboot; that way you also do not forget your key. Remember to protect your folders from being read by other users if they contain this kind of sensitive information.
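One way to keep the key out of your shell history and readable only by you is to store it in a mode-600 file and feed that file to docker login. A sketch (the file name ~/.ngc_api_key and the placeholder string are just suggestions):

```shell
# Store the API key in a file that only the owner can read.
keyfile="$HOME/.ngc_api_key"
umask 077                                   # files created from now on are owner-only
printf '%s\n' 'PASTE-YOUR-API-KEY-HERE' > "$keyfile"
chmod 600 "$keyfile"
# Later, log in without typing the key on the command line:
# docker login -u '$oauthtoken' --password-stdin nvcr.io < "$keyfile"
```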
You can test whether everything worked by pulling a container from the nVidia cloud and then firing up Python inside the container:
docker run -it --runtime=nvidia nvcr.io/nvidia/tensorflow:18.06-py3 python
The first time will be slow, as it needs to download all the images; after that, they will be in your local storage and start up much faster.
You will enter an interactive Python interpreter which runs inside the container. To test whether GPU acceleration works in Tensorflow, you can issue for example the following commands in the interpreter (enter an empty line after each "with" block and take care to copy the right number of spaces in front of the lines as well):
import tensorflow as tf
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
with tf.Session() as sess:
    print(sess.run(c))
If you don't get any errors and the final output is a matrix like this:
[[ 22. 28.]
[ 49. 64.]]
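You can check this result by hand; here is a minimal plain-Python sketch of the same 2x3 by 3x2 matrix product, needing no TensorFlow at all:

```python
# Multiply the same two matrices as in the TensorFlow example,
# using plain nested lists, to confirm the expected output.
a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
b = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]

c = [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(2)]
     for i in range(2)]
print(c)  # [[22.0, 28.0], [49.0, 64.0]]
```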
then everything is fine. When you are done testing, just enter the line "quit()" to exit the Python interpreter, which will also terminate the container.
You are now ready for the next tutorial, which will show you how to use the nVidia GPU cloud images as a basis for your own applications.