Unfortunately, the docker version available with the usual package manager is incompatible with the nVidia docker tools. Thus, we have to install a version directly from the repositories. The following script (which needs sudo privileges to run) shows you how.
NOTE: The following scripts require <code>curl</code>, which may not yet be installed on your system (especially on a fresh Ubuntu). If so, install it with <code>sudo apt-get install curl</code>.
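A quick way to check whether curl is already present before installing anything (a minimal sketch; the message texts are just suggestions):

```shell
# Print whether curl is available on this system
if command -v curl >/dev/null 2>&1; then
    echo "curl is installed"
else
    echo "curl is missing - run: sudo apt-get install curl"
fi
```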
<syntaxhighlight lang="bash">
# Add docker's official package repository and install docker-ce
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
sudo apt-get update
sudo apt-get install -y docker-ce
</syntaxhighlight>
Note that the string "bionic" is the output of the command "lsb_release -cs" on Ubuntu 18.04. If you run another version of Ubuntu, you can try replacing "bionic" with the output of this command, but your version might not be supported. On a derivative Linux, the command returns the derivative's own codename, so this does not work; you need to find out the corresponding Ubuntu lsb release by consulting its documentation. Do the same for similar occurrences in the scripts further below.
You also need a more recent version of a tool called "docker-compose", more on this later. Install the latest version like this: <syntaxhighlight lang="bash">
sudo curl -L "https://github.com/docker/compose/releases/download/1.24.0/docker-compose-$(uname -s)-$(uname -m)" -o /usr/bin/docker-compose
sudo chmod +x /usr/bin/docker-compose
</syntaxhighlight> After the installation, you have to allow docker access for your account. Note that this grants a bunch of privileges which are essentially equivalent to admin rights on your machine; some guides therefore recommend setting up a dedicated user account for this. Either way, assign the respective rights by adding your user to the docker group:
<syntaxhighlight lang="bash">
sudo usermod -aG docker $USER
</syntaxhighlight>
After that, it is at least necessary to log out and back in for the new group membership to take effect, but a reboot cannot hurt either.
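You can check whether the group membership is active in your current session with a snippet like the following (the message texts are only suggestions):

```shell
# List the groups of the current session and look for "docker"
if id -nG | tr ' ' '\n' | grep -qx docker; then
    echo "docker group active"
else
    echo "docker group not active yet - log out and back in"
fi
```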
== Installing the nVidia docker extension ==
Now install the nVidia docker extension (this assumes the nvidia-docker package repository has already been added as described in nVidia's installation instructions):
<syntaxhighlight lang="bash">
# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd
# alternatively, restart the daemon completely: sudo systemctl restart docker
</syntaxhighlight>
You can verify the installation by querying the CUDA compiler from inside a container:
<syntaxhighlight lang="bash">
docker run --runtime=nvidia --rm nvidia/cuda:9.0-devel nvcc --version
</syntaxhighlight>
Likewise, check that the GPU driver is reachable from inside a container:
<syntaxhighlight lang="bash">
docker run --runtime=nvidia --rm nvidia/cuda:9.0-devel nvidia-smi
</syntaxhighlight>
This should show the same output as when you run nvidia-smi directly on the host, i.e. list the same graphics card(s).
== Access the containers on the nVidia GPU cloud ==
nVidia provides many optimized container images for their GPU infrastructure for a variety of tasks (deep learning, high-performance computing, etc.).
You should choose these images as the source container images for your applications.
To be able to do so, you first need an account at the [https://ngc.nvidia.com/catalog/landing nVidia GPU cloud].
Once you are signed in, click on "Configuration" in the menu on the left and follow the steps to obtain an API key for the registry.
It will be a long string of characters which looks like this:
<syntaxhighlight lang="bash">
QWEzamZyNWhhaWZuN2J2aW5hNjBzdmk5N206NzMwMTU5MWMtNzE0My00N2FmLTk4ZTktY2EzZmQyYzgzZDUz
</syntaxhighlight>
Copy it and store it in a file somewhere safe so that you do not lose it. You can generate a new key at any time, but the old one then becomes invalid.
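One way to store the key is a private file under your home directory; the path below is only a suggestion, and the placeholder string stands in for your actual key:

```shell
# Store the NGC API key in a file only your user can read
mkdir -p "$HOME/.ngc"
printf '%s\n' 'your-API-key-here' > "$HOME/.ngc/api-key"
chmod 600 "$HOME/.ngc/api-key"
```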
You can now tell docker to login to the nVidia GPU cloud container registry so that you can pull container images from there.
For this, use the shell command
<syntaxhighlight lang="bash">
docker login -u '$oauthtoken' --password-stdin nvcr.io <<< 'your-API-key-here'
</syntaxhighlight>
That's it. Note that the user name really is the literal string $oauthtoken; the single quotes prevent the shell from expanding it as a variable. I suggest you put the above command in a script in $HOME/bin, so you can quickly rerun it after a reboot without having to look up your key again. Remember to protect your folders from being read by other users if they contain this kind of sensitive information.
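Such a script could look like the following sketch. It assumes the key was saved to a file (here $HOME/.ngc/api-key, a made-up path) instead of being pasted inline:

```shell
# Create a small helper script in $HOME/bin that logs docker in to nvcr.io
mkdir -p "$HOME/bin"
cat > "$HOME/bin/ngc-login.sh" <<'EOF'
#!/bin/sh
# The user name is the literal string $oauthtoken, hence the single quotes.
# Assumes the NGC API key is stored in $HOME/.ngc/api-key.
docker login -u '$oauthtoken' --password-stdin nvcr.io < "$HOME/.ngc/api-key"
EOF
chmod 700 "$HOME/bin/ngc-login.sh"
```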
You can test whether everything worked by pulling a container from the nVidia cloud and then firing up Python inside the container:
<syntaxhighlight lang="bash">
docker run -it --runtime=nvidia nvcr.io/nvidia/tensorflow:18.06-py3 python
</syntaxhighlight>
The first time this will be slow, as docker needs to download all the image layers; after that, they are in your local storage and the container starts up much faster.
You will enter an interactive Python interpreter running inside the container. To test whether GPU acceleration works in Tensorflow, you can for example issue the following commands in the interpreter (enter an empty line after each "with" block and take care to copy the leading spaces as well):
<syntaxhighlight lang="python">
import tensorflow as tf

with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

with tf.Session() as sess:
    print(sess.run(c))
</syntaxhighlight>
If you don't get any errors and the final output is a matrix like this:
<syntaxhighlight lang="bash">
[[ 22. 28.]
[ 49. 64.]]
</syntaxhighlight>
then everything is fine. When you are done testing, just enter "quit()" to exit the Python interpreter, which will also terminate the container.
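If you want to double-check the expected numbers independently of tensorflow, the same matrix product can be computed with plain python on the host:

```shell
# Multiply the same 2x3 and 3x2 matrices without tensorflow
python3 - <<'EOF'
a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # shape [2, 3]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # shape [3, 2]
c = [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(2)]
     for i in range(2)]
print(c)  # the same values tensorflow should produce
EOF
```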
== Making nvidia-docker the default runtime (optional) ==
You will have noticed that you always had to pass "--runtime=nvidia" to every docker command that runs a GPU container. This is fine in principle, but if you do it on a regular basis (or if you want to set up your own kubernetes minikube for testing), you might wish to make it the default.
For this, edit /etc/docker/daemon.json to look as follows (the configuration also sets a larger default shared-memory size, which is recommended for tensorflow):
<syntaxhighlight lang="json">
{
"default-runtime": "nvidia",
"default-shm-size": "1g",
"runtimes": {
"nvidia": {
"path": "nvidia-container-runtime",
"runtimeArgs": []
}
}
}
</syntaxhighlight>
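Since a syntax error in daemon.json can prevent the docker daemon from starting, it is worth validating the file before installing it. The following sketch writes the snippet to a temporary file first and uses python3 merely as a JSON syntax checker:

```shell
# Write the configuration to a temporary file and verify it parses as JSON
cat > /tmp/daemon.json <<'EOF'
{
    "default-runtime": "nvidia",
    "default-shm-size": "1g",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
python3 -m json.tool /tmp/daemon.json > /dev/null && echo "daemon.json is valid"
# then: sudo cp /tmp/daemon.json /etc/docker/daemon.json
```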
After you have done this, you need to restart the docker daemon for the changes to take effect:
<syntaxhighlight lang="bash">
sudo systemctl restart docker
</syntaxhighlight>
Note that, depending on your configuration, the restart stops running containers; those with a suitable restart policy are started again once the daemon is back up.
You are now ready for the next tutorial, which will show you how to use the nVidia GPU cloud images as a basis for your own applications.
[[Category:Tutorials]]