CCU:GPU Cluster

From Collective Computational Unit
Revision as of 16:56, 18 May 2019

Overview

The CCU provides access to state-of-the-art hardware infrastructure to run GPU-accelerated machine learning frameworks. This page gives a general overview and links to more in-depth tutorials on how to work with the cluster. There is some overhead involved in writing code for your projects, and you have to follow a few guidelines, but template projects and scripts are provided so that you can get started with minimal knowledge of the technical background of the GPU cluster.

The GPU cluster is based on Kubernetes, a framework for deploying so-called Docker containers to different compute nodes. You can think of a Docker container as a wrapper for your machine learning application that contains all the necessary code and all the libraries it depends on (yes, including those of the base OS). In essence, it is a self-contained object that can be deployed and run on any computer on which Docker is installed.
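To give a feel for what such a wrapper looks like, the snippet below writes a minimal Dockerfile for a hypothetical TensorFlow training script. The base image tag and the script name `train.py` are illustrative assumptions, not the cluster's actual example:

```shell
# Write a minimal Dockerfile for a hypothetical TensorFlow training script.
# The base image tag and "train.py" are illustrative assumptions.
cat > Dockerfile <<'EOF'
FROM tensorflow/tensorflow:1.13.1-gpu-py3
COPY train.py /app/train.py
WORKDIR /app
CMD ["python", "train.py"]
EOF

# Build it locally with Docker installed:
#   docker build -t my-experiment .
echo "Dockerfile written:"
cat Dockerfile
```

The `FROM` line is what pulls in the entire dependency stack (OS libraries, CUDA, TensorFlow), so your own code only needs to be copied on top of it.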

This means that you have to be able to take the necessary steps to wrap your own code into a container. All of this is covered in an easy, introductory way in the short tutorials below, which should be sufficient to get you started. At some point you might want to learn about Docker in more depth; for this, I refer you to the excellent tutorials available elsewhere, some of which are linked here.

What you need

  • An account for the CCU.
  • Ideally, a desktop PC with an NVIDIA GPU to test your code before pushing it to the cluster (debugging can otherwise be hard).
  • Your PC should ideally run a flavor of Linux; all example scripts were tested on Ubuntu 18.04 (they should also work on derivatives such as Mint 19).
  • Admin access to your own PC to install lots of stuff (or a friendly administrator).
  • More specific needs will be covered in the in-depth tutorials.
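As a quick self-check of these prerequisites, you can run something along the following lines on your PC. The tool list is an assumption based on the Docker/Kubernetes workflow described on this page:

```shell
#!/bin/sh
# Report which tools used in the tutorials are available on this PC.
# The tool list is an assumption: docker for containers, kubectl for
# the cluster, nvidia-smi for the local NVIDIA GPU driver.
for tool in docker kubectl nvidia-smi; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```

Anything reported as missing is covered by the installation steps in the in-depth tutorials.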


How to get started

  • Learning the basics of Docker
    • Step 1: an in-depth look at the example container which trains MNIST using Tensorflow.
    • Step 2: adapt the example to your own project.
    • Step 3: run and test the container locally.
    • Step 4: push the container to the registry server of the cluster.
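The build, local test, and push of steps 3–4 boil down to three Docker commands. The helper below only prints them (a dry run); the registry address `registry.example.ccu` and image name are placeholders that you must replace with your cluster's actual values:

```shell
#!/bin/sh
# Print the Docker commands for the build/test/push cycle without running them.
# REGISTRY and IMAGE are placeholders, not the cluster's real names.
REGISTRY="registry.example.ccu"
IMAGE="$REGISTRY/my-experiment:latest"

echo "docker build -t $IMAGE ."   # wrap your code into an image
echo "docker run --rm $IMAGE"     # test the container locally first
echo "docker push $IMAGE"         # then push it to the cluster registry
```

Tagging the image with the registry host as a prefix is what tells `docker push` where to send it.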
  • Learning the basics of Kubernetes and how to run jobs on the cluster:
    • Step 1: install the Kubernetes infrastructure and set up your user account.
    • Step 2: run the example container on the cluster and make sure that it works correctly.
    • Step 3: run your own container on the cluster.
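To give a feel for what running a job on the cluster involves, the snippet below writes a minimal Kubernetes Job manifest and notes the `kubectl` commands to submit and monitor it. All names (job name, image reference, GPU resource limit) are illustrative assumptions; the manifests actually used on the cluster may differ:

```shell
# Write a minimal Kubernetes Job manifest for a single-GPU container.
# The job name and image reference are illustrative placeholders.
cat > job.yaml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: my-experiment
spec:
  template:
    spec:
      containers:
      - name: my-experiment
        image: registry.example.ccu/my-experiment:latest
        resources:
          limits:
            nvidia.com/gpu: 1   # request one GPU on the node
      restartPolicy: Never
EOF

# Submit and monitor (requires kubectl configured for the cluster):
#   kubectl apply -f job.yaml
#   kubectl get jobs
#   kubectl logs job/my-experiment
echo "wrote job.yaml"
```

The `nvidia.com/gpu` resource limit is how Kubernetes schedules the container onto a node with a free GPU.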