Tutorials:container which trains MNIST using Tensorflow
Revision as of 08:03, 19 May 2019
Overview
In this example, we study in depth how to create a machine learning container that can be run on the cluster. In principle, this works just like creating any other Docker container. However, from the very beginning, we should write our code so that it follows a few special conventions, in particular about where it reads and writes its data. While it is possible to map the directories on the cluster node to any directory used by your program, you are advised to stick to a consistent structure, especially if you intend your code to be easily understood and reused by other people.
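To make the point about read/write locations concrete, here is a minimal sketch of a program that never hard-codes host paths and instead resolves its input and output directories in one place. The environment variable names `INPUT_DIR` and `OUTPUT_DIR`, the defaults `/input` and `/output`, and the helper names are assumptions made purely for illustration; they are not the cluster's actual conventions.

```python
import os
from pathlib import Path

# Hypothetical defaults, chosen only for this illustration.
DEFAULT_INPUT_DIR = "/input"
DEFAULT_OUTPUT_DIR = "/output"


def resolve_dirs(env=None):
    """Return (input_dir, output_dir) as Path objects.

    INPUT_DIR and OUTPUT_DIR are assumed variable names; if they are
    not set, the hypothetical defaults above are used.
    """
    env = os.environ if env is None else env
    input_dir = Path(env.get("INPUT_DIR", DEFAULT_INPUT_DIR))
    output_dir = Path(env.get("OUTPUT_DIR", DEFAULT_OUTPUT_DIR))
    return input_dir, output_dir


def write_result(output_dir, name, text):
    """Write text to output_dir/name, creating the directory if needed."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / name
    target.write_text(text)
    return target
```

Because all paths are resolved in a single function, the same code can run unchanged whether a host directory is mounted into the container (for example with Docker's `-v` option) or a local directory is used during development.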
We will start simple, and then gradually add more capabilities to our program:
1. basic example without reading/writing data
2. logging and verifying console output of our program
3. monitoring the training process with Tensorboard
4. writing the trained/intermediate models to persistent storage