Note for MPhil students: you do not have access to the network and multi-CPU options.
If you have read the description of what is here called a model, you might guess that this step is really about training many models rather than "a" model. To avoid confusion, we will call each individual model a submodel.
When you press "Train", you should see the following screen:

You need to set a maximum number of hidden units and a number of seeds to train your model. The default values are 20 hidden units and 5 seeds.
The function of the sliding bars should be obvious, but the meaning of what is being set may not be:
What is important here is to realise that the time required to train a submodel grows exponentially with its number of hidden units: training a submodel with one hidden unit typically takes a few seconds, while one with 20 hidden units can require many hours.
The number of seeds determines how many different initial weight distributions will be tried. For each seed, one submodel is trained for every number of hidden units from 1 up to the maximum you have set.
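As a rough illustration of the schedule this implies, here is a minimal Python sketch (the function and variable names are hypothetical, not the Model Manager's own code):

```python
# Minimal sketch of the training schedule described above
# (hypothetical names; not the Model Manager's actual code).

def enumerate_submodels(max_hidden_units, n_seeds):
    """Yield (seed, hidden_units) pairs: for every seed, one submodel
    is trained for each hidden-unit count from 1 up to the maximum."""
    for seed in range(1, n_seeds + 1):
        for hidden_units in range(1, max_hidden_units + 1):
            yield seed, hidden_units

# With the default settings (20 hidden units, 5 seeds),
# 5 x 20 = 100 submodels are trained in total.
jobs = list(enumerate_submodels(max_hidden_units=20, n_seeds=5))
print(len(jobs))  # 100
```

Because the time per submodel rises so steeply with the number of hidden units, the largest submodels dominate the total training time.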
CPU options: the Model Manager allows you to make full use of multiprocessor machines, with up to 4 CPUs. When you use a multiprocessor machine, the load is distributed as evenly as possible between the CPUs.
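By way of illustration only (generic Python multiprocessing, not the Model Manager's implementation, and `train_submodel` is a placeholder), sharing the submodel jobs between several CPUs amounts to something like:

```python
# Generic sketch of sharing submodel training between CPUs
# (placeholder code; not the Model Manager's implementation).
from multiprocessing import Pool

def train_submodel(job):
    seed, hidden_units = job
    # ... train one submodel here (placeholder) ...
    return seed, hidden_units

if __name__ == "__main__":
    jobs = [(s, h) for s in range(1, 6) for h in range(1, 21)]
    with Pool(processes=4) as pool:   # up to 4 CPUs, as in the tutorial
        results = pool.map(train_submodel, jobs)
    print(len(results))  # 100 submodels processed
```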
Network: the Model Manager allows you to take advantage of a networked environment. You can share the training load between as many machines as you want, as long as they have read/write access to your home space.
Clicking on a machine name in the left panel adds it to the right panel, which lists the machines to be used for training. Clicking on a machine name in the right panel removes it from that list.
Once you have set all the options as you want, click Proceed to start the training. You can now log out of your machine, but obviously not reboot it or shut it down.
If you choose to train on more than one machine, the Model Manager will assume that all your machines have the same number of CPUs. If this is not the case, set this number to one, or your training could be extremely slow (the real load on a single-CPU machine would be four times that on a dual-CPU machine).
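The arithmetic behind this warning can be sketched as follows (illustrative numbers only, not the Model Manager's scheduling code): work is handed out as if every machine had the chosen number of CPUs, so a machine with fewer real CPUs carries a proportionally heavier load per CPU.

```python
# Illustrative arithmetic only (assumed numbers; not Model Manager code).
assumed_cpus = 4                                          # CPU count set in the options
real_cpus = {"quad_cpu_host": 4, "single_cpu_host": 1}    # hypothetical machines

for name, cpus in real_cpus.items():
    # Each machine receives enough work to keep `assumed_cpus` busy,
    # but must run it on its real CPUs.
    print(name, "load per real CPU:", assumed_cpus / cpus)
# quad_cpu_host   load per real CPU: 1.0
# single_cpu_host load per real CPU: 4.0
```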
The full training can take as long as a few days. For this reason, if you decide to cancel the training before it finishes, because you realised you made a mistake at an earlier stage, follow this sequence:
Each time you start it, the Model Manager checks whether any training you started earlier has finished.
When the training of a model is finished, the Model Manager remains busy for a time that depends on the complexity of the model, and you are then presented with the following screen:
You can save the graphs as PostScript files or GIF images. The Model Manager uses gnuplot to generate the graphs; the latest versions of gnuplot do not allow the creation of GIF images, so you may receive an error message if you try this option.
In this version of the Model Manager, you are not presented with another opportunity to save these graphs, so think twice before pressing QUIT.
Model Manager Tutorial. Copyright Neuromat Ltd, all rights reserved. Author: Thomas Sourmail, Neuromat Ltd.