view doc/v2_planning/architecture_NB.txt @ 1239:470beb000694

merge
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Thu, 23 Sep 2010 11:49:42 -0400
parents d9f93923765f


Here is how I think the Pylearn library could be organized simply and
efficiently.

We said the main goals for a library are:
1. Easily connect new learners with new datasets
2. Easily build new formula-based learners
3. Have "hyper" learning facilities such as hyper optimization, model selection,
experiment design, etc.

We should focus on those features. They are 80% of our use cases, and the other
20% will always comprise new developments that cannot be predicted in advance.
Focusing on the 80% is relatively simple, and the implementation could be done
in a matter of weeks.

Let's say we have a DBN learner and we want to plan ahead for possible
modifications and decompose it into small "usable" chunks. When a new student
wants to modify the learning procedure, we envisioned either:

1. A pre-made hyper-learning graph of a DBN that he can "conveniently" adapt to
his needs

2. A hook or message system that allows custom actions at various set points
in the file (pre-defined but can also be "easily" added)

However, consider that it is CODE that he wants to modify. Intricate details of
new learning algorithms possibly include modifying ANY part of the code, adding
loops, changing algorithms, etc. There are two time-tested methods for dealing
with this:

1. Change the code. Add a new parameter that optionally does the job. OR, if
changes are substantial:

2. Copy the DBN code, modify and save your forked version of it.  Each learner
or significantly new experiment should have its own file. We should not try to
generalize what is not generalizable.  In other words, small loops and
mini-algorithms inside learners may not be worthy of being encapsulated.

Based on the above three main goals, two objects need well-defined
encapsulation: datasets and learners.
(Visualization should be included in the learners. The hard part is not the
print or pylab.plot statements, it's the statistics gathering.)
Here is the basic interface we talked about, and how we would work out some
special cases.

Datasets: fetch mini-batches as numpy arrays in the usual format.
Learners: "standalone" interface: a train function that includes optional
visualization, "advanced" interface for more control: adapt and predict
functions.
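
The two interfaces above could look roughly like the sketch below. The names
ArrayDataset, minibatches and MeanLearner are hypothetical and chosen only for
illustration; none of them is an existing Pylearn API. The toy learner just
tracks the mean of the targets, so that the split between the "standalone"
train function and the "advanced" adapt/predict functions is easy to see.

```python
import numpy as np


class ArrayDataset:
    """Hypothetical dataset: serves mini-batches as numpy arrays."""

    def __init__(self, inputs, targets):
        self.inputs = np.asarray(inputs, dtype=float)
        self.targets = np.asarray(targets, dtype=float)

    def minibatches(self, batch_size):
        # Yield (inputs, targets) pairs; the last batch may be smaller.
        for start in range(0, len(self.inputs), batch_size):
            stop = start + batch_size
            yield self.inputs[start:stop], self.targets[start:stop]


class MeanLearner:
    """Toy learner that predicts the running mean of the targets."""

    def __init__(self):
        self.n_seen = 0
        self.mean = 0.0

    def adapt(self, inputs, targets):
        # "Advanced" interface: update the model on one mini-batch.
        total = self.mean * self.n_seen + float(np.sum(targets))
        self.n_seen += len(targets)
        self.mean = total / self.n_seen

    def predict(self, inputs):
        # "Advanced" interface: one prediction per input row.
        return np.full(len(inputs), self.mean)

    def train(self, dataset, batch_size=2, verbose=False):
        # "Standalone" interface: the full loop, with optional reporting.
        for inputs, targets in dataset.minibatches(batch_size):
            self.adapt(inputs, targets)
            if verbose:
                print("seen", self.n_seen, "mean", self.mean)
        return self


dataset = ArrayDataset(np.arange(10).reshape(5, 2), [1.0, 2.0, 3.0, 4.0, 5.0])
learner = MeanLearner().train(dataset)
```

A hyper-learner that needs fine control would call adapt and predict itself,
while a casual user would only call train.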

- K-fold cross-validation? Write a generic "hyper"-learner that does this for
  arbitrary learners via their "advanced" interface.  ... and if multiple
  similar datasets can be learned more efficiently for a particular learner?
  Include an option inside the learner to cross-validate.
- Optimizers? Have a generic "Theano formula"-based learner for each optimizer
  you want (SGD, momentum, delta-bar-delta, etc.). Of course combine similar
  optimizers with compatible parameters. A set of helper functions should also
  be provided for building the actual Theano formula.
- Early stopping? This has to be included inside the train function for each
  learner where applicable (probably only the formula-based generic ones anyway)
- Generic hyper-parameter optimizer? Write a generic hyper-learner that does
  this. And a simple "grid" one. Require supported learners to provide the
  list/distribution of their applicable hyper-parameters which will be supplied
  to their constructor at the hyper-learner's discretion.
- Visualization? Each learner defines what can be visualized and how.
- Early stopping curves? The early stopping learner optionally shows this.
- Complex hyper-parameters 2D-subsets curves? Add this as an option in the
  hyper-parameter optimizer.
- Want a dataset that sits in RAM? Write a custom class that still outputs numpy
  arrays in usual format.
- Want an infinite auto-generated dataset? Write a custom class that generates
  and outputs numpy arrays on the fly.
- Dealing with time series with multi-dimensional input? This requires
  cooperation between learner and dataset. Use 3-dimensional numpy arrays. Write
  a dataset that outputs these and a learner that understands them. OR write a
  dataset that converts to one-dimensional input and use any learner.
- Sophisticated performance evaluation function? This evaluation function should
  be suppliable to every learner.
- Have a multi-step complex learning procedure using gradient-based learning in
  some steps? Write a "hyper"-learner that successively calls formula-based
  learners and directly accesses the weights member variables for
  initializations of subsequent learners.
- Want to combine early stopping curves for many hyper-parameter values? Modify
  the optimization-based learners to save the early stopping curve as a member
  variable and use this in the hyper-parameter learner visualization routine.
- Curriculum learning? This requires cooperation between learner and dataset.
  Require supported datasets to understand a function call "set_experience" or
  anything you decide.
- Filter visualization for the selected best hyper-parameter set? Include code
  in the formula-based learners to look for the weights applied on input and
  activate visualization in hyper-learner only for the chosen hyper-parameters.
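
As an illustration of the "infinite auto-generated dataset" case from the list
above, here is a minimal sketch. The class name NoisyLineDataset, the
minibatches method and the toy generating rule are all assumptions made for
this example, not part of any existing library.

```python
import itertools

import numpy as np


class NoisyLineDataset:
    """Hypothetical infinite dataset: y = 2 * sum(x) + 1 + noise."""

    def __init__(self, n_features=1, noise=0.1, seed=0):
        self.n_features = n_features
        self.noise = noise
        self.rng = np.random.RandomState(seed)

    def minibatches(self, batch_size):
        # Never terminates: the caller decides how many batches to consume.
        while True:
            x = self.rng.uniform(-1.0, 1.0, size=(batch_size, self.n_features))
            y = 2.0 * x.sum(axis=1) + 1.0 + self.noise * self.rng.randn(batch_size)
            yield x, y


stream = NoisyLineDataset(n_features=3)
# Take only the first five mini-batches of the endless stream.
batches = list(itertools.islice(stream.minibatches(batch_size=4), 5))
```

Because the output is still plain numpy arrays in the usual format, a learner
does not need to know whether the data sits on disk, in RAM or is generated.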


>> to demonstrate architecture designs on kfold dbn training - how would you
>> propose that the library help to do that?

By providing a K-fold cross-validation generic "hyper"-learner that controls an
arbitrary learner via its advanced interface (train, adapt) and its exposed
hyper-parameters, which would be fixed on behalf of the user.
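
A minimal sketch of such a generic K-fold hyper-learner follows. Everything in
it is an assumption made for illustration: the class names, the choice to pass
arrays directly to train, and the toy MeanLearner that stands in for a DBN.
For brevity the sketch drives the sub-learner through train and predict only.

```python
import numpy as np


class MeanLearner:
    """Toy sub-learner; stands in for a DBN or any other learner."""

    def __init__(self, shrink=1.0):
        self.shrink = shrink  # hyper-parameter fixed by the user
        self.mean = 0.0

    def train(self, inputs, targets):
        self.mean = self.shrink * float(np.mean(targets))
        return self

    def predict(self, inputs):
        return np.full(len(inputs), self.mean)


def mean_squared_error(predictions, targets):
    return float(np.mean((predictions - targets) ** 2))


class KFoldLearner:
    """Hypothetical generic hyper-learner: only exposes train."""

    def __init__(self, learner_class, hyperparams, k=5,
                 evaluate=mean_squared_error):
        self.learner_class = learner_class
        self.hyperparams = hyperparams  # passed blindly to the sub-learner
        self.k = k
        self.evaluate = evaluate
        self.scores = []

    def train(self, inputs, targets):
        # Split example indices into k folds; each fold is held out once.
        folds = np.array_split(np.arange(len(inputs)), self.k)
        self.scores = []
        for i, test_idx in enumerate(folds):
            train_idx = np.concatenate(
                [fold for j, fold in enumerate(folds) if j != i])
            learner = self.learner_class(**self.hyperparams)
            learner.train(inputs[train_idx], targets[train_idx])
            predictions = learner.predict(inputs[test_idx])
            self.scores.append(self.evaluate(predictions, targets[test_idx]))
        return float(np.mean(self.scores))


inputs = np.arange(20, dtype=float).reshape(10, 2)
targets = np.ones(10)
kfold = KFoldLearner(MeanLearner, {"shrink": 1.0}, k=5)
average_error = kfold.train(inputs, targets)
```

The loop over folds is sequential here; each iteration is independent, which
is what would make it a candidate for parallel or remote execution.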

JB asks: 
  What interface should the learner expose in order for the hyper-parameter to
  be generic (work for many/most/all learners)?

NB: In the case of a K-fold hyper-learner, I would expect the user to
  completely specify the hyper-parameters and the hyper-learner could just
  blindly pass them along to the sub-learner. For more complex hyper-learners
  like hyper-optimizer or hyper-grid we would require supported sub-learners
  to define a function "get_hyperparam" that returns a
  dict(name1: [default, range], name2: ...). These hyper-parameters are
  supplied to the learner constructor.
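
The get_hyperparam convention could be sketched as follows. ToyLearner,
GridLearner and the made-up error returned by train are hypothetical, and the
"range" is represented here as an explicit list of candidate values, which is
only one possible reading of [default, range].

```python
import itertools


class ToyLearner:
    """Hypothetical sub-learner exposing its tunable hyper-parameters."""

    def __init__(self, learning_rate=0.1, n_hidden=10):
        self.learning_rate = learning_rate
        self.n_hidden = n_hidden

    @staticmethod
    def get_hyperparam():
        # name -> [default, candidate values]
        return {
            "learning_rate": [0.1, [0.01, 0.1, 1.0]],
            "n_hidden": [10, [5, 10]],
        }

    def train(self, data):
        # Stand-in for real training: return a made-up validation error
        # that is smallest at learning_rate=0.1 and n_hidden=10.
        return abs(self.learning_rate - 0.1) + abs(self.n_hidden - 10) / 100.0


class GridLearner:
    """Hypothetical "grid" hyper-learner: tries every combination."""

    def __init__(self, learner_class):
        self.learner_class = learner_class
        self.best_params = None
        self.best_error = float("inf")

    def train(self, data):
        space = self.learner_class.get_hyperparam()
        names = sorted(space)
        ranges = [space[name][1] for name in names]
        for values in itertools.product(*ranges):
            params = dict(zip(names, values))
            # Hyper-parameters are supplied to the learner constructor.
            error = self.learner_class(**params).train(data)
            if error < self.best_error:
                self.best_params, self.best_error = params, error
        return self.best_params


grid = GridLearner(ToyLearner)
best = grid.train(data=None)
```

The grid hyper-learner never needs to know what the hyper-parameters mean; it
only needs their names and candidate values.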

This K-fold learner, since it is generic, would work by launching multiple
experiments and would support doing so in parallel inside of a job (Python MPI?)
or by launching on the cluster multiple owned scripts that write results on
disk in the way specified by the K-fold learner.

JB asks:
  This is not technically possible if the worker nodes and the master node do
  not all share a filesystem.  There is a soft requirement that the library
  support this so that we can do job control from DIRO without messing around
  with colosse, mammouth, condor, angel, etc. all separately.

NB: The hyper-learner would have to support launching jobs on remote servers
  via ssh. Common functionality for this could of course be reused between
  different hyper-learners.

JB asks:
  The format used to communicate results from 'learner' jobs with the kfold loop
  and with the stats collectors, and the experiment visualization code is not
  obvious - any ideas how to handle this?

NB: The DBN is responsible for saving/viewing results inside a DBN experiment.
  The hyper-learner controls DBN execution (even in a script on a remote
  machine) and collects evaluation measurements after its dbn.predict call.
  For K-fold it would typically just save the evaluation distribution and
  average in whatever way (internal convention) that can be transferred over ssh.
  The K-fold hyper-learner would only expose its train interface (no adapt,
  predict) since it cannot always be decomposed in many steps depending on the
  sublearner.
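
One possible internal convention, shown purely as an assumption, is to
serialize the per-fold scores and their average as a small JSON string, which
is plain text and therefore easy to copy between machines. The function names
below are hypothetical.

```python
import json
import statistics


def summarize_folds(scores):
    """Pack per-fold evaluation scores and their average into JSON text."""
    return json.dumps({"scores": list(scores), "mean": statistics.mean(scores)})


def load_summary(payload):
    """Recover the summary dictionary on the collecting side."""
    return json.loads(payload)


payload = summarize_folds([0.25, 0.5, 0.75])
summary = load_summary(payload)
```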

The library would also have a DBN learner with flexible hyper-parameters that
control its detailed architecture. 

JB asks: 
  What kind of building blocks should make this possible - how much flexibility
  and what kinds are permitted?

NB: Things like number of layers, hidden units and any optional parameters
  that affect initialization or training (e.g. AE or RBM variant) that the DBN
  developer can think of. The final user would have to specify those
  hyper-parameters to the K-fold learner anyway.

The interface of the provided dataset would have to conform to the inputs that
the DBN module understands, i.e. by default 2D numpy arrays. If more complex
dataset needs arise, either subclass a converter for the known format or add
this functionality to the DBN learner directly. Details of the DBN learner core
would resemble the tutorials, would typically be included in one straightforward
code file and could potentially use "Theano-formula"-based learners as
intermediate steps.

JB asks:

  One of the troubles with straightforward code is that it is neither easy to
  stop and start (as in long-running jobs) nor control via a hyper-parameter
  optimizer.  So I don't think code in the style of the current tutorials is very
  useful in the library.
  
NB: I could see how we could require all learners to define stop and restart
  methods so they would be responsible for saving and restoring themselves.
  A hyper-learner's stop and restart methods would in addition recursively call
  its sublearners' stop and restart methods.
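
A minimal sketch of this stop/restart idea, assuming that each learner pickles
its own state to a file in a shared checkpoint directory. The class names, the
file naming scheme and the use of pickle are all assumptions for illustration.

```python
import os
import pickle
import tempfile


class CounterLearner:
    """Toy learner whose whole state is an epoch counter."""

    def __init__(self, name):
        self.name = name
        self.epoch = 0

    def adapt(self):
        self.epoch += 1

    def stop(self, directory):
        # Save everything needed to resume later.
        with open(os.path.join(directory, self.name + ".pkl"), "wb") as f:
            pickle.dump(self.epoch, f)

    def restart(self, directory):
        with open(os.path.join(directory, self.name + ".pkl"), "rb") as f:
            self.epoch = pickle.load(f)


class HyperLearner:
    """Toy hyper-learner that owns several sub-learners."""

    def __init__(self, sublearners):
        self.sublearners = sublearners

    def stop(self, directory):
        # Ask every sub-learner to save itself.
        for learner in self.sublearners:
            learner.stop(directory)

    def restart(self, directory):
        for learner in self.sublearners:
            learner.restart(directory)


checkpoint_dir = tempfile.mkdtemp()
hyper = HyperLearner([CounterLearner("a"), CounterLearner("b")])
hyper.sublearners[0].adapt()
hyper.sublearners[1].adapt()
hyper.sublearners[1].adapt()
hyper.stop(checkpoint_dir)

# A fresh process would rebuild the same structure, then restore its state.
resumed = HyperLearner([CounterLearner("a"), CounterLearner("b")])
resumed.restart(checkpoint_dir)
```

If a sub-learner were itself a hyper-learner, the same two calls would recurse
through the whole tree without the top level knowing its depth.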