diff doc/v2_planning/architecture_NB.txt @ 1225:dbac4bd107d8

added architecture_NB
author James Bergstra <bergstrj@iro.umontreal.ca>
date Wed, 22 Sep 2010 17:04:39 -0400
parents
children d9f93923765f
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/v2_planning/architecture_NB.txt	Wed Sep 22 17:04:39 2010 -0400
@@ -0,0 +1,142 @@
+
+Here is how I think the Pylearn library could be organized, simply and
+efficiently.
+
+We said the main goals for a library are:
+1. Easily connect new learners with new datasets
+2. Easily build new formula-based learners
+3. Have "hyper" learning facilities such as hyper optimization, model selection,
+experiments design, etc.
+
+We should focus on those features. They cover 80% of our use cases; the other
+20% will always consist of new developments, which by nature cannot be
+predicted. Focusing on the 80% is relatively simple, and the implementation
+could be done in a matter of weeks.
+
+Let's say we have a DBN learner and we want to plan ahead for possible
+modifications by decomposing it into small "usable" chunks. When a new student
+wants to modify the learning procedure, we envisioned either:
+
+1. A pre-made hyper-learning graph of a DBN that he can "conveniently" adapt to
+his needs, or
+
+2. A hook or message system that allows custom actions at various set points
+in the file (pre-defined, but new points can also be "easily" added).
+
+However, consider that it is CODE that he wants to modify. The intricate
+details of a new learning algorithm may require modifying ANY part of the
+code: adding loops, changing algorithms, etc. There are two time-tested
+methods for dealing with this:
+
+1. Change the code. Add a new parameter that optionally does the job. OR, if
+the changes are substantial:
+
+2. Copy the DBN code, modify it, and save your forked version. Each learner
+or significantly new experiment should have its own file. We should not try to
+generalize what is not generalizable. In other words, small loops and
+mini-algorithms inside learners may not be worth encapsulating.
+
+Based on the three main goals above, two objects need well-defined
+encapsulation: datasets and learners.
+(Visualization should be included in the learners. The hard part is not the
+print or pylab.plot statements; it's the statistics gathering.)
+Here is the basic interface we talked about, and how we would work out some
+special cases.
+
+Datasets: fetch mini-batches as numpy arrays in the usual format.
+Learners: a "standalone" interface: a train function that includes optional
+visualization; an "advanced" interface for more control: adapt and predict
+functions.
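+
+Here is a minimal sketch of what those two interfaces could look like. The
+names (MemoryDataset, minibatches, adapt, predict) are placeholders for
+illustration, not an existing Pylearn API; this sketch also covers the
+"dataset that sits in RAM" case from the list below.
+
+    import numpy as np
+
+    class MemoryDataset(object):
+        """A dataset that sits in RAM; yields mini-batches as numpy arrays."""
+        def __init__(self, X, y, batch_size=32):
+            self.X, self.y, self.batch_size = X, y, batch_size
+
+        def minibatches(self):
+            for i in range(0, len(self.X), self.batch_size):
+                yield (self.X[i:i + self.batch_size],
+                       self.y[i:i + self.batch_size])
+
+    class Learner(object):
+        """Standalone interface: train. Advanced interface: adapt/predict."""
+        def train(self, dataset, visualize=False):
+            for inputs, targets in dataset.minibatches():
+                self.adapt(inputs, targets)
+
+        def adapt(self, inputs, targets):
+            raise NotImplementedError
+
+        def predict(self, inputs):
+            raise NotImplementedError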
+
+- K-fold cross-validation? Write a generic "hyper"-learner that does this for
+  arbitrary learners via their "advanced" interface. And if a particular
+  learner can learn multiple similar datasets more efficiently? Include an
+  option inside the learner to cross-validate.
+- Optimizers? Have a generic "Theano formula"-based learner for each optimizer
+  you want (SGD, momentum, delta-bar-delta, etc.). Of course combine similar
+  optimizers with compatible parameters. A set of helper functions should also
+  be provided for building the actual Theano formula.
+- Early stopping? This has to be included inside the train function of each
+  learner where applicable (probably only the generic formula-based ones
+  anyway).
+- Generic hyper-parameter optimizer? Write a generic hyper-learner that does
+  this, and a simple "grid" one (see the sketch after this list). Require
+  supported learners to provide the list/distribution of their applicable
+  hyper-parameters, which will be supplied to their constructor at the
+  hyper-learner's discretion.
+- Visualization? Each learner defines what can be visualized and how.
+- Early stopping curves? The early stopping learner optionally shows this.
+- Curves over 2D subsets of complex hyper-parameters? Add this as an option in
+  the hyper-parameter optimizer.
+- Want a dataset that sits in RAM? Write a custom class that still outputs
+  numpy arrays in the usual format.
+- Want an infinite auto-generated dataset? Write a custom class that generates
+  and outputs numpy arrays on the fly.
+- Dealing with time series with multi-dimensional input? This requires
+  cooperation between learner and dataset. Use 3-dimensional numpy arrays:
+  write a dataset that outputs these and a learner that understands them, OR
+  write a dataset that converts to one-dimensional input and use any learner.
+- Sophisticated performance evaluation function? It should be possible to
+  supply such an evaluation function to every learner.
+- Have a complex multi-step learning procedure using gradient-based learning
+  in some steps? Write a "hyper"-learner that successively calls formula-based
+  learners and directly accesses their weights member variables to initialize
+  subsequent learners.
+- Want to combine early stopping curves for many hyper-parameter values?
+  Modify the optimization-based learners to save the early stopping curve as a
+  member variable, and use it in the hyper-parameter learner's visualization
+  routine.
+- Curriculum learning? This requires cooperation between learner and dataset.
+  Require supported datasets to understand a function call such as
+  "set_experience", or anything you decide.
+- Filter visualization for the selected best hyper-parameter set? Include code
+  in the formula-based learners to look up the weights applied to the input,
+  and activate visualization in the hyper-learner only for the chosen
+  hyper-parameters.
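+
+As promised above, here is a sketch of the simple "grid" hyper-learner. The
+names (GridSearchLearner, hyper_grid, score) are hypothetical, and it assumes
+the convention that supported learners take their hyper-parameters as
+constructor keyword arguments.
+
+    import itertools
+
+    class GridSearchLearner(object):
+        """Generic "hyper"-learner: trains one learner per point of a
+        hyper-parameter grid and keeps the best one."""
+        def __init__(self, learner_class, hyper_grid, score):
+            self.learner_class = learner_class
+            self.hyper_grid = hyper_grid  # e.g. {'learning_rate': [0.1, 0.01]}
+            self.score = score  # score(learner, dataset) -> float, higher wins
+
+        def train(self, train_set, valid_set):
+            names = sorted(self.hyper_grid)
+            self.best_score, self.best_params = float('-inf'), None
+            for values in itertools.product(
+                    *[self.hyper_grid[n] for n in names]):
+                params = dict(zip(names, values))
+                learner = self.learner_class(**params)
+                learner.train(train_set)
+                s = self.score(learner, valid_set)
+                if s > self.best_score:
+                    self.best_score, self.best_params = s, params
+            return self.best_params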
+
+
+>> to demonstrate architecture designs on kfold dbn training - how would you
+>> propose that the library help to do that?
+
+By providing a generic K-fold cross-validation "hyper"-learner that controls
+an arbitrary learner via its advanced interface (train, adapt) and its exposed
+hyper-parameters, which would be fixed on behalf of the user.
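+
+A sketch of what this generic K-fold "hyper"-learner could look like, reusing
+the hypothetical MemoryDataset from the earlier sketch; learner_factory and
+score are placeholder names:
+
+    import numpy as np
+
+    class KFoldLearner(object):
+        """Generic K-fold cross-validation over any learner exposing the
+        advanced interface."""
+        def __init__(self, learner_factory, k=5):
+            self.learner_factory = learner_factory  # () -> a fresh learner
+            self.k = k
+
+        def train(self, X, y, score):
+            folds = np.array_split(np.arange(len(X)), self.k)
+            scores = []
+            for i in range(self.k):
+                train_idx = np.concatenate(folds[:i] + folds[i + 1:])
+                learner = self.learner_factory()
+                learner.train(MemoryDataset(X[train_idx], y[train_idx]))
+                scores.append(score(learner, X[folds[i]], y[folds[i]]))
+            return np.mean(scores), np.std(scores)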
+
+JB asks: 
+  What interface should the learner expose in order for the hyper-parameter
+  optimization to be generic (work for many/most/all learners)?
+
+This K-fold learner, since it is generic, would work by launching multiple
+experiments, and would support doing so either in parallel inside a single job
+(Python MPI?) or by launching on the cluster multiple owned scripts that write
+results to disk in the way specified by the K-fold learner.
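+
+One possible (entirely hypothetical) on-disk convention, assuming for now the
+shared filesystem that JB questions below: each worker script writes one JSON
+file per fold, and the K-fold learner waits for all files and aggregates them.
+
+    import json, os
+
+    def write_fold_result(out_dir, fold, result):
+        # Called by a worker job; `result` is a dict of plain python values,
+        # e.g. {'valid_error': 0.13, 'train_error': 0.02}.
+        with open(os.path.join(out_dir, 'fold_%d.json' % fold), 'w') as f:
+            json.dump(result, f)
+
+    def collect_fold_results(out_dir, k):
+        # Called by the K-fold learner once all k result files exist.
+        return [json.load(open(os.path.join(out_dir, 'fold_%d.json' % i)))
+                for i in range(k)]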
+
+JB asks:
+  This is not technically possible if the worker nodes and the master node do
+  not all share a filesystem.  There is a soft requirement that the library
+  support this so that we can do job control from DIRO without messing around
+  with colosse, mammouth, condor, angel, etc. all separately.
+
+JB asks:
+  The format used to communicate results from 'learner' jobs to the k-fold
+  loop, the stats collectors, and the experiment visualization code is not
+  obvious - any ideas how to handle this?
+
+The library would also have a DBN learner with flexible hyper-parameters that
+control its detailed architecture. 
+
+JB asks: 
+  What kind of building blocks should make this possible - how much flexibility
+  and what kinds are permitted?
+
+The interface of the provided dataset would have to conform to the possible
+inputs that the DBN module understands, i.e. by default 2D numpy arrays. If
+more complex dataset needs arise, either subclass a converter for the known
+format or add this functionality to the DBN learner directly. Details of the
+DBN learner core would resemble the tutorials, would typically be contained in
+one straightforward code file, and could potentially use
+"Theano-formula"-based learners as intermediate steps.
+
+JB asks:
+
+  One of the troubles with straightforward code is that it is neither easy to
+  stop and restart (as in long-running jobs) nor to control via a
+  hyper-parameter optimizer. So I don't think code in the style of the current
+  tutorials is very useful in the library.
+