comparison doc/v2_planning/architecture_NB.txt @ 1225:dbac4bd107d8

added architecture_NB
author James Bergstra <bergstrj@iro.umontreal.ca>
date Wed, 22 Sep 2010 17:04:39 -0400

Here is how I think the Pylearn library could be organized simply and
efficiently.

We said the main goals for the library are:
1. Easily connect new learners with new datasets
2. Easily build new formula-based learners
3. Provide "hyper" learning facilities such as hyper-parameter optimization,
model selection, experiment design, etc.

We should focus on these features. They cover 80% of our use cases; the other
20% will always be new developments that cannot be predicted. Focusing on the
80% is relatively simple, and the implementation could be done in a matter of
weeks.

Let's say we have a DBN learner and we want to plan ahead for possible
modifications by decomposing it into small "usable" chunks. When a new student
wants to modify the learning procedure, we envisioned either:

1. A pre-made hyper-learning graph of a DBN that he can "conveniently" adapt to
his needs

2. A hook or message system that allows custom actions at various set points
in the file (pre-defined, but new ones can also be "easily" added)

However, consider that it is CODE that he wants to modify. The intricate
details of new learning algorithms may involve modifying ANY part of the code,
adding loops, changing algorithms, etc. There are two time-tested methods for
dealing with this:

1. Change the code. Add a new parameter that optionally does the job. OR, if
the changes are substantial:

2. Copy the DBN code, modify it, and save your forked version. Each learner or
significantly new experiment should have its own file. We should not try to
generalize what is not generalizable. In other words, small loops and
mini-algorithms inside learners may not be worth encapsulating.

Based on the above three main goals, two objects need well-defined
encapsulation: datasets and learners.
(Visualization should be included in the learners. The hard part is not the
print or pylab.plot statements, it's the statistics gathering.)
Here is the basic interface we talked about, and how we would work out some
special cases.

Datasets: fetch mini-batches as numpy arrays in the usual format.
Learners: a "standalone" interface (a train function that includes optional
visualization) and an "advanced" interface for more control (adapt and predict
functions).
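
As a minimal sketch, these two interfaces could look roughly like the
following Python classes (all names here are illustrative assumptions, not a
settled API):

    class Dataset:
        """Hypothetical dataset interface: yields mini-batches as numpy
        arrays in the usual format."""
        def minibatches(self, batch_size):
            raise NotImplementedError

    class Learner:
        """Hypothetical learner interface."""
        def train(self, dataset):
            """Standalone interface: a full training loop, including
            optional visualization."""
            raise NotImplementedError

        def adapt(self, inputs, targets):
            """Advanced interface: update parameters on one mini-batch."""
            raise NotImplementedError

        def predict(self, inputs):
            """Advanced interface: compute outputs for the given inputs."""
            raise NotImplementedError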

- K-fold cross-validation? Write a generic "hyper"-learner that does this for
  arbitrary learners via their "advanced" interface (a sketch of such a
  learner appears further below). ... And if a particular learner can learn
  multiple similar datasets more efficiently? Include an option inside the
  learner to cross-validate.
- Optimizers? Have a generic "Theano formula"-based learner for each optimizer
  you want (SGD, momentum, delta-bar-delta, etc.); see the first sketch after
  this list. Of course, combine similar optimizers with compatible parameters.
  A set of helper functions should also be provided for building the actual
  Theano formula.
- Early stopping? This has to be included inside the train function of each
  learner where applicable (probably only the formula-based generic ones
  anyway).
- A generic hyper-parameter optimizer? Write a generic hyper-learner that does
  this, and a simple "grid" one (see the second sketch after this list).
  Require supported learners to provide the list/distribution of their
  applicable hyper-parameters, which will be supplied to their constructor at
  the hyper-learner's discretion.
- Visualization? Each learner defines what can be visualized and how.
- Early stopping curves? The early stopping learner optionally shows this.
- Complex 2D-subset curves over hyper-parameters? Add this as an option in the
  hyper-parameter optimizer.
- Want a dataset that sits in RAM? Write a custom class that still outputs
  numpy arrays in the usual format.
- Want an infinite auto-generated dataset? Write a custom class that generates
  and outputs numpy arrays on the fly (see the third sketch after this list).
- Dealing with time series with multi-dimensional input? This requires
  cooperation between learner and dataset. Use 3-dimensional numpy arrays.
  Write a dataset that outputs these and a learner that understands them. OR
  write a dataset that converts to one-dimensional input and use any learner.
- A sophisticated performance evaluation function? This evaluation function
  should be suppliable to every learner.
- A multi-step, complex learning procedure that uses gradient-based learning
  in some steps? Write a "hyper"-learner that successively calls formula-based
  learners and directly accesses their weights member variables to initialize
  subsequent learners.
- Want to combine early stopping curves for many hyper-parameter values?
  Modify the optimization-based learners to save the early stopping curve as a
  member variable, and use it in the hyper-parameter learner's visualization
  routine.
- Curriculum learning? This requires cooperation between learner and dataset.
  Require supported datasets to understand a function call such as
  "set_experience", or anything you decide.
- Filter visualization for the selected best hyper-parameter set? Include code
  in the formula-based learners to look up the weights applied to the input,
  and activate visualization in the hyper-learner only for the chosen
  hyper-parameters.
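
First sketch: to make the "optimizers" point concrete, here is roughly what a
"Theano formula"-based SGD learner could look like. The class name, the
constructor arguments and the logistic-regression formula are assumptions
chosen for illustration, not a settled design:

    import numpy as np
    import theano
    import theano.tensor as T

    class SGDLearner:
        """Sketch of a formula-based learner trained by plain SGD."""
        def __init__(self, n_in, n_out, lr=0.01):
            self.w = theano.shared(np.zeros((n_in, n_out)), name='w')
            self.b = theano.shared(np.zeros(n_out), name='b')
            x = T.matrix('x')
            y = T.ivector('y')  # targets must be int32 arrays
            # Example formula: multi-class logistic regression.
            p_y = T.nnet.softmax(T.dot(x, self.w) + self.b)
            cost = -T.mean(T.log(p_y)[T.arange(y.shape[0]), y])
            updates = [(p, p - lr * T.grad(cost, p))
                       for p in (self.w, self.b)]
            # The "advanced" interface: adapt and predict.
            self.adapt = theano.function([x, y], cost, updates=updates)
            self.predict = theano.function([x], T.argmax(p_y, axis=1))

        def train(self, dataset, n_epochs=10):
            """Standalone interface: loop over the dataset's mini-batches."""
            for epoch in range(n_epochs):
                for inputs, targets in dataset.minibatches(batch_size=32):
                    self.adapt(inputs, targets)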
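
Second sketch: the simple "grid" hyper-learner, assuming (as a hypothetical
convention) that supported learners expose a class attribute 'hyperparams'
mapping each hyper-parameter name to its candidate values, and that the user
supplies an evaluation callable:

    import itertools

    class GridSearchLearner:
        """Sketch of a generic grid-search hyper-learner."""
        def __init__(self, learner_class, evaluate):
            # learner_class.hyperparams is assumed to look like
            # {'lr': [0.1, 0.01], 'n_hidden': [100, 500]}.
            self.learner_class = learner_class
            self.evaluate = evaluate  # trained learner -> error estimate

        def train(self, dataset):
            names = sorted(self.learner_class.hyperparams)
            grids = [self.learner_class.hyperparams[n] for n in names]
            best = None
            for values in itertools.product(*grids):
                config = dict(zip(names, values))
                learner = self.learner_class(**config)
                learner.train(dataset)
                err = self.evaluate(learner)
                if best is None or err < best[0]:
                    best = (err, config, learner)
            self.best_error, self.best_config, self.best_learner = best
            return self.best_learner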
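
Third sketch: the custom dataset classes from the list stay equally small.
For instance, an infinite auto-generated dataset (the generating rule below is
an arbitrary example):

    import numpy as np

    class GeneratedDataset:
        """Sketch: an infinite dataset that fabricates mini-batches on the
        fly; the consuming learner decides when to stop iterating."""
        def __init__(self, n_in, seed=1234):
            self.n_in = n_in
            self.rng = np.random.RandomState(seed)

        def minibatches(self, batch_size):
            while True:
                inputs = self.rng.uniform(size=(batch_size, self.n_in))
                # Arbitrary synthetic labelling rule, for illustration only.
                targets = inputs.sum(axis=1) > self.n_in / 2.0
                yield inputs, targets.astype('int32')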


>> to demonstrate architecture designs on kfold dbn training - how would you
>> propose that the library help to do that?

By providing a generic K-fold cross-validation "hyper"-learner that controls an
arbitrary learner via its advanced interface (train, adapt) and its exposed
hyper-parameters, which would be fixed on behalf of the user.
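
As a minimal sketch (operating directly on numpy arrays for simplicity, with a
hypothetical make_learner factory and a fixed number of adapt calls), such a
K-fold hyper-learner could look like this:

    import numpy as np

    class KFoldLearner:
        """Sketch of a generic K-fold cross-validation hyper-learner."""
        def __init__(self, make_learner, k=5, n_epochs=10):
            self.make_learner = make_learner  # factory: () -> fresh learner
            self.k = k
            self.n_epochs = n_epochs

        def train(self, inputs, targets):
            folds = np.array_split(np.arange(len(inputs)), self.k)
            errors = []
            for i in range(self.k):
                test_idx = folds[i]
                train_idx = np.concatenate(
                    [folds[j] for j in range(self.k) if j != i])
                learner = self.make_learner()
                # Drive the learner through its "advanced" interface.
                for _ in range(self.n_epochs):
                    learner.adapt(inputs[train_idx], targets[train_idx])
                preds = learner.predict(inputs[test_idx])
                errors.append(np.mean(preds != targets[test_idx]))
            self.mean_error = float(np.mean(errors))
            return self.mean_error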

JB asks:
    What interface should the learner expose in order for the hyper-learner to
    be generic (work for many/most/all learners)?

This K-fold learner, since it is generic, would work by launching multiple
experiments, and would support doing so either in parallel inside a single job
(Python MPI?) or by launching multiple owned scripts on the cluster that write
their results to disk in the way specified by the K-fold learner.

JB asks:
    This is not technically possible if the worker nodes and the master node
    do not all share a filesystem. There is a soft requirement that the
    library support this, so that we can do job control from DIRO without
    messing around with colosse, mammouth, condor, angel, etc. all separately.

JB asks:
    The format used to communicate results from the 'learner' jobs to the
    k-fold loop, the stats collectors, and the experiment visualization code
    is not obvious - any ideas how to handle this?

The library would also have a DBN learner with flexible hyper-parameters that
control its detailed architecture.

JB asks:
    What kind of building blocks should make this possible - how much
    flexibility and what kinds are permitted?

The interface of the provided dataset would have to conform to the inputs that
the DBN module understands, i.e. by default 2D numpy arrays. If more complex
dataset needs arise, either subclass a converter for the known format or add
this functionality to the DBN learner directly. Details of the DBN learner core
would resemble the tutorials, would typically be contained in one
straightforward code file, and could potentially use "Theano formula"-based
learners as intermediate steps.

JB asks:
    One of the troubles with straightforward code is that it is neither easy
    to stop and restart (as in long-running jobs) nor easy to control via a
    hyper-parameter optimizer. So I don't think code in the style of the
    current tutorials is very useful in the library.