# HG changeset patch
# User James Bergstra
# Date 1285189479 14400
# Node ID dbac4bd107d87aa0d92702a6ad5828fd889f8457
# Parent f68b857eb11bfcfd1d8b06e7a560904e90090a1b
added architecture_NB

diff -r f68b857eb11b -r dbac4bd107d8 doc/v2_planning/architecture_NB.txt
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/v2_planning/architecture_NB.txt	Wed Sep 22 17:04:39 2010 -0400
@@ -0,0 +1,142 @@
+
+Here is how I think the Pylearn library could be organized simply and
+efficiently.
+
+We said the main goals for a library are:
+1. Easily connect new learners with new datasets
+2. Easily build new formula-based learners
+3. Have "hyper"-learning facilities such as hyper-parameter optimization,
+model selection, experiment design, etc.
+
+We should focus on those features. They cover 80% of our use cases, and the
+other 20% will always consist of new developments that cannot be predicted in
+advance. Focusing on the 80% is relatively simple, and the implementation could
+be done in a matter of weeks.
+
+Let's say we have a DBN learner and we want to plan ahead for possible
+modifications and decompose it into small "usable" chunks. When a new student
+wants to modify the learning procedure, we envisioned either:
+
+1. A pre-made hyper-learning graph of a DBN that he can "conveniently" adapt to
+his needs
+
+2. A hook or message system that allows custom actions at various set points in
+the file (pre-defined, but new ones can also be "easily" added)
+
+However, consider that it is CODE that he wants to modify. The intricate
+details of new learning algorithms may require modifying ANY part of the code:
+adding loops, changing algorithms, etc. There are two time-tested methods for
+dealing with this:
+
+1. Change the code. Add a new parameter that optionally does the job. OR, if
+the changes are substantial:
+
+2. Copy the DBN code, modify it, and save your forked version. Each learner or
+significantly new experiment should have its own file. We should not try to
+generalize what is not generalizable. In other words, small loops and
+mini-algorithms inside learners may not be worth encapsulating.
+
+Based on the three main goals above, two objects need well-defined
+encapsulation: datasets and learners.
+(Visualization should be included in the learners. The hard part is not the
+print or pylab.plot statements, it's the statistics gathering.)
+Here is the basic interface we talked about, and how we would work out some
+special cases.
+
+Datasets: fetch mini-batches as numpy arrays in the usual format.
+Learners: a "standalone" interface (a train function that includes optional
+visualization) and an "advanced" interface for more control (adapt and predict
+functions).
+
+- K-fold cross-validation? Write a generic "hyper"-learner that does this for
+  arbitrary learners via their "advanced" interface. ... And what if multiple
+  similar datasets can be learned more efficiently by a particular learner?
+  Include an option inside the learner to cross-validate.
+- Optimizers? Have a generic "Theano formula"-based learner for each optimizer
+  you want (SGD, momentum, delta-bar-delta, etc.). Of course, combine similar
+  optimizers with compatible parameters. A set of helper functions should also
+  be provided for building the actual Theano formula.
+- Early stopping? This has to be included inside the train function of each
+  learner where applicable (probably only the formula-based generic ones
+  anyway).
+- Generic hyper-parameter optimizer? Write a generic hyper-learner that does
+  this, and a simple "grid" one.
+  Require supported learners to provide the list/distribution of their
+  applicable hyper-parameters, which will be supplied to their constructor at
+  the hyper-learner's discretion.
+- Visualization? Each learner defines what can be visualized and how.
+- Early stopping curves? The early stopping learner optionally shows this.
+- Curves over complex 2D subsets of the hyper-parameters? Add this as an option
+  in the hyper-parameter optimizer.
+- Want a dataset that sits in RAM? Write a custom class that still outputs
+  numpy arrays in the usual format.
+- Want an infinite auto-generated dataset? Write a custom class that generates
+  and outputs numpy arrays on the fly.
+- Dealing with time series with multi-dimensional input? This requires
+  cooperation between learner and dataset. Use 3-dimensional numpy arrays:
+  write a dataset that outputs these and a learner that understands them, OR
+  write a dataset that converts to one-dimensional input and use any learner.
+- Sophisticated performance evaluation function? It should be possible to
+  supply such an evaluation function to every learner.
+- Have a complex multi-step learning procedure that uses gradient-based
+  learning in some steps? Write a "hyper"-learner that successively calls
+  formula-based learners and directly accesses their weight member variables
+  to initialize subsequent learners.
+- Want to combine early stopping curves for many hyper-parameter values? Modify
+  the optimization-based learners to save the early stopping curve as a member
+  variable, and use it in the hyper-parameter learner's visualization routine.
+- Curriculum learning? This requires cooperation between learner and dataset.
+  Require supported datasets to understand a function call such as
+  "set_experience", or anything you decide.
+- Filter visualization for the selected best hyper-parameter set? Include code
+  in the formula-based learners to look up the weights applied to the input,
+  and activate visualization in the hyper-learner only for the chosen
+  hyper-parameters.
+
+
+>> to demonstrate architecture designs on kfold dbn training - how would you
+>> propose that the library help to do that?
+
+By providing a generic K-fold cross-validation "hyper"-learner that controls an
+arbitrary learner via its advanced interface (train, adapt) and its exposed
+hyper-parameters, which would be fixed on behalf of the user (a rough sketch is
+given further below).
+
+JB asks:
+    What interface should the learner expose in order for the hyper-parameter
+    handling to be generic (work for many/most/all learners)?
+
+This K-fold learner, since it is generic, would work by launching multiple
+experiments, and would support doing so either in parallel inside a single job
+(Python MPI?) or by launching on the cluster multiple scripts of its own that
+write their results to disk in the way specified by the K-fold learner.
+
+JB asks:
+    This is not technically possible if the worker nodes and the master node do
+    not all share a filesystem. There is a soft requirement that the library
+    support this so that we can do job control from DIRO without messing around
+    with colosse, mammouth, condor, angel, etc. all separately.
+
+JB asks:
+    The format used to communicate results from the 'learner' jobs to the
+    k-fold loop, the stats collectors, and the experiment visualization code is
+    not obvious - any ideas on how to handle this?
+
+The library would also have a DBN learner with flexible hyper-parameters that
+control its detailed architecture.
+
+JB asks:
+    What kind of building blocks should make this possible - how much
+    flexibility, and what kinds, are permitted?
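+
+To make the proposed interfaces a bit more concrete, here is a rough Python
+sketch of the "advanced" learner interface and of the generic K-fold
+"hyper"-learner built on top of it. Only the train/adapt/predict vocabulary,
+the hyper-parameter list/distribution requirement and the numpy mini-batch
+convention come from the discussion above; every class name, every signature
+and the error_fn argument are hypothetical placeholders, not a settled design.
+
+    import numpy as np
+
+    class Learner(object):
+        """Advanced interface: adapt to a mini-batch, predict on new inputs."""
+        hyper_parameters = {}  # name -> list/distribution of candidate values
+
+        def adapt(self, inputs, targets):
+            raise NotImplementedError
+
+        def predict(self, inputs):
+            raise NotImplementedError
+
+        def train(self, dataset):
+            """Standalone interface: full loop, optional visualization."""
+            for inputs, targets in dataset:  # numpy mini-batches
+                self.adapt(inputs, targets)
+
+    class KFoldLearner(object):
+        """Generic "hyper"-learner: k-fold cross-validation of any Learner."""
+        def __init__(self, learner_class, fixed_hyper_params, k=5):
+            self.learner_class = learner_class
+            self.fixed_hyper_params = fixed_hyper_params
+            self.k = k
+
+        def train(self, inputs, targets, error_fn):
+            folds = np.array_split(np.arange(len(inputs)), self.k)
+            errors = []
+            for i, valid_idx in enumerate(folds):
+                train_idx = np.concatenate(folds[:i] + folds[i + 1:])
+                learner = self.learner_class(**self.fixed_hyper_params)
+                learner.train([(inputs[train_idx], targets[train_idx])])
+                predictions = learner.predict(inputs[valid_idx])
+                errors.append(error_fn(predictions, targets[valid_idx]))
+            return np.mean(errors)
+
+The only point of the sketch is that the K-fold learner needs nothing from the
+wrapped learner beyond this small surface: its constructor hyper-parameters and
+the train/predict (or adapt) calls.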
+
+The interface of the provided dataset would have to conform to the inputs that
+the DBN module understands, i.e. 2D numpy arrays by default. If more complex
+dataset needs arise, either subclass a converter for the known format or add
+this functionality to the DBN learner directly. The details of the DBN learner
+core would resemble the tutorials, would typically be contained in one
+straightforward code file, and could potentially use "Theano formula"-based
+learners as intermediate steps.
+
+JB asks:
+    One of the troubles with straightforward code is that it is neither easy to
+    stop and start (as in long-running jobs) nor to control via a
+    hyper-parameter optimizer. So I don't think code in the style of the
+    current tutorials is very useful in the library.
+
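+For reference, here is the kind of constructor-level flexibility that "a DBN
+learner with flexible hyper-parameters that control its detailed architecture"
+is meant to suggest, together with a trivial in-RAM dataset that outputs 2D
+numpy arrays. Every name and parameter choice below is a hypothetical
+illustration rather than a settled design, and the sketch deliberately does not
+address JB's point above about stopping/restarting and hyper-parameter control.
+
+    class InMemoryDataset(object):
+        """Dataset that sits in RAM and yields 2D numpy mini-batches."""
+        def __init__(self, inputs, targets, batch_size=100):
+            self.inputs = inputs
+            self.targets = targets
+            self.batch_size = batch_size
+
+        def __iter__(self):
+            for start in range(0, len(self.inputs), self.batch_size):
+                stop = start + self.batch_size
+                yield self.inputs[start:stop], self.targets[start:stop]
+
+    class DBNLearner(object):
+        """DBN whose detailed architecture is set by its hyper-parameters."""
+        hyper_parameters = {
+            'hidden_layer_sizes': [(500,), (500, 500), (1000, 1000, 1000)],
+            'pretrain_lr': [0.1, 0.01, 0.001],
+            'finetune_lr': [0.1, 0.01],
+            'n_pretrain_epochs': [10, 50],
+        }
+
+        def __init__(self, hidden_layer_sizes=(500, 500), pretrain_lr=0.01,
+                     finetune_lr=0.1, n_pretrain_epochs=10):
+            self.hidden_layer_sizes = hidden_layer_sizes
+            self.pretrain_lr = pretrain_lr
+            self.finetune_lr = finetune_lr
+            self.n_pretrain_epochs = n_pretrain_epochs
+
+        def train(self, dataset):
+            # Greedy layer-wise pre-training followed by supervised
+            # fine-tuning, possibly delegating each phase to a
+            # "Theano formula"-based learner.
+            raise NotImplementedError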