view doc/v2_planning/architecture_NB.txt @ 1435:3dd64c115657
revised version of pkldu that is a bit more structured code wise and outputs in human readable units
author | Razvan Pascanu <r.pascanu@gmail.com> |
---|---|
date | Tue, 22 Feb 2011 11:23:32 -0500 |
parents | d9f93923765f |
children |
Here is how I think the Pylearn library could be organized simply and
efficiently.

We said the main goals for a library are:
1. Easily connect new learners with new datasets
2. Easily build new formula-based learners
3. Have "hyper" learning facilities such as hyper optimization, model
   selection, experiments design, etc.

We should focus on those features. They are 80% of our use cases, and the other
20% will always comprise new developments that cannot be predicted. Focusing on
the 80% is relatively simple and implementation could be done in a matter of
weeks.

Let's say we have a DBN learner and we want to plan ahead for possible
modifications and decompose it in small "usable" chunks. When a new student
wants to modify the learning procedure, we envisioned either:
1. A pre-made hyper-learning graph of a DBN that he can "conveniently" adapt to
   his need
2. A hook or message system that allows custom actions at various set points
   in the file (pre-defined but can also be "easily" added)

However, consider that it is CODE that he wants to modify. Intricate details of
new learning algorithms possibly include modifying ANY part of the code, adding
loops, changing algorithms, etc. There are two time-tested methods for dealing
with this:
1. Change the code. Add a new parameter that optionally does the job.
OR, if changes are substantial:
2. Copy the DBN code, modify and save your forked version of it. Each learner
   or significantly new experiment should have its own file.

We should not try to generalize what is not generalizable. In other words,
small loops and mini-algorithms inside learners may not be worthy of being
encapsulated.

Based on the above three main goals, two objects need well-defined
encapsulation: datasets and learners. (Visualization should be included in the
learners. The hard part is not the print or pylab.plot statements, it's the
statistics gathering.)

Here is the basic interface we talked about, and how we would work out some
special cases.

Datasets: fetch mini-batches as numpy arrays in the usual format.
Learners: "standalone" interface: a train function that includes optional
          visualization, "advanced" interface for more control: adapt and
          predict functions.

- K-fold cross-validation? Write a generic "hyper"-learner that does this for
  arbitrary learners via their "advanced" interface. ... and if multiple
  similar datasets can be learned more efficiently for a particular learner?
  Include an option inside the learner to cross-validate.
- Optimizers? Have a generic "Theano formula"-based learner for each optimizer
  you want (SGD, momentum, delta-bar-delta, etc.). Of course combine similar
  optimizers with compatible parameters. A set of helper functions should also
  be provided for building the actual Theano formula.
- Early stopping? This has to be included inside the train function for each
  learner where applicable (probably only the formula-based generic ones
  anyway)
- Generic hyper parameters optimizer? Write a generic hyper-learner that does
  this. And a simple "grid" one. Require supported learners to provide the
  list/distribution of their applicable hyper-parameters which will be supplied
  to their constructor at the hyper-learner discretion.
- Visualization? Each learner defines what can be visualized and how.
- Early stopping curves? The early stopping learner optionally shows this.
- Complex hyper-parameters 2D-subsets curves? Add this as an option in the
  hyper-parameter optimizer.
- Want a dataset that sits in RAM? Write a custom class that still outputs
  numpy arrays in usual format.
- Want an infinite auto-generated dataset? Write a custom class that generates
  and outputs numpy arrays on the fly.
- Dealing with time series with multi-dimensional input? This requires
  cooperation between learner and dataset. Use 3-dimensional numpy arrays.
  Write dataset that outputs these and learner that understands it.
  OR write a dataset that converts to one-dimensional input and use any
  learner.
- Sophisticated performance evaluation function? This evaluation function
  should be suppliable to every learner.
- Have a multi-step complex learning procedure using gradient-based learning
  in some steps? Write a "hyper"-learner that successively calls formula-based
  learners and directly accesses the weights member variables for
  initializations of subsequent learners.
- Want to combine early stopping curves for many hyper-parameter values? Modify
  the optimization-based learners to save the early stopping curve as a member
  variable and use this in the hyper-parameter learner visualization routine.
- Curriculum learning? This requires cooperation between learner and dataset.
  Require supported datasets to understand a function call "set_experience" or
  anything you decide.
- Filters visualization on selected best hyper-parameters set? Include code in
  the formula-based learners to look for the weights applied on input and
  activate visualization in hyper-learner only for the chosen hyper-parameters.

>> to demonstrate architecture designs on kfold dbn training - how would you
>> propose that the library help to do that?

By providing a K-fold cross-validation generic "hyper"-learner that controls an
arbitrary learner via their advanced interface (train, adapt) and their exposed
hyper-parameters, which would be fixed on behalf of the user.

  JB asks: What interface should the learner expose in order for the
  hyper-parameter to be generic (work for many/most/all learners)?

  NB: In the case of a K-fold hyper-learner, I would expect the user to
  completely specify the hyper-parameters and the hyper-learner could just
  blindly pass them along to the sub-learner. For more complex hyper-learners
  like hyper-optimizer or hyper-grid we would require supported sub-learners to
  define a function "get_hyperparam" that returns a
  dict(name1: [default, range], name2: ...).
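
As a rough illustration of the two cases NB describes, here is a minimal
sketch, not an actual Pylearn API. Only the name "get_hyperparam" and the
dict(name: [default, range]) convention come from the discussion above.
Everything else (MeanLearner, kfold_train, grid_search, mse, the "shrink"
hyper-parameter, and the use of train/predict as the learner calls) is
hypothetical. Plain Python lists stand in for numpy arrays so the sketch has no
dependencies.

```python
import itertools


class MeanLearner:
    """Hypothetical toy learner: predicts a shrunken mean of the targets."""

    def __init__(self, shrink=0.0):
        # Hyper-parameters are supplied to the constructor.
        self.shrink = shrink
        self.mean = 0.0

    @staticmethod
    def get_hyperparam():
        # name -> [default, range], following the convention in the text.
        return {"shrink": [0.0, [0.0, 0.5, 0.9]]}

    def train(self, inputs, targets):
        self.mean = (1.0 - self.shrink) * sum(targets) / len(targets)

    def predict(self, inputs):
        return [self.mean for _ in inputs]


def mse(predictions, targets):
    """Mean squared error, used here as the evaluation function."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def kfold_train(learner_cls, hyperparams, inputs, targets, k=3):
    """Generic K-fold: hyper-parameters are passed along blindly."""
    scores = []
    for fold in range(k):
        # Every k-th example, starting at `fold`, is held out for testing.
        test_idx = set(range(fold, len(inputs), k))
        train_x = [x for i, x in enumerate(inputs) if i not in test_idx]
        train_y = [y for i, y in enumerate(targets) if i not in test_idx]
        test_x = [x for i, x in enumerate(inputs) if i in test_idx]
        test_y = [y for i, y in enumerate(targets) if i in test_idx]
        learner = learner_cls(**hyperparams)
        learner.train(train_x, train_y)
        scores.append(mse(learner.predict(test_x), test_y))
    return sum(scores) / k, scores


def grid_search(learner_cls, inputs, targets, k=3):
    """Simple "grid" hyper-learner driven by get_hyperparam()."""
    spec = learner_cls.get_hyperparam()
    names = sorted(spec)
    best = None
    # Try every combination of the values listed in each range.
    for values in itertools.product(*(spec[n][1] for n in names)):
        candidate = dict(zip(names, values))
        score, _ = kfold_train(learner_cls, candidate, inputs, targets, k)
        if best is None or score < best[0]:
            best = (score, candidate)
    return best
```

A call such as grid_search(MeanLearner, inputs, targets) returns the best
(score, hyper-parameters) pair. The grid hyper-learner never needs to know what
the hyper-parameters mean, only their names and ranges.
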
  These hyper-parameters are supplied to the learner constructor.

This K-fold learner, since it is generic, would work by launching multiple
experiments and would support doing so in parallel inside of a job (Python
MPI?) or by launching on the cluster multiple owned scripts that write results
on disk in the way specified by the K-fold learner.

  JB asks: This is not technically possible if the worker nodes and the master
  node do not all share a filesystem. There is a soft requirement that the
  library support this so that we can do job control from DIRO without messing
  around with colosse, mammouth, condor, angel, etc. all separately.

  NB: The hyper-learner would have to support launching jobs on remote servers
  via ssh. Common functionality for this could of course be reused between
  different hyper-learners.

  JB asks: The format used to communicate results from 'learner' jobs with the
  kfold loop and with the stats collectors, and the experiment visualization
  code is not obvious - any ideas how to handle this?

  NB: The DBN is responsible for saving/viewing results inside a DBN
  experiment. The hyper-learner controls DBN execution (even in a script on a
  remote machine) and collects evaluation measurements after its dbn.predict
  call. For K-fold it would typically just save the evaluation distribution and
  average in whatever way (internal convention) that can be transferred over
  ssh. The K-fold hyper-learner would only expose its train interface (no
  adapt, predict) since it cannot always be decomposed in many steps depending
  on the sublearner.

The library would also have a DBN learner with flexible hyper-parameters that
control its detailed architecture.

  JB asks: What kind of building blocks should make this possible - how much
  flexibility and what kinds are permitted?

  NB: Things like number of layers, hidden units and any optional parameters
  that affect initialization or training (e.g. AE or RBM variant) that the DBN
  developer can think of.
  The final user would have to specify those hyper-parameters to the K-fold
  learner anyway.

The interface of the provided dataset would have to conform to possible inputs
that the DBN module understands, i.e. by default 2D numpy arrays. If more
complex dataset needs arise, either subclass a converter for the known format
or add this functionality to the DBN learner directly.

Details of the DBN learner core would resemble the tutorials, would typically
be included in one straightforward code file and could potentially use
"Theano-formula"-based learners as intermediate steps.

  JB asks: One of the troubles with straightforward code is that it is neither
  easy to stop and start (as in long-running jobs) nor control via a
  hyper-parameter optimizer. So I don't think code in the style of the current
  tutorials is very useful in the library.

  NB: I could see how we could require all learners to define stop and restart
  methods so they would be responsible to save and restore themselves. A
  hyper-learner's stop and restart methods would in addition recursively call
  its sublearners' stop and restart methods.
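
The stop/restart idea in NB's last reply could look roughly like the sketch
below. This is an assumption-laden illustration, not an existing Pylearn
interface. The method names stop and restart come from the text. The class
names, the "state" dict, the directory argument and the choice of pickle files
are all hypothetical.

```python
import os
import pickle


class Learner:
    """Hypothetical base class: a learner saves and restores itself."""

    def __init__(self, name):
        self.name = name
        self.state = {}  # everything needed to resume (assumed picklable)

    def _path(self, directory):
        return os.path.join(directory, self.name + ".pkl")

    def stop(self, directory):
        # Save this learner's own state to disk.
        with open(self._path(directory), "wb") as f:
            pickle.dump(self.state, f)

    def restart(self, directory):
        # Restore this learner's own state from disk.
        with open(self._path(directory), "rb") as f:
            self.state = pickle.load(f)


class CountingLearner(Learner):
    """Toy learner whose only state is the number of adapt calls."""

    def __init__(self, name):
        super().__init__(name)
        self.state = {"n_updates": 0}

    def adapt(self, minibatch):
        self.state["n_updates"] += 1


class HyperLearner(Learner):
    """Hypothetical hyper-learner: stop/restart recurse into sublearners."""

    def __init__(self, name, sublearners):
        super().__init__(name)
        self.sublearners = list(sublearners)

    def stop(self, directory):
        super().stop(directory)
        for sub in self.sublearners:
            sub.stop(directory)

    def restart(self, directory):
        super().restart(directory)
        for sub in self.sublearners:
            sub.restart(directory)
```

Because a HyperLearner is itself a Learner here, hyper-learners can be nested
and a single stop call on the outermost one saves the whole tree.
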