comparison doc/v2_planning/architecture_NB.txt @ 1225:dbac4bd107d8

added architecture_NB
author James Bergstra <bergstrj@iro.umontreal.ca>
date Wed, 22 Sep 2010 17:04:39 -0400

Here is how I think the Pylearn library could be organized simply and
efficiently.

We said the main goals for the library are:
1. Easily connect new learners with new datasets
2. Easily build new formula-based learners
3. Provide "hyper" learning facilities such as hyper-parameter optimization,
model selection, experiment design, etc.

We should focus on these features. They cover 80% of our use cases; the other
20% will always be new developments that cannot be predicted. Focusing on the
80% is relatively simple, and the implementation could be done in a matter of
weeks.

Let's say we have a DBN learner and we want to plan ahead for possible
modifications by decomposing it into small "usable" chunks. When a new student
wants to modify the learning procedure, we envisioned either:

1. A pre-made hyper-learning graph of a DBN that he can "conveniently" adapt to
his needs

2. A hook or message system that allows custom actions at various set points
in the file (pre-defined, but new ones can also be "easily" added)

However, consider that it is CODE that he wants to modify. The intricate
details of new learning algorithms may involve modifying ANY part of the code,
adding loops, changing algorithms, etc. There are two time-tested methods for
dealing with this:

1. Change the code. Add a new parameter that optionally does the job. OR, if
the changes are substantial:

2. Copy the DBN code, modify it, and save your forked version. Each learner or
significantly new experiment should have its own file. We should not try to
generalize what is not generalizable. In other words, small loops and
mini-algorithms inside learners may not be worth encapsulating.

Based on the above three main goals, two objects need well-defined
encapsulation: datasets and learners.
(Visualization should be included in the learners. The hard part is not the
print or pylab.plot statements, it's the statistics gathering.)
Here is the basic interface we talked about, and how we would work out some
special cases.

Datasets: fetch mini-batches as numpy arrays in the usual format.
Learners: a "standalone" interface (a train function that includes optional
visualization) and an "advanced" interface for more control (adapt and predict
functions).
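
As a minimal sketch, these two interfaces could look roughly like the
following Python classes (all names here are illustrative assumptions, not a
settled API):

    class Dataset:
        """Hypothetical dataset interface: yields mini-batches as numpy
        arrays in the usual format."""
        def minibatches(self, batch_size):
            raise NotImplementedError

    class Learner:
        """Hypothetical learner interface."""
        def train(self, dataset):
            """Standalone interface: a full training loop, including
            optional visualization."""
            raise NotImplementedError

        def adapt(self, inputs, targets):
            """Advanced interface: update parameters on one mini-batch."""
            raise NotImplementedError

        def predict(self, inputs):
            """Advanced interface: compute outputs for the given inputs."""
            raise NotImplementedError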

- K-fold cross-validation? Write a generic "hyper"-learner that does this for
  arbitrary learners via their "advanced" interface (a sketch of such a
  learner appears further below). ... And if a particular learner can learn
  multiple similar datasets more efficiently? Include an option inside the
  learner to cross-validate.
- Optimizers? Have a generic "Theano formula"-based learner for each optimizer
  you want (SGD, momentum, delta-bar-delta, etc.); see the first sketch after
  this list. Of course, combine similar optimizers with compatible parameters.
  A set of helper functions should also be provided for building the actual
  Theano formula.
- Early stopping? This has to be included inside the train function of each
  learner where applicable (probably only the formula-based generic ones
  anyway).
- A generic hyper-parameter optimizer? Write a generic hyper-learner that does
  this, and a simple "grid" one (see the second sketch after this list).
  Require supported learners to provide the list/distribution of their
  applicable hyper-parameters, which will be supplied to their constructor at
  the hyper-learner's discretion.
- Visualization? Each learner defines what can be visualized and how.
- Early stopping curves? The early stopping learner optionally shows this.
- Complex 2D-subset curves over hyper-parameters? Add this as an option in the
  hyper-parameter optimizer.
- Want a dataset that sits in RAM? Write a custom class that still outputs
  numpy arrays in the usual format.
- Want an infinite auto-generated dataset? Write a custom class that generates
  and outputs numpy arrays on the fly (see the third sketch after this list).
- Dealing with time series with multi-dimensional input? This requires
  cooperation between learner and dataset. Use 3-dimensional numpy arrays.
  Write a dataset that outputs these and a learner that understands them. OR
  write a dataset that converts to one-dimensional input and use any learner.
- A sophisticated performance evaluation function? This evaluation function
  should be suppliable to every learner.
- A multi-step, complex learning procedure that uses gradient-based learning
  in some steps? Write a "hyper"-learner that successively calls formula-based
  learners and directly accesses their weights member variables to initialize
  subsequent learners.
- Want to combine early stopping curves for many hyper-parameter values?
  Modify the optimization-based learners to save the early stopping curve as a
  member variable, and use it in the hyper-parameter learner's visualization
  routine.
- Curriculum learning? This requires cooperation between learner and dataset.
  Require supported datasets to understand a function call such as
  "set_experience", or anything you decide.
- Filter visualization for the selected best hyper-parameter set? Include code
  in the formula-based learners to look up the weights applied to the input,
  and activate visualization in the hyper-learner only for the chosen
  hyper-parameters.
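
First sketch: to make the "optimizers" point concrete, here is roughly what a
"Theano formula"-based SGD learner could look like. The class name, the
constructor arguments and the logistic-regression formula are assumptions
chosen for illustration, not a settled design:

    import numpy as np
    import theano
    import theano.tensor as T

    class SGDLearner:
        """Sketch of a formula-based learner trained by plain SGD."""
        def __init__(self, n_in, n_out, lr=0.01):
            self.w = theano.shared(np.zeros((n_in, n_out)), name='w')
            self.b = theano.shared(np.zeros(n_out), name='b')
            x = T.matrix('x')
            y = T.ivector('y')  # targets must be int32 arrays
            # Example formula: multi-class logistic regression.
            p_y = T.nnet.softmax(T.dot(x, self.w) + self.b)
            cost = -T.mean(T.log(p_y)[T.arange(y.shape[0]), y])
            updates = [(p, p - lr * T.grad(cost, p))
                       for p in (self.w, self.b)]
            # The "advanced" interface: adapt and predict.
            self.adapt = theano.function([x, y], cost, updates=updates)
            self.predict = theano.function([x], T.argmax(p_y, axis=1))

        def train(self, dataset, n_epochs=10):
            """Standalone interface: loop over the dataset's mini-batches."""
            for epoch in range(n_epochs):
                for inputs, targets in dataset.minibatches(batch_size=32):
                    self.adapt(inputs, targets)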
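
Second sketch: the simple "grid" hyper-learner, assuming (as a hypothetical
convention) that supported learners expose a class attribute 'hyperparams'
mapping each hyper-parameter name to its candidate values, and that the user
supplies an evaluation callable:

    import itertools

    class GridSearchLearner:
        """Sketch of a generic grid-search hyper-learner."""
        def __init__(self, learner_class, evaluate):
            # learner_class.hyperparams is assumed to look like
            # {'lr': [0.1, 0.01], 'n_hidden': [100, 500]}.
            self.learner_class = learner_class
            self.evaluate = evaluate  # trained learner -> error estimate

        def train(self, dataset):
            names = sorted(self.learner_class.hyperparams)
            grids = [self.learner_class.hyperparams[n] for n in names]
            best = None
            for values in itertools.product(*grids):
                config = dict(zip(names, values))
                learner = self.learner_class(**config)
                learner.train(dataset)
                err = self.evaluate(learner)
                if best is None or err < best[0]:
                    best = (err, config, learner)
            self.best_error, self.best_config, self.best_learner = best
            return self.best_learner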
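
Third sketch: the custom dataset classes from the list stay equally small.
For instance, an infinite auto-generated dataset (the generating rule below is
an arbitrary example):

    import numpy as np

    class GeneratedDataset:
        """Sketch: an infinite dataset that fabricates mini-batches on the
        fly; the consuming learner decides when to stop iterating."""
        def __init__(self, n_in, seed=1234):
            self.n_in = n_in
            self.rng = np.random.RandomState(seed)

        def minibatches(self, batch_size):
            while True:
                inputs = self.rng.uniform(size=(batch_size, self.n_in))
                # Arbitrary synthetic labelling rule, for illustration only.
                targets = inputs.sum(axis=1) > self.n_in / 2.0
                yield inputs, targets.astype('int32')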


>> to demonstrate architecture designs on kfold dbn training - how would you
>> propose that the library help to do that?

By providing a generic K-fold cross-validation "hyper"-learner that controls an
arbitrary learner via its advanced interface (train, adapt) and its exposed
hyper-parameters, which would be fixed on behalf of the user.
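
As a minimal sketch (operating directly on numpy arrays for simplicity, with a
hypothetical make_learner factory and a fixed number of adapt calls), such a
K-fold hyper-learner could look like this:

    import numpy as np

    class KFoldLearner:
        """Sketch of a generic K-fold cross-validation hyper-learner."""
        def __init__(self, make_learner, k=5, n_epochs=10):
            self.make_learner = make_learner  # factory: () -> fresh learner
            self.k = k
            self.n_epochs = n_epochs

        def train(self, inputs, targets):
            folds = np.array_split(np.arange(len(inputs)), self.k)
            errors = []
            for i in range(self.k):
                test_idx = folds[i]
                train_idx = np.concatenate(
                    [folds[j] for j in range(self.k) if j != i])
                learner = self.make_learner()
                # Drive the learner through its "advanced" interface.
                for _ in range(self.n_epochs):
                    learner.adapt(inputs[train_idx], targets[train_idx])
                preds = learner.predict(inputs[test_idx])
                errors.append(np.mean(preds != targets[test_idx]))
            self.mean_error = float(np.mean(errors))
            return self.mean_error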

JB asks:
    What interface should the learner expose in order for the hyper-learner to
    be generic (work for many/most/all learners)?

This K-fold learner, since it is generic, would work by launching multiple
experiments, and would support doing so either in parallel inside a single job
(Python MPI?) or by launching multiple owned scripts on the cluster that write
their results to disk in the way specified by the K-fold learner.

JB asks:
    This is not technically possible if the worker nodes and the master node
    do not all share a filesystem. There is a soft requirement that the
    library support this, so that we can do job control from DIRO without
    messing around with colosse, mammouth, condor, angel, etc. all separately.

JB asks:
    The format used to communicate results from the 'learner' jobs to the
    k-fold loop, the stats collectors, and the experiment visualization code
    is not obvious - any ideas how to handle this?

The library would also have a DBN learner with flexible hyper-parameters that
control its detailed architecture.

JB asks:
    What kind of building blocks should make this possible - how much
    flexibility and what kinds are permitted?

The interface of the provided dataset would have to conform to the inputs that
the DBN module understands, i.e. by default 2D numpy arrays. If more complex
dataset needs arise, either subclass a converter for the known format or add
this functionality to the DBN learner directly. Details of the DBN learner core
would resemble the tutorials, would typically be contained in one
straightforward code file, and could potentially use "Theano formula"-based
learners as intermediate steps.

JB asks:
    One of the troubles with straightforward code is that it is neither easy
    to stop and restart (as in long-running jobs) nor easy to control via a
    hyper-parameter optimizer. So I don't think code in the style of the
    current tutorials is very useful in the library.