doc/v2_planning/architecture_NB.txt
author: James Bergstra <bergstrj@iro.umontreal.ca>
date:   Wed, 22 Sep 2010 17:04:39 -0400

Here is how I think the Pylearn library could be organized simply and
efficiently.

We said the main goals for a library are:
1. Easily connect new learners with new datasets
2. Easily build new formula-based learners
3. Have "hyper" learning facilities such as hyper-parameter optimization, model
selection, experiment design, etc.

We should focus on those features. They cover 80% of our use cases, and the
other 20% will always be new developments that cannot be anticipated in advance.
Focusing on the 80% is relatively simple, and implementation could be done in a
matter of weeks.

Let's say we have a DBN learner and we want to plan ahead for possible
modifications and decompose it into small "usable" chunks. When a new student
wants to modify the learning procedure, we envisioned either:

1. A pre-made hyper-learning graph of a DBN that he can "conveniently" adapt to
his needs

2. A hook or message system that allows custom actions at various set points
in the file (pre-defined, but new ones can also be "easily" added)

However, consider that it is CODE that he wants to modify. The intricate details
of new learning algorithms may require modifying ANY part of the code, adding
loops, changing algorithms, etc. There are two time-tested methods for
dealing with this:

1. Change the code. Add a new parameter that optionally does the job. OR, if
changes are substantial:

2. Copy the DBN code, modify it and save your forked version. Each learner
or significantly new experiment should have its own file. We should not try to
generalize what is not generalizable. In other words, small loops and
mini-algorithms inside learners may not be worth encapsulating.

Based on the above three main goals, two objects need well-defined
encapsulation: datasets and learners.
(Visualization should be included in the learners. The hard part is not the
print or pylab.plot statements, it's the statistics gathering.)
Here is the basic interface we talked about, and how we would work out some
special cases.

Datasets: fetch mini-batches as numpy arrays in the usual format.
Learners: a "standalone" interface: a train function that includes optional
visualization; and an "advanced" interface for more control: adapt and predict
functions.
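
As a rough illustration only, here is a minimal sketch of what these two
interfaces might look like; every name in it (MinibatchDataset, Learner,
train/adapt/predict) is a placeholder for discussion, not a settled API.

  import numpy as np

  class MinibatchDataset(object):
      """Hypothetical dataset: serves mini-batches as numpy arrays."""
      def __init__(self, inputs, targets, batch_size=32):
          self.inputs = np.asarray(inputs)
          self.targets = np.asarray(targets)
          self.batch_size = batch_size

      def minibatches(self):
          # Yield (input, target) pairs in the usual (n_examples, n_features) format.
          for start in range(0, len(self.inputs), self.batch_size):
              stop = start + self.batch_size
              yield self.inputs[start:stop], self.targets[start:stop]

  class Learner(object):
      """Hypothetical learner interface."""
      def train(self, dataset, visualize=False):
          # "Standalone" interface: run the full training loop, optionally
          # producing the learner-specific visualizations.
          raise NotImplementedError

      def adapt(self, inputs, targets):
          # "Advanced" interface: update parameters on one batch of data.
          raise NotImplementedError

      def predict(self, inputs):
          # "Advanced" interface: return predictions for the given inputs.
          raise NotImplementedError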

- K-fold cross-validation? Write a generic "hyper"-learner that does this for
  arbitrary learners via their "advanced" interface. ... and what if multiple
  similar datasets can be learned more efficiently by a particular learner?
  Include an option inside that learner to cross-validate.
- Optimizers? Have a generic "Theano formula"-based learner for each optimizer
  you want (SGD, momentum, delta-bar-delta, etc.). Of course, combine similar
  optimizers with compatible parameters. A set of helper functions should also
  be provided for building the actual Theano formula.
- Early stopping? This has to be included inside the train function of each
  learner where applicable (probably only the formula-based generic ones anyway).
- Generic hyper-parameter optimizer? Write a generic hyper-learner that does
  this, and a simple "grid" one (a sketch of the grid variant follows this
  list). Require supported learners to provide the list/distribution of their
  applicable hyper-parameters, which will be supplied to their constructor at
  the hyper-learner's discretion.
- Visualization? Each learner defines what can be visualized and how.
- Early stopping curves? The early stopping learner optionally shows this.
- Complex hyper-parameter 2D-subset curves? Add this as an option in the
  hyper-parameter optimizer.
- Want a dataset that sits in RAM? Write a custom class that still outputs numpy
  arrays in the usual format.
- Want an infinite auto-generated dataset? Write a custom class that generates
  and outputs numpy arrays on the fly.
- Dealing with time series with multi-dimensional input? This requires
  cooperation between learner and dataset. Use 3-dimensional numpy arrays: write
  a dataset that outputs these and a learner that understands them, OR write a
  dataset that converts to one-dimensional input and use any learner.
- Sophisticated performance evaluation function? It should be possible to supply
  such an evaluation function to every learner.
- Have a multi-step complex learning procedure using gradient-based learning in
  some steps? Write a "hyper"-learner that successively calls formula-based
  learners and directly accesses their weights member variables to initialize
  subsequent learners.
- Want to combine early stopping curves for many hyper-parameter values? Modify
  the optimization-based learners to save the early stopping curve as a member
  variable, and use this in the hyper-parameter learner's visualization routine.
- Curriculum learning? This requires cooperation between learner and dataset.
  Require supported datasets to understand a function call such as
  "set_experience", or anything you decide.
- Filter visualization for the selected best hyper-parameter set? Include code
  in the formula-based learners to look up the weights applied to the input, and
  activate visualization in the hyper-learner only for the chosen
  hyper-parameters.
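
To make the "grid" hyper-learner idea above concrete, here is one possible
sketch. It assumes the hypothetical interface sketched earlier (a learner class
whose constructor takes its hyper-parameters as keyword arguments and which
exposes train()) plus an externally supplied scoring function; none of these
names are a settled API.

  import itertools

  class GridSearchLearner(object):
      """Hypothetical "hyper"-learner: exhaustive grid search over the
      hyper-parameters declared by the underlying learner class."""
      def __init__(self, learner_class, hyper_param_grid, score_fn):
          # e.g. hyper_param_grid = {"lr": [0.1, 0.01], "n_hidden": [100, 500]}
          self.learner_class = learner_class
          self.hyper_param_grid = hyper_param_grid
          self.score_fn = score_fn  # score_fn(learner, dataset) -> float
          self.best_params = None
          self.best_score = None

      def train(self, train_set, valid_set):
          names = sorted(self.hyper_param_grid)
          for values in itertools.product(*[self.hyper_param_grid[n] for n in names]):
              params = dict(zip(names, values))
              learner = self.learner_class(**params)
              learner.train(train_set)
              score = self.score_fn(learner, valid_set)
              if self.best_score is None or score > self.best_score:
                  self.best_score, self.best_params = score, params
          # Re-train the best configuration and keep it for later use.
          self.best_learner = self.learner_class(**self.best_params)
          self.best_learner.train(train_set)
          return self.best_learner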


>> to demonstrate architecture designs on kfold dbn training - how would you
>> propose that the library help to do that?

By providing a generic K-fold cross-validation "hyper"-learner that controls an
arbitrary learner via its advanced interface (train, adapt) and its exposed
hyper-parameters, which would be fixed on behalf of the user.
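
A minimal sketch of what such a generic K-fold "hyper"-learner could look like,
again in terms of the hypothetical adapt/predict interface above; the splitting
and scoring details are illustrative assumptions, and a real version would
presumably split a dataset object rather than raw arrays.

  import numpy as np

  class KFoldLearner(object):
      """Hypothetical K-fold cross-validation "hyper"-learner."""
      def __init__(self, make_learner, k=5, n_epochs=10):
          # make_learner: zero-argument factory; hyper-parameters already fixed.
          self.make_learner = make_learner
          self.k = k
          self.n_epochs = n_epochs

      def train(self, inputs, targets, score_fn):
          inputs, targets = np.asarray(inputs), np.asarray(targets)
          folds = np.array_split(np.arange(len(inputs)), self.k)
          scores = []
          for i, valid_idx in enumerate(folds):
              train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
              learner = self.make_learner()
              # Drive the learner through the "advanced" interface.
              for epoch in range(self.n_epochs):
                  learner.adapt(inputs[train_idx], targets[train_idx])
              scores.append(score_fn(learner.predict(inputs[valid_idx]),
                                     targets[valid_idx]))
          return np.mean(scores), scores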

JB asks:
    What interface should the learner expose in order for the hyper-parameter
    optimization to be generic (i.e. work for many/most/all learners)?

This K-fold learner, since it is generic, would work by launching multiple
experiments, and would support doing so either in parallel inside a single job
(python MPI?) or by launching multiple scripts of its own on the cluster that
write results to disk in the way specified by the K-fold learner.
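
For the "in parallel inside a single job" case, here is a minimal sketch using
the standard-library multiprocessing module (MPI or cluster scripts would be a
separate mechanism); run_one_fold is a hypothetical stand-in for training on one
fold and returning its validation score.

  from multiprocessing import Pool

  def run_one_fold(fold_index):
      # Hypothetical: build the learner with the fixed hyper-parameters, train
      # on every fold except fold_index, evaluate on fold_index, and return that
      # score. A placeholder value is returned so the sketch runs as-is.
      return 0.0

  if __name__ == "__main__":
      k = 5
      pool = Pool(processes=k)
      # Launch the k training runs in parallel inside this single job.
      fold_scores = pool.map(run_one_fold, range(k))
      pool.close()
      pool.join()
      print(sum(fold_scores) / float(len(fold_scores)))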

JB asks:
    This is not technically possible if the worker nodes and the master node do
    not all share a filesystem. There is a soft requirement that the library
    support this so that we can do job control from DIRO without messing around
    with colosse, mammouth, condor, angel, etc. all separately.

JB asks:
    The format used to communicate results from the 'learner' jobs to the k-fold
    loop, the stats collectors, and the experiment visualization code is not
    obvious - any ideas on how to handle this?

The library would also have a DBN learner with flexible hyper-parameters that
control its detailed architecture.

JB asks:
    What kinds of building blocks should make this possible - how much
    flexibility, and of what kinds, is permitted?

The interface of the provided dataset would have to conform to the possible
inputs that the DBN module understands, i.e. by default 2D numpy arrays. If more
complex dataset needs arise, either subclass a converter for the known format or
add this functionality to the DBN learner directly. Details of the DBN learner
core would resemble the tutorials, would typically be contained in one
straightforward code file, and could potentially use "Theano-formula"-based
learners as intermediate steps.
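
To give a rough idea of what a "Theano-formula"-based building block might look
like, here is a sketch of a softmax layer trained by SGD through the
hypothetical adapt/predict interface (the DBN pre-training layers would follow
the same pattern); this is an illustration, not a proposed implementation.

  import numpy as np
  import theano
  import theano.tensor as T

  class SoftmaxSGDLearner(object):
      """Hypothetical formula-based learner: softmax regression trained by SGD."""
      def __init__(self, n_in, n_out, lr=0.1):
          x = T.matrix('x')
          y = T.ivector('y')  # integer class labels
          self.W = theano.shared(np.zeros((n_in, n_out), dtype=theano.config.floatX))
          self.b = theano.shared(np.zeros(n_out, dtype=theano.config.floatX))
          p_y = T.nnet.softmax(T.dot(x, self.W) + self.b)
          # Negative log-likelihood of the correct class, as in the tutorials.
          cost = -T.mean(T.log(p_y)[T.arange(y.shape[0]), y])
          gW, gb = T.grad(cost, [self.W, self.b])
          self._adapt = theano.function([x, y], cost,
                                        updates=[(self.W, self.W - lr * gW),
                                                 (self.b, self.b - lr * gb)])
          self._predict = theano.function([x], T.argmax(p_y, axis=1))

      def adapt(self, inputs, targets):
          # One SGD step on a mini-batch; targets must be an int32 vector.
          return self._adapt(inputs, targets)

      def predict(self, inputs):
          return self._predict(inputs)

The hyper-learners above would then only ever see adapt/predict and the declared
hyper-parameters (here n_in, n_out, lr).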

JB asks:
    One of the troubles with straightforward code is that it is neither easy to
    stop and start (as in long-running jobs) nor to control via a
    hyper-parameter optimizer. So I don't think code in the style of the current
    tutorials is very useful in the library.