doc/v2_planning/architecture_NB.txt @ 1474:a57f4839a9d8
author: James Bergstra <bergstrj@iro.umontreal.ca>
date: Wed, 18 May 2011 10:52:42 -0400

Here is how I think the Pylearn library could be organized, simply and
efficiently.

We said the main goals for a library are:
1. Easily connect new learners with new datasets
2. Easily build new formula-based learners
3. Have "hyper" learning facilities such as hyper-parameter optimization,
model selection, experiment design, etc.

We should focus on those features. They are 80% of our use cases, and the
other 20% will always be new developments, which cannot be predicted.
Focusing on the 80% is relatively simple, and implementation could be done in
a matter of weeks.

Let's say we have a DBN learner and we want to plan ahead for possible
modifications and decompose it into small "usable" chunks. When a new student
wants to modify the learning procedure, we envisioned either:

1. A pre-made hyper-learning graph of a DBN that he can "conveniently" adapt
to his needs

2. A hook or message system that allows custom actions at various set points
in the file (pre-defined, but more can also be "easily" added)

However, consider that it is CODE that he wants to modify. The intricate
details of new learning algorithms may involve modifying ANY part of the
code: adding loops, changing algorithms, etc. There are two time-tested
methods for dealing with this:

1. Change the code. Add a new parameter that optionally does the job. OR, if
the changes are substantial:

2. Copy the DBN code, modify it and save your forked version. Each learner
or significantly new experiment should have its own file. We should not try
to generalize what is not generalizable. In other words, small loops and
mini-algorithms inside learners may not be worth encapsulating.

Based on the above three main goals, two objects need well-defined
encapsulation: datasets and learners.
(Visualization should be included in the learners. The hard part is not the
print or pylab.plot statements, it's the statistics gathering.)
Here is the basic interface we talked about, and how we would work out some
special cases.

Datasets: fetch mini-batches as numpy arrays in the usual format.
Learners: a "standalone" interface: a train function that includes optional
visualization; an "advanced" interface for more control: adapt and predict
functions.

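To make this interface concrete, here is a minimal Python sketch. The class
and method names (Dataset, Learner, minibatches, ArrayDataset) are
illustrative placeholders, not an agreed-upon API; ArrayDataset also covers
the "dataset that sits in RAM" case mentioned below:

```python
import numpy as np

class Dataset:
    """Minimal dataset interface: yields mini-batches as numpy arrays
    in the usual (n_examples, n_features) format."""
    def minibatches(self, batch_size):
        raise NotImplementedError

class Learner:
    """Minimal learner interface."""
    def train(self, dataset):
        # "standalone": full training loop, optional visualization inside
        raise NotImplementedError
    def adapt(self, X, y=None):
        # "advanced": one training update
        raise NotImplementedError
    def predict(self, X):
        # "advanced": inference only
        raise NotImplementedError

class ArrayDataset(Dataset):
    """In-RAM dataset backed by numpy arrays."""
    def __init__(self, X, y):
        self.X, self.y = X, y
    def minibatches(self, batch_size):
        # slice the stored arrays into consecutive mini-batches
        for i in range(0, len(self.X), batch_size):
            yield self.X[i:i + batch_size], self.y[i:i + batch_size]
```

The point of keeping the dataset side this thin is that any learner only ever
sees numpy arrays, so new dataset classes need no learner-side changes.
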
- K-fold cross-validation? Write a generic "hyper"-learner that does this for
  arbitrary learners via their "advanced" interface. ... and what if multiple
  similar datasets can be learned more efficiently by a particular learner?
  Include an option inside the learner to cross-validate.
- Optimizers? Have a generic "Theano formula"-based learner for each
  optimizer you want (SGD, momentum, delta-bar-delta, etc.). Of course,
  combine similar optimizers with compatible parameters. A set of helper
  functions should also be provided for building the actual Theano formula.
- Early stopping? This has to be included inside the train function of each
  learner where applicable (probably only the formula-based generic ones
  anyway).
- A generic hyper-parameter optimizer? Write a generic hyper-learner that
  does this, and a simple "grid" one. Require supported learners to provide
  the list/distribution of their applicable hyper-parameters, which will be
  supplied to their constructor at the hyper-learner's discretion.
- Visualization? Each learner defines what can be visualized and how.
- Early stopping curves? The early stopping learner optionally shows this.
- Complex 2D-subset curves over hyper-parameters? Add this as an option in
  the hyper-parameter optimizer.
- Want a dataset that sits in RAM? Write a custom class that still outputs
  numpy arrays in the usual format.
- Want an infinite auto-generated dataset? Write a custom class that
  generates and outputs numpy arrays on the fly.
- Dealing with time series with multi-dimensional input? This requires
  cooperation between learner and dataset. Use 3-dimensional numpy arrays.
  Write a dataset that outputs these and a learner that understands them, OR
  write a dataset that converts to one-dimensional input and use any learner.
- A sophisticated performance evaluation function? It should be possible to
  supply such an evaluation function to every learner.
- Have a multi-step complex learning procedure using gradient-based learning
  in some steps? Write a "hyper"-learner that successively calls
  formula-based learners and directly accesses their weights member variables
  to initialize subsequent learners.
- Want to combine early stopping curves for many hyper-parameter values?
  Modify the optimization-based learners to save the early stopping curve as
  a member variable, and use this in the hyper-parameter learner's
  visualization routine.
- Curriculum learning? This requires cooperation between learner and dataset.
  Require supported datasets to understand a function call "set_experience",
  or anything you decide.
- Filter visualization for the selected best hyper-parameter set? Include
  code in the formula-based learners to look up the weights applied to the
  input, and activate visualization in the hyper-learner only for the chosen
  hyper-parameters.
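
The "infinite auto-generated dataset" case above can be sketched as follows.
`GeneratedDataset` and its `generate_fn` argument are hypothetical names; the
only real requirement is that the class emits numpy arrays like any other
dataset:

```python
import numpy as np

class GeneratedDataset:
    """Infinite dataset: draws fresh examples on the fly instead of
    storing them. `generate_fn(rng, batch_size)` is a user-supplied
    function returning an (X, y) pair of numpy arrays."""
    def __init__(self, generate_fn, rng_seed=0):
        self.generate_fn = generate_fn
        self.rng = np.random.RandomState(rng_seed)
    def minibatches(self, batch_size):
        while True:  # never exhausted; the consumer decides when to stop
            yield self.generate_fn(self.rng, batch_size)
```

A learner consuming this must bound its own loop (e.g. a fixed number of
updates), since the iterator never raises StopIteration.
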


>> to demonstrate architecture designs on kfold dbn training - how would you
>> propose that the library help to do that?

By providing a generic K-fold cross-validation "hyper"-learner that controls
an arbitrary learner via its advanced interface (train, adapt) and its
exposed hyper-parameters, which it would fix on behalf of the user.

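As a sketch under the interface assumptions above: `KFold` and the
`make_learner` factory are invented names, and for brevity the sub-learner
here trains directly on arrays rather than on a dataset object:

```python
import numpy as np

class KFold:
    """Generic K-fold "hyper"-learner. `make_learner` is a factory taking
    fully-specified hyper-parameters, so KFold stays agnostic about the
    sub-learner's type."""
    def __init__(self, make_learner, hyperparams, k=5):
        self.make_learner = make_learner
        self.hyperparams = hyperparams
        self.k = k
    def train(self, X, y):
        folds = np.array_split(np.arange(len(X)), self.k)
        scores = []
        for i in range(self.k):
            valid = folds[i]
            train = np.concatenate(
                [folds[j] for j in range(self.k) if j != i])
            # fresh sub-learner per fold, hyper-parameters fixed by KFold
            learner = self.make_learner(**self.hyperparams)
            learner.train(X[train], y[train])   # standalone interface
            pred = learner.predict(X[valid])    # advanced interface
            scores.append(float(np.mean(pred == y[valid])))
        return scores  # the evaluation distribution; average at will
```
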
JB asks:
What interface should the learner expose in order for the hyper-learner to
be generic (i.e. to work for many/most/all learners)?

NB: In the case of a K-fold hyper-learner, I would expect the user to
completely specify the hyper-parameters, and the hyper-learner could just
blindly pass them along to the sub-learner. For more complex hyper-learners
like a hyper-optimizer or hyper-grid, we would require supported sub-learners
to define a function "get_hyperparam" that returns a
dict(name1: [default, range], name2: ...). These hyper-parameters are
supplied to the learner constructor.

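A sketch of that convention. `DBNLearner` here is a stand-in with made-up
hyper-parameters and ranges, and `grid_configurations` shows how a simple
"grid" hyper-learner might consume `get_hyperparam`; none of these names are
decided:

```python
import itertools

class DBNLearner:
    """Hypothetical sub-learner advertising its hyper-parameters in the
    dict(name: [default, range]) convention described above."""
    def __init__(self, n_layers=2, n_hidden=100):
        self.n_layers, self.n_hidden = n_layers, n_hidden
    @staticmethod
    def get_hyperparam():
        return {'n_layers': [2, [1, 2, 3]],
                'n_hidden': [100, [50, 100, 200]]}

def grid_configurations(learner_cls):
    """A simple "grid" hyper-learner: instantiate the learner once per
    point of the cross-product of the advertised ranges."""
    spec = learner_cls.get_hyperparam()
    names = sorted(spec)
    for values in itertools.product(*(spec[n][1] for n in names)):
        yield learner_cls(**dict(zip(names, values)))
```

A hyper-optimizer would use the same `get_hyperparam` dict but sample or
search the ranges instead of enumerating them.
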
This K-fold learner, since it is generic, would work by launching multiple
experiments, and would support doing so either in parallel inside one job
(Python MPI?) or by launching multiple owned scripts on the cluster that
write results to disk in the way specified by the K-fold learner.

JB asks:
This is not technically possible if the worker nodes and the master node do
not all share a filesystem. There is a soft requirement that the library
support this so that we can do job control from DIRO without messing around
with colosse, mammouth, condor, angel, etc. all separately.

NB: The hyper-learner would have to support launching jobs on remote servers
via ssh. Common functionality for this could of course be reused between
different hyper-learners.

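A minimal sketch of that shared piece: this only builds the command line a
hyper-learner might hand to `subprocess.check_call`; the host and script
names are invented, and a real version would also need to copy results back:

```python
def ssh_launch_command(host, script, args):
    """Build (but do not run) the ssh command used to launch one owned
    script on a remote server."""
    remote = ' '.join(['python', script] + [str(a) for a in args])
    return ['ssh', host, remote]
```
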
JB asks:
The format used to communicate results from the 'learner' jobs to the k-fold
loop, the stats collectors, and the experiment visualization code is not
obvious - any ideas how to handle this?

NB: The DBN is responsible for saving/viewing results inside a DBN
experiment. The hyper-learner controls DBN execution (even in a script on a
remote machine) and collects evaluation measurements after its dbn.predict
call. For K-fold it would typically just save the evaluation distribution
and average in whatever way (an internal convention) can be transferred over
ssh. The K-fold hyper-learner would only expose its train interface (no
adapt, predict), since training cannot always be decomposed into multiple
steps, depending on the sub-learner.

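One possible internal convention for those result files, sketched with JSON;
the file layout and function names are assumptions, not a decided design:

```python
import json

def save_fold_results(path, scores):
    """Each learner job writes its evaluation scores to a small JSON file
    that the K-fold hyper-learner can fetch over ssh and aggregate."""
    summary = {'scores': list(scores),
               'mean': sum(scores) / len(scores)}
    with open(path, 'w') as f:
        json.dump(summary, f)
    return summary

def load_fold_results(path):
    """Read one job's summary back on the hyper-learner side."""
    with open(path) as f:
        return json.load(f)
```

Any self-describing text format would do equally well; the important part is
that the convention is owned by the hyper-learner, not by each sub-learner.
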
The library would also have a DBN learner with flexible hyper-parameters
that control its detailed architecture.

JB asks:
What kinds of building blocks should make this possible - how much
flexibility, and what kinds are permitted?

NB: Things like the number of layers, the number of hidden units, and any
optional parameters that affect initialization or training (e.g. AE or RBM
variant) that the DBN developer can think of. The final user would have to
specify those hyper-parameters to the K-fold learner anyway.

The interface of the provided dataset would have to conform to the possible
inputs that the DBN module understands, i.e. by default 2D numpy arrays. If
more complex dataset needs arise, either subclass a converter for the known
format or add this functionality to the DBN learner directly. The details of
the DBN learner core would resemble the tutorials, would typically be
contained in one straightforward code file, and could potentially use
"Theano-formula"-based learners as intermediate steps.

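The "subclass a converter" option could look like this; `FlattenTimeSeries`
is an invented name for a wrapper that feeds 3-D time-series batches to a
learner that only understands the default 2-D format:

```python
import numpy as np

class FlattenTimeSeries:
    """Converter dataset: wraps a dataset yielding 3-D
    (batch, time, features) arrays and flattens each batch to 2-D."""
    def __init__(self, inner):
        self.inner = inner
    def minibatches(self, batch_size):
        for X, y in self.inner.minibatches(batch_size):
            # collapse the (time, features) axes into one input vector
            yield X.reshape(len(X), -1), y
```
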
JB asks:

One of the troubles with straightforward code is that it is neither easy to
stop and restart (as in long-running jobs) nor to control via a
hyper-parameter optimizer. So I don't think code in the style of the current
tutorials is very useful in the library.

NB: I could see how we could require all learners to define stop and restart
methods, so that they would be responsible for saving and restoring
themselves. A hyper-learner's stop and restart methods would in addition
recursively call its sub-learners' stop and restart methods.
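
A sketch of such a stop/restart protocol; the pickle-based state saving, the
checkpoint-path scheme and the attribute names are all illustrative only:

```python
import pickle

class StoppableLearner:
    """Each learner saves/restores its own state; a hyper-learner
    recurses into its sub-learners, as proposed above."""
    def __init__(self):
        self.sublearners = []
        self.state = {}
    def stop(self, path):
        # save own state, then recurse into sub-learners
        with open(path, 'wb') as f:
            pickle.dump(self.state, f)
        for i, sub in enumerate(self.sublearners):
            sub.stop('%s.sub%d' % (path, i))
    def restart(self, path):
        # restore own state, then recurse into sub-learners
        with open(path, 'rb') as f:
            self.state = pickle.load(f)
        for i, sub in enumerate(self.sublearners):
            sub.restart('%s.sub%d' % (path, i))
```

The recursion means a K-fold or grid hyper-learner gets checkpointing for
free once its sub-learners implement the two methods.
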