annotate doc/v2_planning/learner.txt @ 1474:a57f4839a9d8 (merge)
author: James Bergstra <bergstrj@iro.umontreal.ca>
date: Wed, 18 May 2011 10:52:42 -0400
parents: 0e12ea6ba661

Committee: AB, PL, GM, IG, RP, NB, PV
Leader: PL

Discussion of Function Specification for Learner Types
======================================================

In its most abstract form, a learner is an object with the
following semantics:

* A learner has named hyper-parameters that control how it learns (these can be viewed
  as options of the constructor, or might be set directly by a user).

* A learner also has an internal state that depends on what it has learned.

* A learner reads and produces data, so the definition of a learner is
  intimately linked to the definition of a dataset (and of a task).

* A learner has one or more 'train' or 'adapt' functions by which
  it is given a sample of data (typically either the whole training set, or
  a mini-batch, which contains as a special case a single 'example'). Learners
  interface with datasets in order to obtain data. These functions cause the
  learner to change its internal state and to take advantage, to some extent,
  of the data provided. The 'train' function should take charge of
  completely exploiting the dataset, as specified by the hyper-parameters,
  so that it would typically be called only once. An 'adapt' function
  is meant for learners that can operate in an 'online' setting where
  data continually arrive and the control loop (when to stop) is
  managed outside of the learner. For most intents and purposes, the
  'train' function could also handle the 'online' case by providing
  the controlled iterations over the dataset (which would then be
  seen as a stream of examples).

  * learner.train(dataset)
  * learner.adapt(data)

* Different types of learners can then exploit their internal state
  in order to perform various computations after training is completed,
  or in the middle of training, e.g.,

  * y=learner.predict(x)
    for learners that see (x,y) pairs during training and predict y given x,
    or for learners that see only x's and learn a transformation of them (i.e. feature extraction).
    Here and below, x and y are tensor-like objects whose first index iterates
    over particular examples in a batch or minibatch of examples.

  * p=learner.probability(examples)
    p=learner.log_probability(examples)
    for learners that can estimate probability density or probability mass functions.
    Note that each example could be a pair (x,y) for learners that expect each example
    to represent such a pair. The second form is provided in case the example
    is high-dimensional and computations in the log domain are numerically preferable.
    The first dimension of examples or of x and y is an index over a minibatch or a dataset.

  * f=learner.free_energy(x)
    for learners that can estimate a free energy, i.e. the negative of a log unnormalized
    probability (the probability of x is proportional to exp(-free_energy(x)));
    the output has the same length as the input.

  * c=learner.costs(examples)
    returns a matrix of costs (one row per example, i.e., again the output has the same length
    as the input), the first column of which represents the cost whose expectation
    we wish to minimize over new samples from the unknown underlying data distribution.

  Some learners may be able to handle x's and y's that contain missing values.

* For convenience, some of these operations could be bundled, e.g.

  * [prediction,costs] = learner.predict_and_adapt((x,y))

* Some learners could include in their internal state not only what they
  have learned but also some information about recently seen examples that conditions
  the expected distribution of upcoming examples. In that case, they might
  be used, e.g., in an online setting as follows:

  .. code-block:: python

      # learner, data_stream and accumulate_statistics are placeholders.
      for (x, y) in data_stream:
          # predict() also updates the state summarizing recent examples
          [prediction, costs] = learner.predict((x, y))
          accumulate_statistics(prediction, costs)

* In some cases, each example is itself a (possibly variable-size) sequence
  or other variable-size object (e.g. an image, or a video).

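The semantics listed above (hyper-parameters, internal state, train versus adapt, predict, costs, and a bundled predict_and_adapt) can be illustrated with a toy learner. This is a minimal sketch under stated assumptions, not a proposed pylearn API: the class name SGDLinearRegressor, its hyper-parameters (learning_rate, n_epochs, batch_size), the 1-D linear model fitted by minibatch gradient descent, and the use of squared and absolute error as the two cost columns are all hypothetical. Datasets are pairs of plain Python lists (xs, ys) whose first index iterates over examples.

.. code-block:: python

    class SGDLinearRegressor(object):
        """Hypothetical toy learner: fits y ~ w * x + b by minibatch gradient descent."""

        def __init__(self, learning_rate=0.5, n_epochs=1000, batch_size=5):
            # Named hyper-parameters: constructor options that a user may also
            # set directly as attributes.
            self.learning_rate = learning_rate
            self.n_epochs = n_epochs
            self.batch_size = batch_size
            # Internal state: what has been learned so far.
            self.w = 0.0
            self.b = 0.0

        def adapt(self, data):
            """One gradient step on a minibatch (xs, ys); the caller owns the loop."""
            xs, ys = data
            n = float(len(xs))
            errors = [self.w * x + self.b - y for x, y in zip(xs, ys)]
            # Gradient of half the mean squared error with respect to w and b.
            grad_w = sum(e * x for e, x in zip(errors, xs)) / n
            grad_b = sum(errors) / n
            self.w -= self.learning_rate * grad_w
            self.b -= self.learning_rate * grad_b

        def train(self, dataset):
            """Exploit the whole dataset as specified by the hyper-parameters."""
            xs, ys = dataset
            for _ in range(self.n_epochs):
                for start in range(0, len(xs), self.batch_size):
                    stop = start + self.batch_size
                    self.adapt((xs[start:stop], ys[start:stop]))

        def predict(self, xs):
            return [self.w * x + self.b for x in xs]

        def costs(self, examples):
            """One row per example; column 0 is the cost we want to minimize."""
            xs, ys = examples
            return [[(p - y) ** 2, abs(p - y)]
                    for p, y in zip(self.predict(xs), ys)]

        def predict_and_adapt(self, examples):
            """Bundled operation: predict and measure costs, then adapt."""
            prediction = self.predict(examples[0])
            costs = self.costs(examples)
            self.adapt(examples)
            return prediction, costs


    xs = [i / 10.0 for i in range(10)]
    ys = [2.0 * x + 1.0 for x in xs]
    learner = SGDLinearRegressor()
    learner.train((xs, ys))
    print(round(learner.w, 3), round(learner.b, 3))  # prints: 2.0 1.0

Here train() owns the control loop and simply calls adapt() on successive minibatches; an online caller would instead drive adapt() or predict_and_adapt() itself and decide when to stop.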
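The numerical reason for offering log_probability() next to probability() can be shown with a hypothetical factorized Bernoulli learner (a model chosen here for brevity; the text does not specify one). For an example with 2000 binary dimensions, multiplying per-dimension probabilities underflows to 0.0 in double precision, while summing log-probabilities stays finite and exact enough to compare examples.

.. code-block:: python

    import math


    class FactorizedBernoulli(object):
        """Hypothetical density learner: independent Bernoulli units, one per dimension."""

        def __init__(self, means):
            # Internal state: P(x_d = 1) for each dimension d.
            self.means = means

        def probability(self, examples):
            # The first index of `examples` iterates over examples.
            out = []
            for x in examples:
                p = 1.0
                for x_d, m in zip(x, self.means):
                    p *= m if x_d else 1.0 - m
                out.append(p)
            return out

        def log_probability(self, examples):
            # Same model, but summing logs instead of multiplying probabilities.
            return [sum(math.log(m if x_d else 1.0 - m)
                        for x_d, m in zip(x, self.means))
                    for x in examples]


    dim = 2000
    learner = FactorizedBernoulli([0.5] * dim)
    batch = [[1] * dim, [0, 1] * (dim // 2)]
    print(learner.probability(batch))      # [0.0, 0.0]: the product underflowed
    print(learner.log_probability(batch))  # both close to 2000 * log(0.5), about -1386.29

Both examples have the same probability, 0.5 to the power 2000, which is far below the smallest positive double; only the log-domain form can represent it.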
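As a concrete instance of free_energy(), consider a binary restricted Boltzmann machine (RBM); the model choice and the parameter names W, b and c are assumptions for illustration. With energy E(v, h) = -b.v - c.h - v^T W h, summing out the binary hidden units gives the closed form F(v) = -b.v - sum_j log(1 + exp(c_j + sum_i v_i W_ij)), so exp(-F(v)) is the unnormalized probability of v. The sketch checks the closed form against brute-force enumeration of the hidden units on a tiny model.

.. code-block:: python

    import itertools
    import math


    def free_energy(v, W, b, c):
        """Closed-form RBM free energy F(v) for one binary visible vector v."""
        visible_term = sum(b_i * v_i for b_i, v_i in zip(b, v))
        hidden_term = 0.0
        for j, c_j in enumerate(c):
            pre_activation = c_j + sum(v[i] * W[i][j] for i in range(len(v)))
            hidden_term += math.log1p(math.exp(pre_activation))
        return -visible_term - hidden_term


    def brute_force_free_energy(v, W, b, c):
        """-log sum_h exp(-E(v, h)), enumerating every binary hidden vector h."""
        total = 0.0
        for h in itertools.product([0, 1], repeat=len(c)):
            neg_energy = (sum(b_i * v_i for b_i, v_i in zip(b, v))
                          + sum(c_j * h_j for c_j, h_j in zip(c, h))
                          + sum(v[i] * W[i][j] * h[j]
                                for i in range(len(v)) for j in range(len(c))))
            total += math.exp(neg_energy)
        return -math.log(total)


    # A tiny RBM with 3 visible and 2 hidden units (arbitrary parameters).
    W = [[0.5, -1.0], [0.25, 0.75], [-0.5, 0.1]]
    b = [0.1, -0.2, 0.3]
    c = [0.0, -0.4]

    # As with free_energy(x) in the text, there is one output per example.
    examples = [list(v) for v in itertools.product([0, 1], repeat=3)]
    energies = [free_energy(v, W, b, c) for v in examples]
    for v, f in zip(examples, energies):
        assert abs(f - brute_force_free_energy(v, W, b, c)) < 1e-9

The closed form costs one pass over the hidden units instead of a sum over all 2 to the power n_hidden configurations, which is what makes free energies cheap to report even when the partition function is intractable.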

James's idea for Learner Interface
===================================

Theory:
-------

Think about the unfolding of a learning algorithm as exploring a path in a vast
directed graph.

There are some source nodes, which are potential initial conditions for the
learning algorithm.

At any node, there are a number of outgoing labeled edges that represent
distinct directions of exploration, such as "allocate a model with N hidden units",
"set the l1 weight decay on such-and-such units to 0.1", "adapt for T
iterations", or "refresh the GPU dataset memory with the next batch of data".

Not all nodes have the same outgoing edge labels. The dataset, model, and
optimization algorithm implementations may each have their own
hyper-parameters, with various restrictions on what values they can take and
when they can be changed.

Every move in this graph explores the graph and incurs some storage and
computational expense.

Learners typically engage in goal-directed exploration of this graph, for
example to find the node with the best validation-set performance given a
certain computational budget. We are often interested in the best node
found.

The predict(), log_probability(), free_energy(), etc. methods correspond to costs that we
can measure at any particular node (at some computational expense) to see how we
are doing in our exploration.

Many semantically distinct components come into the definition of this graph:
the model (e.g. DAA), the dataset (e.g. an online one), and the inference and
learning strategy. I'm not sure what to call this graph other than an 'experiment
graph'... so I'll go with that for now.

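The experiment-graph view can be sketched in a few lines. Everything here is hypothetical: a node is represented by the tuple of edge labels on the path from the source, outgoing_edges(node) lists the labels available at a node, expense(node, label) stands in for the storage and compute cost of a move, and validation_cost(node) stands in for a measurement such as costs() on a validation set. The exploration policy is a best-first search under a computational budget, one possible goal-directed strategy among many.

.. code-block:: python

    import heapq


    def explore(source, outgoing_edges, expense, validation_cost, budget):
        """Best-first exploration of an experiment graph under a compute budget.

        A node is the tuple of edge labels on the path from `source`. Returns
        the best node found and its validation cost.
        """
        best_node, best_cost = source, validation_cost(source)
        frontier = [(best_cost, source)]  # most promising node is popped first
        spent = 0.0
        while frontier:
            _, node = heapq.heappop(frontier)
            for label in outgoing_edges(node):
                step = expense(node, label)
                if spent + step > budget:
                    return best_node, best_cost  # budget exhausted
                spent += step
                child = node + (label,)
                child_cost = validation_cost(child)
                heapq.heappush(frontier, (child_cost, child))
                if child_cost < best_cost:
                    best_node, best_cost = child, child_cost
        return best_node, best_cost  # whole reachable graph explored


    # Toy graph: choose a model size at the source, then up to 3 "adapt" moves.
    def outgoing_edges(node):
        if not node:
            return [("n_hidden", n) for n in (10, 50, 100)]
        return [("adapt", 10)] if len(node) < 4 else []


    def expense(node, label):
        return 10.0 if label[0] == "adapt" else 1.0


    def validation_cost(node):
        # Made-up cost: bigger models and more training both help.
        if not node:
            return 1.0
        n_hidden = node[0][1]
        n_iters = sum(arg for name, arg in node[1:] if name == "adapt")
        return 1.0 / (1.0 + 0.01 * n_hidden * (1 + n_iters / 10.0))


    print(explore((), outgoing_edges, expense, validation_cost, budget=100))
    print(explore((), outgoing_edges, expense, validation_cost, budget=20))

With the larger budget the whole toy graph fits and the best node is the largest model after three adapt moves; with the smaller budget the search stops early and returns the best node reached so far.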

Use Cases
---------

Early stopping
~~~~~~~~~~~~~~

Early stopping can be implemented as a learner that progresses along a
particular kind of edge (e.g. "train more") until a stopping criterion (in terms
of a cost computed from nodes along the path) is met.

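A minimal sketch of early stopping as a learner that repeatedly follows one kind of edge. The callables train_more(node) (follow one "train more" edge and return the new node) and validation_cost(node) are hypothetical stand-ins, and the stopping criterion shown, patience (stop once patience consecutive moves fail to improve on the best cost, or after max_steps moves), is one common choice; the text leaves the criterion open.

.. code-block:: python

    def early_stopping(source, train_more, validation_cost, patience=3, max_steps=1000):
        """Follow "train more" edges until `patience` moves bring no improvement.

        Returns the best node seen along the path and its validation cost.
        """
        node = source
        best_node, best_cost = node, validation_cost(node)
        moves_without_improvement = 0
        for _ in range(max_steps):
            node = train_more(node)  # one move along a "train more" edge
            cost = validation_cost(node)
            if cost < best_cost:
                best_node, best_cost = node, cost
                moves_without_improvement = 0
            else:
                moves_without_improvement += 1
                if moves_without_improvement >= patience:
                    break  # stopping criterion met
        return best_node, best_cost


    # Toy path: a node is just an epoch count, and the made-up validation cost
    # is U-shaped with its minimum at epoch 7 (overfitting sets in afterwards).
    best_epoch, best = early_stopping(0, lambda n: n + 1, lambda n: (n - 7) ** 2)
    print(best_epoch, best)  # prints: 7 0

Because the criterion is computed from nodes along the path, the sketch returns the best node seen rather than the last node visited, which is three moves past the optimum here.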

Grid Search
~~~~~~~~~~~

Grid search is a learner policy that can be implemented in an experiment graph
where all paths have the form::

  ( "set param 0 to X",
    "set param 1 to Y",
    ... ,
    "set param N to Z",
    adapt,
    [early stop...],
    test )

It would explore all paths of this form and then return the best node.

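A minimal sketch of the grid-search policy, assuming a hypothetical callable adapt_and_test(settings) that starts from a fresh source node, follows the adapt (and optional early-stopping) edges, and returns a test cost. Each element of itertools.product corresponds to one path of the form above; the grid and the toy cost function are made up for illustration.

.. code-block:: python

    import itertools


    def grid_search(param_grid, adapt_and_test):
        """Explore every path ("set param 0", ..., "set param N", adapt, test).

        `param_grid` maps each hyper-parameter name to its candidate values.
        Returns the best settings found and their test cost.
        """
        names = sorted(param_grid)
        best_settings, best_cost = None, float("inf")
        for values in itertools.product(*(param_grid[name] for name in names)):
            settings = dict(zip(names, values))  # the "set param i to ..." edges
            cost = adapt_and_test(settings)      # adapt, [early stop...], test
            if cost < best_cost:
                best_settings, best_cost = settings, cost
        return best_settings, best_cost


    # Made-up cost surface standing in for a real adapt/test run.
    def toy_adapt_and_test(settings):
        return abs(settings["learning_rate"] - 0.01) + 1.0 / settings["n_hidden"]


    grid = {"learning_rate": [0.001, 0.01, 0.1], "n_hidden": [10, 100]}
    settings, cost = grid_search(grid, toy_adapt_and_test)
    print(settings, cost)

The number of paths is the product of the grid sizes (six here), which is why grid search is usually combined with early stopping inside each path.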

Stagewise learning of DBNs combined with early stopping and grid search
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This would be a learner that is effective for experiment graphs that reflect the
greedy-stagewise optimization of DBNs.


Boosting
~~~~~~~~

Given an ExperimentGraph that permits re-weighting of examples, it is
straightforward to write a meta-ExperimentGraph around it that implements AdaBoost.
A meta-meta-ExperimentGraph around that one, which does early stopping, would complete
the picture and make a useful boosting implementation.

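A minimal sketch of the AdaBoost meta-learner, under assumptions the text does not fix: the re-weightable inner learner is a hypothetical 1-D decision stump (a threshold classifier trained on weighted examples), labels are -1 or +1, and the outer early-stopping level is left out. The re-weighting follows the standard discrete AdaBoost rule: alpha = 0.5 ln((1 - err) / err), then each example weight is multiplied by exp(-alpha y h(x)) and the weights are renormalized.

.. code-block:: python

    import math


    def train_stump(xs, ys, weights):
        """Weighted inner learner: the threshold classifier with least weighted error."""
        best = None
        for threshold in sorted(set(xs)):
            for sign in (1, -1):
                # Predict `sign` when x >= threshold, and -sign otherwise.
                err = sum(w for x, y, w in zip(xs, ys, weights)
                          if (sign if x >= threshold else -sign) != y)
                if best is None or err < best[0]:
                    best = (err, threshold, sign)
        return best


    def adaboost(xs, ys, n_rounds):
        """Meta-learner: repeatedly re-weight examples and re-train the stump."""
        n = len(xs)
        weights = [1.0 / n] * n
        ensemble = []  # list of (alpha, threshold, sign)
        for _ in range(n_rounds):
            err, threshold, sign = train_stump(xs, ys, weights)
            err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero / log(0)
            alpha = 0.5 * math.log((1 - err) / err)
            ensemble.append((alpha, threshold, sign))
            # Up-weight misclassified examples, down-weight correct ones.
            weights = [w * math.exp(-alpha * y * (sign if x >= threshold else -sign))
                       for x, y, w in zip(xs, ys, weights)]
            total = sum(weights)
            weights = [w / total for w in weights]
        return ensemble


    def predict(ensemble, x):
        score = sum(alpha * (sign if x >= threshold else -sign)
                    for alpha, threshold, sign in ensemble)
        return 1 if score >= 0 else -1


    xs = [1, 2, 3, 4, 5, 6]
    ys = [1, 1, -1, -1, 1, 1]  # no single threshold separates these labels
    for rounds in (1, 3):
        ensemble = adaboost(xs, ys, rounds)
        print(rounds, [predict(ensemble, x) for x in xs])
    # prints: 1 [1, 1, 1, 1, 1, 1]
    #         3 [1, 1, -1, -1, 1, 1]

On this labeling no single stump is correct everywhere, so one round predicts +1 for every example, while three rounds of re-weighting fit all six labels.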

Using External Hyper-Parameter Optimization Software
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TODO (use case): show how we could use the optimizer from
http://www.cs.ubc.ca/labs/beta/Projects/ParamILS/

1026
38f799f8b6cd
v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1002
diff
changeset
|
185 |
38f799f8b6cd
v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1002
diff
changeset
|
186 Implementation Details / API |
38f799f8b6cd
v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1002
diff
changeset
|
187 ---------------------------- |
38f799f8b6cd
v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1002
diff
changeset
|

Learner
~~~~~~~
An object that allows us to explore the graph discussed above. Specifically, it represents
an explored node in that graph.

.. code-block:: python

    class Learner(object):

        def active_instructions(self):
            """Return a list/set of Instruction instances (see below) that the Learner
            is prepared to handle.
            """

        def copy(self):
            """Return a shallow copy. Learners should be serializable."""

        def deepcopy(self):
            """Return a deep copy. Learners should be serializable."""


To make the implementation easier, I found it helpful to introduce a string-valued
``fsa_state`` member attribute and associate methods with these states. That made it
syntactically easy to build relatively complex finite-state transition graphs describing
which instructions were active at which times in the life-cycle of a learner.
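
Here is a minimal sketch of the ``fsa_state`` idea. It assumes each instruction method is tagged
with the states in which it is active. The ``active_in`` decorator, the state names and
``ToyLearner`` are hypothetical. To keep the example short, ``active_instructions()`` returns
method names rather than Instruction instances.

.. code-block:: python

    def active_in(*states):
        """Hypothetical decorator: mark a method as an instruction active in `states`."""
        def decorate(method):
            method.active_states = frozenset(states)
            return method
        return decorate


    class ToyLearner(object):
        """Hypothetical learner whose life-cycle is a small finite-state machine."""

        def __init__(self):
            self.fsa_state = 'untrained'   # string-valued state attribute
            self.n_epochs = 0

        def active_instructions(self):
            # An instruction is active iff the current state is one of its tagged states.
            cls = type(self)
            return [name for name in sorted(dir(cls))
                    if self.fsa_state in getattr(getattr(cls, name), 'active_states', ())]

        @active_in('untrained', 'training')
        def train_one_epoch(self):
            self.n_epochs += 1
            self.fsa_state = 'training'

        @active_in('training')
        def finish(self):
            self.fsa_state = 'done'

        @active_in('done')
        def predict(self, x):
            return x   # placeholder for a real model


    learner = ToyLearner()
    print(learner.active_instructions())   # ['train_one_epoch']
    learner.train_one_epoch()
    print(learner.active_instructions())   # ['finish', 'train_one_epoch']

With this layout, adding a new life-cycle stage only means tagging methods with a new state name.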


Instruction
~~~~~~~~~~~
An object that represents a potential edge in the graph discussed above. It is an
operation that a learner can perform.

.. code-block:: python

    class Instruction(object):

        arg_types = ()
        """A sequence of Type objects (see below) indicating what args execute requires."""

        def execute(self, learner, args, kwargs):
            """Perform some operation on the learner (follow an edge in the graph discussed
            above) and modify the learner in-place. Calling execute 'moves' the learner from
            one node in the graph along an edge. To keep the old learner as well, it must be
            copied prior to calling execute().
            """

        def expense(self, learner, args, kwargs, resource_type='CPUtime'):
            """Return an estimated cost of performing this instruction (calling execute), in
            time, space, number of computers, disk requirements, etc.
            """
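
The sketch below makes the ``execute`` / ``expense`` contract concrete. It assumes a
trivial learner whose whole state is a step counter. ``Counter`` and ``Increment`` are
hypothetical. The sketch also shows why a client that wants to keep the old node must
copy the learner before calling ``execute()``.

.. code-block:: python

    import copy


    class Counter(object):
        """Hypothetical learner: its whole state is an integer."""

        def __init__(self):
            self.steps = 0


    class Increment(object):
        """Hypothetical Instruction: advance the learner by kwargs['n'] steps."""

        arg_types = ()   # a real Instruction would list Type objects here

        def execute(self, learner, args, kwargs):
            # Follow an edge of the graph by modifying the learner in-place.
            learner.steps += kwargs.get('n', 1)

        def expense(self, learner, args, kwargs, resource_type='CPUtime'):
            # Crude estimate: cost proportional to the number of steps requested.
            if resource_type != 'CPUtime':
                raise NotImplementedError(resource_type)
            return float(kwargs.get('n', 1))


    node = Counter()
    edge = Increment()
    old_node = copy.deepcopy(node)              # keep the old node before moving
    cost = edge.expense(node, (), {'n': 3})     # 3.0
    edge.execute(node, (), {'n': 3})
    print(old_node.steps, node.steps)           # 0 3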

Type
~~~~
An object that describes a parameter domain for a call to Instruction.execute.
A Type need not specify exactly which arguments are legal, but it should
*include* all legal arguments and exclude as many illegal ones as possible.

.. code-block:: python

    class Type(object):

        def includes(self, value):
            """Return True if value is a legal argument."""


To make things a bit more practical, there are some Type subclasses, such as Int, Float, Str,
ImageDataset and SgdOptimizer, that include additional attributes (e.g. min, max, default) so
that automatic graph exploration algorithms can generate legal arguments with reasonable
efficiency.
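
For instance, a Float Type with min, max and default might look like the sketch below.
The ``sample`` method is a hypothetical addition. It shows how an automatic explorer could
use the extra attributes. Drawing uniformly in [min, max] is only one possible choice.

.. code-block:: python

    import random


    class Type(object):
        def includes(self, value):
            """Return True if value is a legal argument."""
            raise NotImplementedError


    class Float(Type):
        """Sketch of a Float Type carrying min, max and default attributes."""

        def __init__(self, min, max, default):
            if not min <= default <= max:
                raise ValueError('default must lie in [min, max]')
            self.min, self.max, self.default = min, max, default

        def includes(self, value):
            return isinstance(value, (int, float)) and self.min <= value <= self.max

        def sample(self, rng):
            # Hypothetical helper: a uniform draw in [min, max].
            return rng.uniform(self.min, self.max)


    log_lr = Float(min=-8, max=-1, default=-4)
    rng = random.Random(0)
    candidates = [log_lr.default] + [log_lr.sample(rng) for _ in range(4)]
    print(all(log_lr.includes(v) for v in candidates))   # True
    print(log_lr.includes(0.5))                          # False: outside [-8, -1]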



The proxy pattern is a powerful way to combine learners, especially when proxy Learner
instances also introduce proxy Instruction classes.

For example, it is straightforward to implement a hyper-learner as a Learner with
another learner (sub-learner) as a member attribute. The hyper-learner modifies
the active_instructions() return value of the sub-learner, typically to introduce
more powerful instructions and hide simpler ones.
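
Here is a sketch of such a hyper-learner. For brevity, it assumes that ``active_instructions()``
returns a dict mapping names to bound methods instead of Instruction objects.
``SGDLearner``, ``EarlyStoppingHyperLearner``, ``train_until_convergence`` and the toy
validation curve are all hypothetical.

.. code-block:: python

    class SGDLearner(object):
        """Hypothetical sub-learner exposing a low-level 'one_epoch' instruction."""

        def __init__(self):
            self.epoch = 0

        def one_epoch(self):
            self.epoch += 1

        def valid_error(self):
            # Toy validation curve: improves until epoch 5, then overfits.
            return abs(self.epoch - 5)

        def active_instructions(self):
            return {'one_epoch': self.one_epoch}


    class EarlyStoppingHyperLearner(object):
        """Proxy learner: hides 'one_epoch' and adds a more powerful instruction."""

        def __init__(self, sub_learner, patience=2):
            self.sub = sub_learner        # the sub-learner is a member attribute
            self.patience = patience

        def train_until_convergence(self):
            # Run epochs until `patience` epochs pass without improvement.
            best, bad = self.sub.valid_error(), 0
            while bad < self.patience:
                self.sub.one_epoch()
                err = self.sub.valid_error()
                if err < best:
                    best, bad = err, 0
                else:
                    bad += 1
            return best

        def active_instructions(self):
            instructions = dict(self.sub.active_instructions())
            del instructions['one_epoch']                  # hide the simpler instruction
            instructions['train_until_convergence'] = self.train_until_convergence
            return instructions


    hyper = EarlyStoppingHyperLearner(SGDLearner())
    print(sorted(hyper.active_instructions()))   # ['train_until_convergence']

This toy version does not roll the sub-learner back to its best epoch. A real one would
probably copy the sub-learner at each improvement.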

It is less straightforward, but still consistent with the design, to implement a Learner that
encompasses job management. Such a learner would retain the semantics of the sub-learner's
active instructions, but would replace the Instruction objects themselves with
Instructions that arrange for remote procedure calls (e.g. jobman, multiprocessing, bqtools,
etc.). Such a learner would replace synchronous instructions (return on completion) with
asynchronous ones (return after scheduling), and the active instruction set would also change
asynchronously, but neither of these things is inconsistent with the Learner API.
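
The asynchronous variant can be sketched with the standard library's
``concurrent.futures`` thread pool standing in for jobman, multiprocessing or bqtools.
``SlowLearner`` and ``AsyncProxy`` are hypothetical. Wrapped instructions return a
``Future`` right after scheduling. The proxy reports no active instructions while a
job is still running, so its active set changes asynchronously.

.. code-block:: python

    import concurrent.futures


    class SlowLearner(object):
        """Hypothetical synchronous sub-learner."""

        def __init__(self):
            self.epoch = 0

        def one_epoch(self):
            self.epoch += 1
            return self.epoch

        def active_instructions(self):
            return {'one_epoch': self.one_epoch}


    class AsyncProxy(object):
        """Keeps the sub-learner's instruction names, but each call only schedules work."""

        def __init__(self, sub_learner, executor):
            self.sub = sub_learner
            self.executor = executor
            self.pending = None          # Future of the job in flight, if any

        def _schedule(self, method):
            def schedule(*args, **kwargs):
                self.pending = self.executor.submit(method, *args, **kwargs)
                return self.pending      # return after scheduling, not on completion
            return schedule

        def active_instructions(self):
            if self.pending is not None and not self.pending.done():
                return {}                # busy: no edge may be crossed yet
            return {name: self._schedule(method)
                    for name, method in self.sub.active_instructions().items()}


    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        proxy = AsyncProxy(SlowLearner(), pool)
        future = proxy.active_instructions()['one_epoch']()
        print(future.result())                       # 1
        print(sorted(proxy.active_instructions()))   # ['one_epoch']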


TODO - Experiment API?
~~~~~~~~~~~~~~~~~~~~~~

I feel like something is missing from the API: an interface to the graph structure
discussed above. The nodes in this graph are natural places to store meta-information for
visualization, statistics gathering, etc. But none of the APIs above corresponds to the graph
itself. In other words, there is no API through which to attach information to nodes. It is
not good to say that the Learner instance *is* the node because (a) learner instances change
during graph exploration and (b) learner instances are big, and we don't want to have to keep a
whole saved model just to attach meta-info, e.g. a validation score. Choosing this API spills
over into other committees, so we should get their feedback about how to resolve
it. Maybe we need an 'Experiment' API to stand for this graph?
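
Here is one possible shape for such an 'Experiment' API. The sketch assumes that a
node is identified by the sequence of instructions (and arguments) applied from the
root, so meta-information can be attached without storing a learner at all. ``Node``,
``Experiment``, ``child`` and the ``'train'`` instruction are hypothetical.

.. code-block:: python

    class Node(object):
        """A lightweight node: its path from the root plus meta-info, no learner."""

        def __init__(self, path):
            self.path = path     # tuple of (instruction_name, args) pairs
            self.meta = {}       # e.g. {'valid_error': 0.12}


    class Experiment(object):
        """Hypothetical container for the explored graph."""

        def __init__(self):
            self.nodes = {(): Node(())}   # the root: no instruction applied yet

        def child(self, parent, instruction_name, args=()):
            # Record the edge parent -> child, reusing the node if already explored.
            path = parent.path + ((instruction_name, tuple(args)),)
            if path not in self.nodes:
                self.nodes[path] = Node(path)
            return self.nodes[path]

        def edges(self):
            # Each non-root path ends with the edge that led to it.
            return [(path[:-1], path[-1]) for path in self.nodes if path]


    exp = Experiment()
    root = exp.nodes[()]
    n1 = exp.child(root, 'set_log_lr', (-4,))
    n2 = exp.child(n1, 'train', (10,))
    n2.meta['valid_error'] = 0.12   # attach meta-info without a saved model
    print(len(exp.nodes), len(exp.edges()))   # 3 2

In this sketch, reconstructing the learner at a node would mean replaying its path,
which trades memory for recomputation.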


TODO: Validation & Monitoring Costs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Even if we do have the Experiment API as a structure on which to hang validation and
monitoring results, what should the mechanism for extracting those results be?
The Learner API is not right because extracting a monitoring cost doesn't change
the model, doesn't change the legal instructions/edges, etc. Maybe we should use
a mechanism similar to Instruction, called something like Measurement? Any node
(learner) can report the list of instructions (for moving) and the list of
measurements (and the cost of computing them too).
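
Here is a sketch of what a Measurement could look like. It assumes a Measurement mirrors Instruction but
exposes a read-only ``measure(learner)`` together with an ``expense`` estimate.
``Measurement``, ``ValidError`` and ``ThresholdModel`` are hypothetical names.

.. code-block:: python

    class Measurement(object):
        """Like Instruction, but must leave the learner unchanged (no edge is followed)."""

        def measure(self, learner):
            raise NotImplementedError

        def expense(self, learner, resource_type='CPUtime'):
            raise NotImplementedError


    class ValidError(Measurement):
        """Hypothetical: fraction of misclassified validation examples."""

        def __init__(self, inputs, targets):
            self.inputs, self.targets = inputs, targets

        def measure(self, learner):
            wrong = sum(learner.predict(x) != y for x, y in zip(self.inputs, self.targets))
            return wrong / float(len(self.targets))

        def expense(self, learner, resource_type='CPUtime'):
            return len(self.targets)    # crude: one unit of cost per example


    class ThresholdModel(object):
        """Hypothetical learner: predicts 1 for positive inputs, 0 otherwise."""

        def predict(self, x):
            return int(x > 0)


    model = ThresholdModel()
    valid_error = ValidError(inputs=[-1, 2, 3, -4], targets=[0, 1, 0, 0])
    print(valid_error.measure(model), valid_error.expense(model))   # 0.25 4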


TODO - Parameter Distributions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

YB asks: it seems to me that what we really need from "Type" is not just
testing that a value is legal, but more practically a function that specifies the
prior distribution for the hyper-parameter, i.e., how to sample from it,
and possibly some representation of it that could be used to infer
a posterior (such as an unnormalized log-density or log-probability).
Having the min, max and default limits us to the uniform distribution,
which may not always be appropriate. For example, sometimes we'd like a
Gaussian (-infty to infty), Exponential (0 to infty) or Poisson (non-negative integers).
For that reason, I think that "Type" is not a very good name.
How about "Prior" or "Density" or something like that?
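
Here is a sketch of YB's suggestion. It assumes a ``Prior`` exposes ``sample(rng)`` and an
unnormalized ``log_density(value)``. The class names and parameters are hypothetical.
The sketch keeps ``includes``, defined as "non-zero prior density", so that insane values can
still be rejected.

.. code-block:: python

    import math
    import random


    class Prior(object):
        def sample(self, rng):
            raise NotImplementedError

        def log_density(self, value):
            """Unnormalized log-density; -inf outside the support."""
            raise NotImplementedError

        def includes(self, value):
            return self.log_density(value) > float('-inf')


    class Exponential(Prior):
        """Prior on [0, infty), e.g. for a weight-decay coefficient."""

        def __init__(self, rate):
            self.rate = rate

        def sample(self, rng):
            return rng.expovariate(self.rate)

        def log_density(self, value):
            if value < 0:
                return float('-inf')
            return math.log(self.rate) - self.rate * value


    class Gaussian(Prior):
        """Prior on (-infty, infty), e.g. for a log learning rate."""

        def __init__(self, mu, sigma):
            self.mu, self.sigma = mu, sigma

        def sample(self, rng):
            return rng.gauss(self.mu, self.sigma)

        def log_density(self, value):
            # Normalization constant dropped: enough when an unnormalized posterior suffices.
            return -0.5 * ((value - self.mu) / self.sigma) ** 2


    rng = random.Random(0)
    weight_decay = Exponential(rate=100.0)
    draws = [weight_decay.sample(rng) for _ in range(5)]
    print(all(weight_decay.includes(d) for d in draws))   # True
    print(weight_decay.includes(-1.0))                     # False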

JB replies: I agree that being able to choose (and update) distributions over
these values is important. I don't think the Type structure is the right place
to handle it, though. The challenge is to allow those distributions to change
for a variety of reasons, e.g. the sampling distribution on the capacity
variables is affected by the size of the dataset; it is also affected by
previous experience in general, as well as by experiments on that particular
dataset. I'm not sure that the 'Type' structure is right to deal with this.
Also, even with a strategy for handling these distributions, I believe a simple
mechanism for rejecting insane values might be useful.

So how should we handle it? Hmm...


Comments
~~~~~~~~

OD asks: (I hope it's ok to leave comments even though I'm not on the committee... I'm
interested to see how the learner interface is shaping up, so I'll be keeping
an eye on this file.)
I'm wondering what the benefit of such an API is compared to simply defining a
new method for each instruction. It seems to me that typically, the 'execute'
method would end up being something like

.. code-block:: python

    def execute(self, instruction, args, kwargs):
        if instruction == 'do_x':
            self.do_x(*args, **kwargs)
        elif instruction == 'do_y':
            self.do_y(*args, **kwargs)
        # ... and so on, one branch per instruction

So why not directly call do_x / do_y instead?


JB replies: I agree with you, and in the implementation of a Learner I suggest
using Python decorators to get the best of both worlds:

.. code-block:: python

    import numpy

    # Learner, Instruction and Float are the proposed API classes described above.
    class NNet(Learner):

        # ... other methods ...

        @Instruction.new(arg_types=(Float(min=-8, max=-1, default=-4),))
        def set_log_lr(self, log_lr):
            self.lr.value = numpy.exp(log_lr)

        # ... other methods ...

The Learner base class can implement an instruction_set() that walks through the
methods of 'self' and picks out the ones that have corresponding instructions.
But anyone can call the method normally. The NNet class can also have methods
that are not instructions.
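
Here is one way ``Instruction.new`` and ``instruction_set()`` could be implemented, as a sketch.
The decorator only attaches an Instruction object to the function, so the decorated
method stays an ordinary method. ``instruction_set()`` then collects tagged methods by walking
the class. ``SharedValue``, ``not_an_instruction`` and the minimal ``Float`` are hypothetical
stand-ins. ``math.exp`` replaces ``numpy.exp`` to keep the example dependency-free.

.. code-block:: python

    import math


    class Float(object):
        """Minimal stand-in for the Float Type described earlier."""

        def __init__(self, min, max, default):
            self.min, self.max, self.default = min, max, default

        def includes(self, value):
            return self.min <= value <= self.max


    class Instruction(object):
        def __init__(self, method, arg_types):
            self.method = method
            self.arg_types = arg_types

        def execute(self, learner, args, kwargs):
            # Reject arguments that the declared Types do not include.
            for arg_type, arg in zip(self.arg_types, args):
                if not arg_type.includes(arg):
                    raise ValueError('illegal argument: %r' % (arg,))
            return self.method(learner, *args, **kwargs)

        @classmethod
        def new(cls, arg_types=()):
            def decorate(method):
                method.instruction = cls(method, arg_types)   # tag; method stays callable
                return method
            return decorate


    class Learner(object):
        def instruction_set(self):
            # Walk the methods of type(self) and keep those tagged by Instruction.new.
            cls = type(self)
            found = {}
            for name in dir(cls):
                inst = getattr(getattr(cls, name), 'instruction', None)
                if isinstance(inst, Instruction):
                    found[name] = inst
            return found


    class SharedValue(object):
        """Hypothetical stand-in for a shared variable with a .value attribute."""
        value = None


    class NNet(Learner):
        def __init__(self):
            self.lr = SharedValue()

        @Instruction.new(arg_types=(Float(min=-8, max=-1, default=-4),))
        def set_log_lr(self, log_lr):
            self.lr.value = math.exp(log_lr)

        def not_an_instruction(self):
            return 'ordinary method'


    net = NNet()
    print(sorted(net.instruction_set()))                      # ['set_log_lr']
    net.instruction_set()['set_log_lr'].execute(net, (-4,), {})
    net.set_log_lr(-2)                                        # still callable directly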

OD replies: Ok, thanks. I'm still unsure what the end goal is, but I'll keep
watching while you guys work on it, and hopefully it'll become clearer for me ;)

RP asks: James, correct me if I'm wrong, but I think each instruction has an execute
command. The job of the learner is to traverse the graph and, for each edge
that it decides to cross, to call the execute of that edge. Maybe James has
something else in mind, but this was my understanding.

JB replies: close, but let me make a small clarification. The job of a
Learner is simply to implement the API of a Learner: to list what edges are
available and to be able to cross them if asked. The code *using* the Learner
(client) decides which edges to cross. The client may also be a Learner, but
maybe not.
376 |
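A minimal Python sketch of the split JB describes. The method names `available_edges` and `cross` are hypothetical and do not come from pylearn. The Learner only lists its edges and crosses one when asked, while the separate `client` function decides the order.

```python
from typing import Callable, Dict, List


class Learner:
    """Hypothetical Learner: it lists the edges it offers and crosses them on request."""

    def __init__(self) -> None:
        self.n_updates = 0
        self.log: List[str] = []
        # Edge name -> bound method that "executes" the edge.
        self._edges: Dict[str, Callable[[], None]] = {
            "update": self._update,
            "validate": self._validate,
        }

    def available_edges(self) -> List[str]:
        """Names of the edges that can be crossed from the current state."""
        return sorted(self._edges)

    def cross(self, edge: str) -> None:
        """Cross one edge, but only because a client asked for it."""
        if edge not in self._edges:
            raise KeyError(f"unknown edge: {edge!r}")
        self._edges[edge]()

    def _update(self) -> None:
        self.n_updates += 1
        self.log.append("update")

    def _validate(self) -> None:
        self.log.append(f"validate@{self.n_updates}")


def client(learner: Learner, n_steps: int, validate_every: int) -> None:
    """The client, not the Learner, decides which edge to cross next."""
    for step in range(1, n_steps + 1):
        learner.cross("update")
        if step % validate_every == 0 and "validate" in learner.available_edges():
            learner.cross("validate")


learner = Learner()
client(learner, n_steps=4, validate_every=2)
print(learner.log)  # ['update', 'update', 'validate@2', 'update', 'update', 'validate@4']
```

Replacing `client` with a different policy, for example one written as another Learner, would require no change to `Learner` itself.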

Just another view/spin on the same idea (Razvan)
================================================

My idea is probably just a spin-off of what James wrote. It is an extension
of what I sent to the mailing list some time ago.

Big Picture
-----------

What do we care about?
~~~~~~~~~~~~~~~~~~~~~~

These are the main points I have in mind:

* Re-usability
* Extensibility
* Simplicity, or easily readable code (connected to re-usability)
* Modularity (connected to extensibility)
* Fast-to-write code (this sort of follows from simplicity)
* Efficient code


Composition
~~~~~~~~~~~

To me this reads as code generated by composing pieces. Imagine this:
you start off with something primitive that I will call a "variable" (probably
a very unsuitable name). You then compose those initial "variables", or
transform them, through several "functions". Each such "function" hides some
logic that you, as the user, don't care about. You can have low-level (micro)
"functions" and high-level (macro) "functions", where a high-level function is
just a certain compositional pattern of low-level "functions". There are
several classes of "functions" and "variables", and members of the same class
are interchangeable. This is how modularity is obtained: by switching between
functions of a given class.

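Below is a rough Python sketch of this composition idea. Plain callables stand in for "functions" and lists of floats stand in for "variables"; both are illustrative choices, not part of the proposal. A macro function is simply a composition of micro functions, and two functions with the same signature can be swapped for each other.

```python
from functools import reduce
from typing import Callable, List

Variable = List[float]
Transform = Callable[[Variable], Variable]


def compose(*transforms: Transform) -> Transform:
    """Build a high-level (macro) transform as a chain of lower-level ones."""
    def macro(x: Variable) -> Variable:
        return reduce(lambda acc, t: t(acc), transforms, x)
    return macro


# Low-level (micro) "functions": each maps a "variable" to a new one.
def center(x: Variable) -> Variable:
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def scale_to_unit_max(x: Variable) -> Variable:
    peak = max(abs(v) for v in x) or 1.0  # avoid dividing by zero
    return [v / peak for v in x]


def rectify(x: Variable) -> Variable:
    return [max(0.0, v) for v in x]


def absolute(x: Variable) -> Variable:
    return [abs(v) for v in x]


# A macro "function" is just a composition pattern of micro ones ...
normalize = compose(center, scale_to_unit_max)
# ... and can itself be used as a piece of a bigger composition.
pipeline_a = compose(normalize, rectify)
# `rectify` and `absolute` belong to the same class (same signature), so either
# can be plugged in without touching the rest of the pipeline.
pipeline_b = compose(normalize, absolute)

print(normalize([1.0, 2.0, 3.0]))   # [-1.0, 0.0, 1.0]
print(pipeline_a([1.0, 2.0, 3.0]))  # [0.0, 0.0, 1.0]
print(pipeline_b([1.0, 2.0, 3.0]))  # [1.0, 0.0, 1.0]
```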
Now, when you want to research something, you first select the step you want
to look into. If you are lucky, you can rewrite this step as a certain
decomposition into low-level transformations (there can be multiple such
decompositions). If not, you have to implement such a decomposition according
to your needs. Then pick the low-level transformations you want to change and
write new versions that implement your logic.

I think the code will be easy to read, because it just applies a fixed set of
transformations, one after the other. Whoever writes the code can decide how
explicit to be by switching between high-level and low-level functions.

I think code written this way is re-usable, because you can take this chain of
transformations and replace the one you care about without looking into the
rest.

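Here is a small illustration of the re-use argument; all names are hypothetical. A training step is written as a named chain of transformations over a dictionary of variables. `replace_step` swaps only the update rule and leaves the gradient step untouched.

```python
from typing import Callable, Dict, List, Tuple

Variables = Dict[str, float]
Transform = Callable[[Variables], Variables]
Step = Tuple[str, Transform]


def run_chain(chain: List[Step], variables: Variables) -> Variables:
    """Apply each named transformation in order; each returns new variables."""
    for _name, transform in chain:
        variables = transform(variables)
    return variables


def replace_step(chain: List[Step], name: str, new_transform: Transform) -> List[Step]:
    """Return a copy of `chain` in which only the step called `name` is swapped."""
    if name not in {n for n, _ in chain}:
        raise KeyError(name)
    return [(n, new_transform if n == name else t) for n, t in chain]


def grad_of_square(v: Variables) -> Variables:
    # Gradient of the toy cost w**2 with respect to w.
    return {**v, "grad": 2.0 * v["w"]}


def sgd_update(v: Variables) -> Variables:
    return {**v, "w": v["w"] - v["lr"] * v["grad"]}


def clipped_update(v: Variables, clip: float = 0.25) -> Variables:
    # Same update, but the step size is clipped to [-clip, clip].
    step = max(-clip, min(clip, v["lr"] * v["grad"]))
    return {**v, "w": v["w"] - step}


chain = [("gradient", grad_of_square), ("update", sgd_update)]
research_chain = replace_step(chain, "update", clipped_update)

start = {"w": 1.0, "lr": 0.25}
print(run_chain(chain, start)["w"])           # 0.5
print(run_chain(research_chain, start)["w"])  # 0.75
```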
This gives the code a fractal property: zooming in, you always find just a set
of functions applied to a set of variables. In the beginning those might not
exist yet, and you would have to create new "low-level" decompositions, maybe
even new "variables" that carry data between those decompositions.

The point about variables is that I don't want these "functions" to have a
state. All the information is passed along through the variables. This way
understanding the graph is easy, and debugging it is also easier (than with
all these hidden states).

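A small sketch of the stateless style. The example (momentum SGD on the toy cost w**2) is an assumption and does not appear in the original text. A momentum optimizer object would normally hide its velocity as an attribute. Here the velocity is part of the variable that flows through the function, so every intermediate state can be kept and replayed.

```python
from typing import NamedTuple


class MomentumVars(NamedTuple):
    """All the state lives in this "variable"; nothing is hidden in the function."""
    w: float
    velocity: float


def momentum_step(
    v: MomentumVars, grad: float, lr: float = 0.5, mu: float = 0.5
) -> MomentumVars:
    """Stateless transform: it returns new variables and never mutates its inputs."""
    velocity = mu * v.velocity - lr * grad
    return MomentumVars(w=v.w + velocity, velocity=velocity)


# Keep every intermediate variable. With no hidden state, any of them can be
# inspected, or used to replay the computation from that point on.
history = [MomentumVars(w=1.0, velocity=0.0)]
for _ in range(3):
    current = history[-1]
    history.append(momentum_step(current, grad=2.0 * current.w))  # gradient of w**2

print(history[-1])  # MomentumVars(w=-0.25, velocity=0.25)
```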
Note that while doing so we might (and I strongly think we should) create a
(symbolic) DAG of operations (this is where it becomes what James was saying).
In such a DAG the "variables" will be the nodes and the functions will be the
edges. I think having a DAG is useful in many ways (these are all things one
might think about implementing in the far future; I'm not proposing to
implement them unless we want to use them, like the reconstruction):

* there is the possibility of writing optimizations (Theano style)
* there is the possibility of adding global-view utility functions (like a
  reconstruction function for SdA, extremely low-level here), or global-view
  diagnostic tools
* the possibility of creating a GUI (where you just create the graph by
  picking transforms and variables from a list), or of working interactively
  and then generating code that reproduces the graph
* you can view the graph at different granularity levels to understand
  things (global diagnostics)

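A minimal sketch of recording such a DAG. It assumes string node names and a hypothetical `Graph` class that is far simpler than Theano's own graph structures and has no optimizations. Applying a function records an edge instead of computing anything, and `describe` stands in for a global-view utility.

```python
from typing import Callable, Dict, List, Set, Tuple


class Graph:
    """Hypothetical recorder of a symbolic DAG: "variables" are nodes and each
    applied "function" is an edge from its input nodes to its output node."""

    def __init__(self) -> None:
        self._nodes: Set[str] = set()
        # Each edge: (function name, function, input node names, output node name).
        self.edges: List[Tuple[str, Callable[..., float], Tuple[str, ...], str]] = []

    def variable(self, name: str) -> str:
        """Declare an input node."""
        self._nodes.add(name)
        return name

    def apply(self, fn: Callable[..., float], *args: str, out: str) -> str:
        """Record `out = fn(*args)` symbolically; nothing is computed yet."""
        missing = [a for a in args if a not in self._nodes]
        if missing:
            raise ValueError(f"unknown input nodes: {missing}")
        self.edges.append((fn.__name__, fn, args, out))
        self._nodes.add(out)
        return out

    def describe(self) -> List[str]:
        """One example of a global-view utility: a readable listing of the graph."""
        return [f"{out} = {name}({', '.join(args)})" for name, _, args, out in self.edges]

    def evaluate(self, inputs: Dict[str, float]) -> Dict[str, float]:
        """Edges only consume existing nodes, so creation order is topological."""
        env = dict(inputs)
        for _, fn, args, out in self.edges:
            env[out] = fn(*(env[a] for a in args))
        return env


def mul(a: float, b: float) -> float:
    return a * b


def add(a: float, b: float) -> float:
    return a + b


g = Graph()
x, w, b = g.variable("x"), g.variable("w"), g.variable("b")
y = g.apply(add, g.apply(mul, x, w, out="h"), b, out="y")
print(g.describe())                                   # ['h = mul(x, w)', 'y = add(h, b)']
print(g.evaluate({"x": 2.0, "w": 3.0, "b": 1.0})[y])  # 7.0
```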
We should have a taxonomy of possible classes of functions and possible
classes of variables, but those should not be exclusive. We can work at a high
level for now, and decompose those high-level functions into lower-level ones
when we need to. We can introduce new classes of functions, or intermediate
variables between those low-level functions.


Similarities with James' idea
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As I said before, I think this is just another view of what James proposed.
In his case the learner is the module that traverses the graph of these
operations, which makes sense here as well.

The 'execute' command in his API corresponds, in my case, to applying a
function to some variables.

I think in both cases the learner keeps track of the graph that is formed.

His view is a bit more general. I see the graph as fully created by the user,
and the learner just has to go from the start to the end. In his case the
traversal is conditioned on some policies. I think these ideas can be mixed /
united. To get this functionality in my approach, I would use something
similar to the lazy linker for Theano.

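A rough sketch of what "something similar to the lazy linker" could mean. It follows the general idea of lazy evaluation but neither uses nor reproduces Theano's actual linker, and every name is hypothetical. Nodes run only when something pulls on them, so a conditional node evaluates just the branch its condition selects. This is one way to condition the traversal on a policy while the user still builds the graph.

```python
from typing import Any, Callable, Dict, List


class LazyGraph:
    """Hypothetical lazy evaluator: a node is computed only when something pulls
    on it, and its value is cached, so branches nobody asks for never run."""

    def __init__(self) -> None:
        self._thunks: Dict[str, Callable[["LazyGraph"], Any]] = {}
        self._cache: Dict[str, Any] = {}
        self.evaluated: List[str] = []  # nodes in the order they were computed

    def define(self, name: str, thunk: Callable[["LazyGraph"], Any]) -> None:
        self._thunks[name] = thunk

    def get(self, name: str) -> Any:
        if name not in self._cache:
            self._cache[name] = self._thunks[name](self)
            self.evaluated.append(name)
        return self._cache[name]


def ifelse(cond: str, then_node: str, else_node: str) -> Callable[[LazyGraph], Any]:
    """Lazy conditional: only the branch selected by `cond` is evaluated."""
    return lambda graph: graph.get(then_node) if graph.get(cond) else graph.get(else_node)


g = LazyGraph()
g.define("valid_error", lambda graph: 0.2)
g.define("converged", lambda graph: graph.get("valid_error") < 0.25)
g.define("save_model", lambda graph: "saved")
g.define("train_more", lambda graph: "trained")
g.define("next_step", ifelse("converged", "save_model", "train_more"))

print(g.get("next_step"))  # saved
print(g.evaluated)         # ['valid_error', 'converged', 'save_model', 'next_step']
```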
JB asks: There is definitely a strong theme of graphs in both suggestions,
and furthermore of graphs with heavy-duty nodes and light-weight edges. But I
don't necessarily think that we're proposing the same thing. One difference is
that the graph I talked about would be infinite in most cases of interest, so
it's not going to be representable by Theano's data structures (even with lazy
if). Another difference is that the graph I had in mind doesn't feel fractal:
it would be very common for a graph edge to be atomic. A proxy pattern, such as
in a hyper-learner, would create a notion of being able to zoom in, but other
than that, I'm not sure what you mean.

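One way to make the "infinite graph" point concrete is sketched below. It assumes, although the thread does not say so, that such a graph is represented implicitly by a successor function rather than stored. Epoch counts serve as hypothetical states, and `train`/`stop` as hypothetical edges.

```python
from itertools import islice
from typing import Callable, Iterator, List, Optional, Tuple

State = Optional[int]  # an epoch count, or None for the terminal "done" state


def successors(state: State) -> List[Tuple[str, State]]:
    """Generate the edges leaving `state` on demand; the graph is never stored.

    Every epoch count has a "train" edge to the next epoch count, so the graph
    is infinite and could not be built up front as a finite data structure.
    """
    if state is None:
        return []
    edges: List[Tuple[str, State]] = [("train", state + 1)]
    if state > 0:
        edges.append(("stop", None))
    return edges


def walk(start: State, choose: Callable[[State, List[str]], str]) -> Iterator[State]:
    """Yield visited states, crossing whichever edge the client's `choose` picks."""
    state = start
    while True:
        yield state
        edges = dict(successors(state))
        if not edges:
            return
        state = edges[choose(state, sorted(edges))]


def always_train(state: State, edge_names: List[str]) -> str:
    return "train"


def stop_after_three(state: State, edge_names: List[str]) -> str:
    return "stop" if state is not None and state >= 3 else "train"


print(list(islice(walk(0, always_train), 5)))  # [0, 1, 2, 3, 4]  (walk alone never ends)
print(list(walk(0, stop_after_three)))         # [0, 1, 2, 3, None]
```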
RP replies: I've been thinking about my idea a bit and yes, it might be
quite different from what James has in mind, though there are plenty of common
elements. I might have exaggerated a bit with the zooming in, so in some cases
you will end up with atomic edges, though my hope is that most edges will not
be atomic.

I think I should go into more detail when answering this question, because
I feel I have not explained things clearly enough. Note that in many places
I replaced the word "function" with "transform".

Think of the learner as an object that traverses a DAG of steps created by the
user. On this DAG the learner can potentially do a lot of cool stuff, but we
won't care about that for now. The DAG can be infinite in principle, and what
the learner does is just follow the path described by the user (and here
"described" means not through heuristics, as in James' case, but by giving the
list of edges to follow). A potentially cool thing the learner could do is to
regard the path given by the user as a suggestion (or some form of heuristic)
and try to improve it. This would be much closer to what James has in mind,
and I definitely think it is a cool way to go about it.

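A minimal sketch of both modes: following the user's path exactly, and treating it as a suggestion to improve. The edge names are hypothetical, and the notion of "improvement" is deliberately simple: drop an immediate repeat of a step declared idempotent. A real learner would need a much richer model of which rewrites are safe.

```python
from typing import Callable, Dict, List, Set

State = Dict[str, int]
Edge = Callable[[State], State]


def follow_path(path: List[str], edges: Dict[str, Edge], state: State) -> State:
    """Cross exactly the edges the user listed, in the order given."""
    for name in path:
        state = edges[name](state)
    return state


def drop_immediate_repeats(path: List[str], idempotent: Set[str]) -> List[str]:
    """Treat the path as a suggestion: drop a step that immediately repeats
    itself when that step is declared idempotent (crossing it twice in a row
    gives the same state as crossing it once)."""
    improved: List[str] = []
    for name in path:
        if improved and improved[-1] == name and name in idempotent:
            continue
        improved.append(name)
    return improved


edges: Dict[str, Edge] = {
    "update": lambda s: {**s, "updates": s["updates"] + 1},
    "validate": lambda s: {**s, "validated_at": s["updates"]},
}
user_path = ["update", "validate", "validate", "update", "validate"]
better_path = drop_immediate_repeats(user_path, idempotent={"validate"})

start = {"updates": 0, "validated_at": -1}
print(better_path)  # ['update', 'validate', 'update', 'validate']
print(follow_path(user_path, edges, start) == follow_path(better_path, edges, start))  # True
```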
19033ef1636d
some more details on my approach
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1055
diff
changeset
|
Now, this path in the graph is given by the user by composing subgraphs or
adding nodes to the graph, or (to put it more simply) by applying functions to
variables. Any such function introduces an edge (or a subgraph) that connects
the vertices corresponding to the input variables to the vertices corresponding
to the output variables. The variables store the state of the learner. These
functions are stateless; I think that giving them state would make this
approach really ugly (I might be wrong).
The variables would contain the information required by the functions, like
the number of layers, how many cores to run on, cluster configurations, and so on.

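A minimal Python sketch of the idea that the variables carry the state while the functions stay stateless, assuming a hypothetical `Variable` class and `apply_transform` helper (neither is an existing pylearn or Theano API). Each application records the edge on the output vertex, so the path that produced a result can be read back.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class Variable:
    """A vertex of the graph: it holds state/configuration, never code."""
    name: str
    value: Dict[str, Any] = field(default_factory=dict)
    # The edge that produced this vertex: (transform name, input vertices).
    origin: Optional[Tuple[str, List["Variable"]]] = None


def apply_transform(name: str, fn: Callable[..., Dict[str, Any]],
                    *inputs: Variable) -> Variable:
    """Apply a stateless function to the values of the input vertices and
    return a new vertex; the inputs themselves are left untouched."""
    new_value = fn(*(v.value for v in inputs))
    return Variable(f"{name}_out", new_value, origin=(name, list(inputs)))


def edges_to(var: Variable) -> List[str]:
    """Names of the transforms on the path leading to `var`, oldest first."""
    if var.origin is None:
        return []
    name, inputs = var.origin
    names: List[str] = []
    for v in inputs:
        names.extend(edges_to(v))
    return names + [name]


config = Variable("config", {"n_layers": 3, "n_cores": 4})
net = apply_transform("build_net",
                      lambda c: {"layers": [f"layer{i}" for i in range(c["n_layers"])]},
                      config)
trainer = apply_transform("add_sgd", lambda n: {**n, "optimizer": "sgd"}, net)
print(edges_to(trainer))  # ['build_net', 'add_sgd']
```

Because `apply_transform` never mutates its inputs, `config` keeps its original values after both transforms have run.
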
Now about the zooming part that James asked about. I might have exaggerated a
bit: it is not that you can zoom in on any part infinitely. You will end up with
things that are atomic. The idea is that any such "transformation" or edge has
the potential to be split up into several "transformations". This offers (in my
view) a way of working within the time constraints of our project. We can start
by defining a coarse division into segments. For now we can have a structure
transform that turns a list of parameters into a deep network of some type, then
a learner transform that adds SGD + pre-training on top of the network, then an
early stopper on top of that, and then a run_on_cluster on top of that. We would
probably want something more finely grained even from the start; this is just to
prove my point. When any of us starts experimenting with a certain sub-step of
this process (like the structure), we will split that transform into several
(like ones that create a layer and so on) that make sense for that case, and
then start working on the low-level transform that we care about (like the
layer), introducing new versions of it. I think we cannot find a universal split
that will cover all of our cases, so I think we should allow different such
splits. The person doing the research should look at what low-level transforms
are available and use those if they make sense; if not, he would have to create
a different split. Creating a different split might involve a lot of work and
taking care of several issues, so it should be done with care.

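The coarse-to-fine splitting can be sketched with plain function composition. In this hypothetical Python sketch, `compose` chains transforms into one coarse transform; "zooming in" on the structure step means swapping in a finer composition while the SGD and early-stopping steps are reused unchanged. The transform names and the dict-based state are assumptions made for illustration.

```python
from typing import Callable, Dict

Transform = Callable[[Dict], Dict]


def compose(*steps: Transform) -> Transform:
    """Chain transforms left to right into a single coarser transform."""
    def run(state: Dict) -> Dict:
        for step in steps:
            state = step(state)
        return state
    return run


# Hypothetical fine-grained transforms; each returns a new dict.
def make_layers(state: Dict) -> Dict:
    sizes = state["sizes"]
    return {**state, "layers": list(zip(sizes[:-1], sizes[1:]))}


def tag_layers(state: Dict) -> Dict:
    return {**state, "tags": [f"layer{i}" for i in range(len(state["layers"]))]}


def add_sgd(state: Dict) -> Dict:
    return {**state, "optimizer": "sgd"}


def add_early_stopping(state: Dict) -> Dict:
    return {**state, "early_stopping": True}


# Coarse view: one "structure" transform followed by the training transforms.
structure = compose(make_layers)
pipeline = compose(structure, add_sgd, add_early_stopping)

# Zooming in on the structure only: a finer split replaces it, while the
# SGD and early-stopping transforms are reused as they are.
finer_structure = compose(make_layers, tag_layers)
new_pipeline = compose(finer_structure, add_sgd, add_early_stopping)

result = new_pipeline({"sizes": [524, 500, 500, 27]})
print(result["layers"])  # [(524, 500), (500, 500), (500, 27)]
print(result["tags"])    # ['layer0', 'layer1', 'layer2']
```
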
I'll give an example of where I started thinking this way. Let's say we want
to do the SdA with auxiliary inputs that encourage separation of the features
in the hidden layer, as Yoshua was suggesting (I had an attempt at it some time
ago for speech, but I never ended up finishing that project).

You start with something like::

    learner = Learner()
    # This will create the learner that will traverse our graph. We might
    # want it to be a function ``execute``; I just randomly picked this option.
    # I have no preference on this detail for now; this is mostly work in progress.

    data = someSpeechData(path = 'some path')
    # This is a transform that generates, from the string representing the
    # path, a dataset variable (which will contain all the information you need
    # to access the data). This will probably be the object the datasets
    # committee will provide. Note that you might need to provide more
    # information than the path, but you can easily see how to do that. All of
    # these transforms start from simple variables like the path, batch size
    # and so on, and return a complex, heavy-duty variable (node).

    model = earlyStopping(pretrain(SdA(layers = [524, 500, 500, 27], noise = [0.1, 0.1]), data, epochs = 10), data)
    # This is a composition of three transforms. The SdA transform starts from
    # the info about the layers and the corruption/noise for each layer and
    # constructs an SdA. This is a high-level transform, so it will take care
    # of defining all the details, like pre-training, defining the cost and so
    # on. Note that it may require some more parameters; you can assume that
    # for anything else there is a default value that the SdA will use.
    # earlyStopping is yet another transform that takes a model (that we know
    # how to train) and some data, and does early stopping on it. For brevity
    # I did not provide all the information required, like patience and so
    # on. The SdA only knows how to do a step of training. The same holds for
    # pretrain; it will loop over the layers of the SdA and train each one.

    steps = cluster(model, getPropertiesAndRanges(model), n_jobs = 20, cluster_info = getClusterInfo())
    # This will launch the wanted jobs. getPropertiesAndRanges will get from a
    # model all the knobs that need to be turned, and their ranges, and will
    # uniformly sample from them in each job. getClusterInfo will return a
    # variable containing information about the cluster (I added this for
    # simplicity; it should probably be replaced with something like username,
    # password, clusterpath or whatever).

    learner.execute(steps)
    # As an option, each of these output variables could contain the entire
    # graph up to that point. We could also do this in a different way; this
    # is ad hoc at the moment.

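The comment on `getPropertiesAndRanges` describes uniformly sampling every knob for each job. A minimal Python sketch of that sampling step, assuming the knobs come as a hypothetical name -> (low, high) mapping of continuous ranges (integer or categorical knobs would need a different sampler):

```python
import random
from typing import Dict, List, Tuple

# Hypothetical knob table, standing in for what getPropertiesAndRanges(model)
# might return: knob name -> (low, high) range of a continuous value.
Ranges = Dict[str, Tuple[float, float]]


def sample_configs(ranges: Ranges, n_jobs: int, seed: int = 0) -> List[Dict[str, float]]:
    """Draw one configuration per job, sampling each knob uniformly in its range."""
    rng = random.Random(seed)  # seeded so the same set of jobs can be reproduced
    return [
        {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        for _ in range(n_jobs)
    ]


ranges = {"learning_rate": (0.001, 0.1), "noise": (0.0, 0.5)}
jobs = sample_configs(ranges, n_jobs=20)
for job in jobs[:2]:
    print(job)
```

Each element of `jobs` is one independent configuration that could be handed to one cluster job.
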
Now, this is a coarse vanilla SdA, which is not what we wanted: we do not have
a way of incorporating our auxiliary information in it. So what we have to do
is split or change the SdA transform. We would rewrite it as::

    arch = SdA(layers = [524, 500, 500, 27], noise = [0.1, 0.1])
    model = earlyStopping(pretrain(arch, data, epochs = 10), data)
    ...

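The `earlyStopping` transform is only described informally above (patience and so on are left out). One common variant is patience-based early stopping; the sketch below assumes the model is exposed through two hypothetical callables, one running a training epoch and one returning the validation error, and is not meant as the committee's design.

```python
from typing import Callable, Tuple


def early_stopping(train_epoch: Callable[[], None],
                   validation_error: Callable[[], float],
                   patience: int,
                   max_epochs: int) -> Tuple[int, float]:
    """Train one epoch at a time and stop once the validation error has not
    improved for `patience` consecutive epochs.

    Returns (best_epoch, best_error); a real version would also keep a copy
    of the parameters from the best epoch.
    """
    best_error = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_epoch()
        error = validation_error()
        if error < best_error:
            best_error, best_epoch = error, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, best_error


# Fake validation curve: improves for three epochs, then gets worse.
errors = iter([0.5, 0.4, 0.35, 0.36, 0.37, 0.38, 0.30])
calls = []
result = early_stopping(lambda: calls.append(1), lambda: next(errors),
                        patience=3, max_epochs=7)
print(result)  # (2, 0.35)
```

With `patience=3` the run stops after epoch 5, so the lower error at epoch 6 is never seen; that is the usual trade-off of a finite patience.
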
And then rewrite things like::

    arch = SGD(cross_entropy(logreg(DAAlayer([DAAlayer([524, 500], 0.1), 500], 0.1))))

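The nested `DAAlayer` calls stack two denoising layers, and `pretrain` was described as looping over the layers of the SdA and training each one. A minimal sketch of that greedy layer-wise loop, using a hypothetical `ToyLayer` with `train_step` and `encode` methods in place of real DAA layers so that the data flow is easy to check: layer k is trained on inputs already encoded by layers 0..k-1.

```python
from typing import List, Sequence


class ToyLayer:
    """Hypothetical stand-in for a DAA layer: it records the inputs it was
    trained on and 'encodes' by adding a constant offset."""

    def __init__(self, offset: float) -> None:
        self.offset = offset
        self.seen: List[float] = []

    def train_step(self, x: float) -> None:
        self.seen.append(x)

    def encode(self, x: float) -> float:
        return x + self.offset


def pretrain(layers: Sequence[ToyLayer], data: Sequence[float], epochs: int) -> None:
    """Greedy layer-wise pre-training: layer k is trained for `epochs` passes,
    each example first being encoded by the layers below it (0..k-1)."""
    for k, layer in enumerate(layers):
        for _ in range(epochs):
            for example in data:
                x = example
                for lower in layers[:k]:
                    x = lower.encode(x)
                layer.train_step(x)


layers = [ToyLayer(10.0), ToyLayer(100.0)]
pretrain(layers, data=[1.0, 2.0], epochs=1)
print(layers[0].seen)  # [1.0, 2.0]
print(layers[1].seen)  # [11.0, 12.0]
```
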
We would rewrite the DAAlayer as::

    layer0 = DAAlayer([524, 500], 0.1)
    layer1 = cross_entropy(reconstruct(tanh(dotW_b(layer0, 500)), noise = 0.1))

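To make the split of `DAAlayer` more concrete, here is a pure-Python sketch of one denoising auto-associator forward pass built from the same kinds of pieces: corruption, an affine map (`affine` plays the role of `dotW_b`, but takes the weights explicitly), `tanh`, a reconstruction and a cross-entropy cost. The masking noise, the tied weights and the sigmoid decoder are assumptions (the text does not say what `reconstruct` does), and a real implementation would be written in Theano rather than with Python lists.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one row per output unit


def affine(x: Vector, W: Matrix, b: Vector) -> Vector:
    """Affine map W x + b."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def corrupt(x: Vector, noise: float, rng: random.Random) -> Vector:
    """Masking noise: each input is set to zero with probability `noise`."""
    return [0.0 if rng.random() < noise else xj for xj in x]


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def cross_entropy(x: Vector, z: Vector) -> float:
    """Reconstruction cost for targets in [0, 1] and reconstructions in (0, 1)."""
    return -sum(xj * math.log(zj) + (1 - xj) * math.log(1 - zj) for xj, zj in zip(x, z))


def daa_cost(x: Vector, W: Matrix, b: Vector, b_prime: Vector,
             noise: float, rng: random.Random) -> float:
    """Corrupt x, encode with tanh(W x + b), decode with sigmoid(W^T h + b')
    (tied weights), and score the reconstruction against the clean x."""
    h = [math.tanh(a) for a in affine(corrupt(x, noise, rng), W, b)]
    W_T = [list(col) for col in zip(*W)]
    z = [sigmoid(a) for a in affine(h, W_T, b_prime)]
    return cross_entropy(x, z)


rng = random.Random(0)
x = [1.0, 0.0, 1.0]
W = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]  # 2 hidden units, 3 inputs
cost = daa_cost(x, W, b=[0.0, 0.0], b_prime=[0.0, 0.0, 0.0], noise=0.1, rng=rng)
print(cost)
```

Splitting the layer this way is what makes it possible to change only one piece, for example the input to `affine`, without touching the cost.
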
At this level of detail, we can start inserting our new stuff as follows::

    input = empty_layer(600)
    # empty_layer is a wrapper; if I were to write dotW_b(200, 500), which
    # means going from a layer of 200 units to one of 500 by multiplying with
    # a matrix and adding a bias, what I would mean is
    # dotW_b(empty_layer(200), 500). An implementation of empty_layer could be
    # just theano.tensor.vector() to which we add the size tag (we will need
    # it later).

    hidden0_mfcc = dotW_b(input[0:524], 100)
    hidden0_noise = dotW_b(input[0:560], 50)
    hidden0_speakerID = dotW_b(join(input[0:524], input[560:600]), 50)
    hidden0 = tanh(join(hidden0_mfcc, hidden0_noise, hidden0_speakerID))
    layer0 = cross_entropy(reconstruct(hidden0, noise = 0.1))

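The size tag mentioned for `empty_layer` is what lets `dotW_b` infer the shape of its weight matrix. A hypothetical Python sketch that tracks only names and sizes (no numerical values) shows how slicing, `join` and `dotW_b` would propagate it through the listing above; the `Layer` class and its fields are assumptions, and `inp` replaces `input` to avoid shadowing the Python built-in.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Layer:
    """Hypothetical symbolic layer: a name, a size tag, and the shapes of the
    parameters (W, b) of the transform that produced it, if any."""
    name: str
    size: int
    param_shapes: Tuple[Tuple[int, ...], ...] = ()

    def __getitem__(self, s: slice) -> "Layer":
        start, stop, step = s.indices(self.size)
        if step != 1:
            raise ValueError("only contiguous slices are supported")
        return Layer(f"{self.name}[{start}:{stop}]", stop - start)


def empty_layer(size: int, name: str = "input") -> Layer:
    return Layer(name, size)


def join(*layers: Layer) -> Layer:
    """Concatenation: the size tag is the sum of the parts."""
    names = ", ".join(layer.name for layer in layers)
    return Layer(f"join({names})", sum(layer.size for layer in layers))


def dotW_b(layer: Layer, n_out: int) -> Layer:
    """W has shape (layer.size, n_out): this is where the size tag is needed."""
    return Layer(f"dotW_b({layer.name})", n_out, ((layer.size, n_out), (n_out,)))


def tanh(layer: Layer) -> Layer:
    return Layer(f"tanh({layer.name})", layer.size)


inp = empty_layer(600)
hidden0_mfcc = dotW_b(inp[0:524], 100)
hidden0_noise = dotW_b(inp[0:560], 50)
hidden0_speakerID = dotW_b(join(inp[0:524], inp[560:600]), 50)
hidden0 = tanh(join(hidden0_mfcc, hidden0_noise, hidden0_speakerID))
print(hidden0.size)                    # 200
print(hidden0_speakerID.param_shapes)  # ((564, 50), (50,))
```

With the slices used above, the three hidden groups have 100 + 50 + 50 = 200 units, and the speaker-ID group gets a 564 x 50 weight matrix (524 MFCC inputs plus 40 speaker inputs).
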
And so on. Hopefully you got what I mean by splitting a transform, or zooming
in. While doing all this we did not change anything about the early stopping or
launching jobs on the cluster. In the same manner, if one would like to look
into how jobs are sent to the cluster, one could just expand that part. Note
that if we wanted to do something else, we might have split the DAA
differently.

The key to this approach is to identify low-level units that can be shared by
90% of our architectures, and the splits that make the most sense from a
functional point of view, covering the main points where people will want to
change things. This will ensure that almost all the time we have the low-level
bits that we want to write our code into, and most of the time we will only
work on one of those bits. There will definitely be cases where whatever we
have will not be sufficient or convenient. In that case some effort has to be
invested by the user to create a different decomposition of the problem into
the elements he needs.

I've been thinking about this a bit, and it definitely works for deep networks
and Theano (the approach was inspired by Theano). From what James said, I think
that other things might be possible to incorporate, at least as atomic
transforms if not in any other way.

TODO: one has to give some thought to these low-level transforms, to find a
suitable set of them (and of variables) so that most of the time we would end
up re-using things rather than creating new things.

NOTES: some other implementation details are missing, namely what these state
variables should contain. I did not want to clutter this with the tricks that
could be used to get this transparent interface, though I have a few of them in
mind.

There are a lot of hardcoded values in this example. Usually each transform
that takes an input should "know" which of its inputs are tunable and mark them
as such. The order of the inputs in this example is important as well. This can
easily be solved at the expense of a few more lines of code that I did not want
to write.

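One way the two issues above could be handled is sketched below: a hypothetical `Tunable` marker records a default and a search range for each tunable input, and parameters are passed by name so that their order no longer matters. The helper names are illustrative only; `knobs` plays the role that `getPropertiesAndRanges` plays in the first example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class Tunable:
    """Hypothetical marker: a default value plus the (low, high) range that a
    hyper-parameter search is allowed to explore."""
    default: float
    low: float
    high: float


def knobs(params: Dict[str, Any]) -> Dict[str, Tuple[float, float]]:
    """Ranges of every parameter marked as tunable (cf. getPropertiesAndRanges)."""
    return {k: (v.low, v.high) for k, v in params.items() if isinstance(v, Tunable)}


def resolve(params: Dict[str, Any],
            overrides: Optional[Dict[str, float]] = None) -> Dict[str, Any]:
    """Replace each Tunable by its override if one is given, else by its
    default; fixed (hardcoded) values pass through unchanged."""
    overrides = overrides or {}
    return {k: (overrides.get(k, v.default) if isinstance(v, Tunable) else v)
            for k, v in params.items()}


# Parameters are given by name, so their order no longer matters.
sda_params = dict(layers=[524, 500, 500, 27],
                  noise=Tunable(0.1, 0.0, 0.5),
                  learning_rate=Tunable(0.01, 0.001, 0.1))
print(knobs(sda_params))  # {'noise': (0.0, 0.5), 'learning_rate': (0.001, 0.1)}
print(resolve(sda_params, {"noise": 0.3})["noise"])  # 0.3
```
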