annotate doc/v2_planning/learner.txt @ 1139:9f0502f8c7a5
Example of the plugin architecture I had in mind
author | gdesjardins |
---|---|
date | Thu, 16 Sep 2010 13:27:17 -0400 |
parents | f082a6c0b008 |
children | 7a8dcf87d780 |

Committee: AB, PL, GM, IG, RP, NB, PV
Leader: ?

Discussion of Function Specification for Learner Types
======================================================

In its most abstract form, a learner is an object with the
following semantics:

* A learner has named hyper-parameters that control how it learns (these can be viewed
  as options of the constructor, or might be set directly by a user).

* A learner also has an internal state that depends on what it has learned.

* A learner reads and produces data, so the definition of learner is
  intimately linked to the definition of dataset (and task).

* A learner has one or more 'train' or 'adapt' functions by which
  it is given a sample of data (typically either the whole training set, or
  a mini-batch, which contains as a special case a single 'example'). Learners
  interface with datasets in order to obtain data. These functions cause the
  learner to change its internal state and take advantage, to some extent,
  of the data provided. The 'train' function should take charge of
  completely exploiting the dataset, as specified by the hyper-parameters,
  so that it would typically be called only once. An 'adapt' function
  is meant for learners that can operate in an 'online' setting where
  data continually arrive and the control loop (when to stop) is to
  be managed outside of it. For most intents and purposes, the
  'train' function could also handle the 'online' case by providing
  the controlled iterations over the dataset (which would then be
  seen as a stream of examples).

    * learner.train(dataset)
    * learner.adapt(data)

* Different types of learners can then exploit their internal state
  in order to perform various computations after training is completed,
  or in the middle of training, e.g.,

    * y=learner.predict(x)
      for learners that see (x,y) pairs during training and predict y given x,
      or for learners that see only x's and learn a transformation of them (i.e. feature extraction).
      Here and below, x and y are tensor-like objects whose first index iterates
      over particular examples in a batch or minibatch of examples.

    * p=learner.probability(examples)
      p=learner.log_probability(examples)
      for learners that can estimate probability density or probability functions.
      Note that an example could be a pair (x,y) for learners that expect each example
      to represent such a pair. The second form is provided in case the example
      is high-dimensional and computations in the log-domain are numerically preferable.
      The first dimension of examples or of x and y is an index over a minibatch or a dataset.

    * p=learner.free_energy(x)
      for learners that can estimate a log unnormalized probability; the output has the same length as the input.

    * c=learner.costs(examples)
      returns a matrix of costs (one row per example, i.e., again the output has the same length
      as the input), the first column of which represents the cost whose expectation
      we wish to minimize over new samples from the unknown underlying data distribution.


  Some learners may be able to handle x's and y's that contain missing values.

* For convenience, some of these operations could be bundled, e.g.

    * [prediction,costs] = learner.predict_and_adapt((x,y))

* Some learners could include in their internal state not only what they
  have learned but also some information about recently seen examples that conditions
  the expected distribution of upcoming examples. In that case, they might
  be used, e.g. in an online setting, as follows:

     for (x,y) in data_stream:
        [prediction,costs]=learner.predict((x,y))
        accumulate_statistics(prediction,costs)

* In some cases, each example is itself a (possibly variable-size) sequence
  or other variable-size object (e.g. an image, or a video).
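
To make the semantics listed above concrete, here is a minimal sketch of how
such an interface might look in Python. It is only an illustration under
stated assumptions: the base class layout, the `n_epochs` hyper-parameter and
the toy `MeanRegressor` are hypothetical and not part of the proposal; only
the method names train, adapt, predict and costs come from the list above.

```python
from abc import ABC, abstractmethod


class Learner(ABC):
    """Hypothetical base class; the method names follow the list above."""

    def __init__(self, **hyper_parameters):
        # Named hyper-parameters, given here as constructor options.
        self.hyper_parameters = dict(hyper_parameters)

    @abstractmethod
    def adapt(self, data):
        """Update the internal state from one mini-batch of (x, y) pairs."""

    @abstractmethod
    def predict(self, x):
        """Return one prediction per row of x."""

    @abstractmethod
    def costs(self, examples):
        """Return one row of costs per example; column 0 is the cost to minimize."""

    def train(self, dataset):
        """Exploit the whole dataset, as directed by the hyper-parameters."""
        for _ in range(self.hyper_parameters.get("n_epochs", 1)):
            for minibatch in dataset:
                self.adapt(minibatch)


class MeanRegressor(Learner):
    """Toy learner (an assumption for illustration): predicts the mean of all y seen."""

    def __init__(self, **hyper_parameters):
        super().__init__(**hyper_parameters)
        self.total = 0.0   # internal state: sum of the targets seen so far
        self.count = 0     # internal state: number of targets seen so far

    def adapt(self, data):
        for _x, y in data:
            self.total += y
            self.count += 1

    def predict(self, x):
        mean = self.total / self.count if self.count else 0.0
        return [mean for _ in x]

    def costs(self, examples):
        xs = [x for x, _y in examples]
        # One row per example, with the squared error in the first column.
        return [[(p - y) ** 2] for p, (_x, y) in zip(self.predict(xs), examples)]


dataset = [[([0.0], 1.0), ([1.0], 3.0)], [([2.0], 5.0)]]   # two mini-batches
learner = MeanRegressor(n_epochs=1)
learner.train(dataset)
print(learner.predict([[10.0], [20.0]]))   # [3.0, 3.0]
print(learner.costs([([0.0], 4.0)]))       # [[1.0]]
```

Here train() is just a loop over adapt(), which is one way to read the remark
that 'train' could also handle the online case.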


James's idea for Learner Interface
===================================

Theory:
-------

Think about the unfolding of a learning algorithm as exploring a path in a vast
directed graph.

There are some source nodes, which are potential initial conditions for the
learning algorithm.

At any node, there are a number of outgoing labeled edges that represent
distinct directions of exploration: like "allocate a model with N hidden units",
or "set the l1 weight decay on such-and-such units to 0.1" or "adapt for T
iterations" or "refresh the GPU dataset memory with the next batch of data".

Not all nodes have the same outgoing edge labels.  The dataset, model, and
optimization algorithm implementations may each have their various
hyper-parameters with various restrictions on what values they can take, and
when they can be changed.

Every move in this graph incurs some storage and computational expense, and
explores the graph.

Learners typically engage in goal-directed exploration of this graph - for
example to find the node with the best validation-set performance given a
certain computational budget.  We might often be interested in the best node
found.

The predict(), log_probability(), free_energy() etc. correspond to costs that we
can measure at any particular node (at some computational expense) to see how we
are doing in our exploration.

Many semantically distinct components come into the definition of this graph:
the model (e.g. DAA), the dataset (e.g. an online one), the inference and
learning strategy.  I'm not sure what to call this graph other than an 'experiment
graph'... so I'll go with that for now.
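
The experiment-graph picture can be sketched in a few lines of Python. This is
a toy under stated assumptions, not the proposed API: the `Node` class, the
edge labels `allocate`, `set_l1` and `adapt`, and the made-up expense numbers
are all hypothetical. It only shows that edges are labeled, that the available
labels depend on the node, and that each move has a cost.

```python
import copy


class Node:
    """One explored point in a hypothetical experiment graph."""

    def __init__(self, state, cost_so_far=0.0):
        self.state = state              # e.g. hyper-parameters plus model state
        self.cost_so_far = cost_so_far  # accumulated (made-up) computational expense

    def edges(self):
        """Labels of the outgoing edges; they depend on the current state."""
        if "n_hidden" not in self.state:
            return ["allocate"]         # a source node: no model allocated yet
        return ["set_l1", "adapt"]

    def follow(self, label, arg):
        """Return the node reached by following one labeled edge."""
        if label not in self.edges():
            raise ValueError("edge %r is not available at this node" % label)
        state = copy.deepcopy(self.state)   # leave the visited node intact
        if label == "allocate":
            state.update(n_hidden=arg, n_iter=0)
            expense = 1.0
        elif label == "set_l1":
            state["l1"] = arg
            expense = 0.0
        else:                               # "adapt" for `arg` iterations
            state["n_iter"] += arg
            expense = float(arg)
        return Node(state, self.cost_so_far + expense)


source = Node({})
node = source.follow("allocate", 100).follow("set_l1", 0.1).follow("adapt", 50)
print(node.state)         # {'n_hidden': 100, 'n_iter': 50, 'l1': 0.1}
print(node.cost_so_far)   # 51.0
```

Because follow() returns a new node instead of mutating the old one, earlier
nodes stay available as starting points for other paths.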


Use Cases
----------

Early stopping
~~~~~~~~~~~~~~

Early stopping can be implemented as a learner that progresses along a
particular kind of edge (e.g. "train more") until a stopping criterion (in terms
of a cost computed from nodes along the path) is met.


Grid Search
~~~~~~~~~~~

Grid search is a learner policy that can be implemented in an experiment graph
where all paths have the form:

(   "set param 0 to X",
    "set param 1 to Y",
    ... ,
    "set param N to Z",
    adapt,
    [early stop...],
    test)

It would explore all paths of this form and then return the best node.
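
A minimal sketch of that policy, assuming that a path can be summarized by a
dictionary of parameter settings and that a lower test cost is better. The
names `grid_search` and `run_path`, and the made-up cost function, are
hypothetical.

```python
import itertools


def grid_search(grid, run_path):
    """Try every combination in `grid`; return (best_score, best_params)."""
    names = sorted(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))   # the "set param i to ..." edges
        score = run_path(params)            # adapt, [early stop...], test
        if best is None or score < best[0]:
            best = (score, params)
    return best


def run_path(params):
    # Made-up test cost, minimal at lr=0.1 and n_hidden=20.
    return abs(params["lr"] - 0.1) + abs(params["n_hidden"] - 20) / 100.0


grid = {"lr": [0.01, 0.1, 1.0], "n_hidden": [10, 20, 40]}
print(grid_search(grid, run_path))   # (0.0, {'lr': 0.1, 'n_hidden': 20})
```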


Stagewise learning of DBNs combined with early stopping and grid search
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This would be a learner that is effective for experiment graphs that reflect the
greedy-stagewise optimization of DBNs.


Boosting
~~~~~~~~

Given an ExperimentGraph that permits re-weighting of examples, it is
straightforward to write a meta-ExperimentGraph around it that implements AdaBoost.
A meta-meta-ExperimentGraph around that one, performing early stopping, would complete
the picture and make a useful boosting implementation.
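
For readers who want to see what the AdaBoost wrapper has to do, here is a
small self-contained sketch of discrete AdaBoost around a weak learner that
accepts example weights. It does not use any ExperimentGraph class; the
threshold "stump" weak learner and all function names are assumptions chosen
to keep the example short.

```python
import math


def stump_predict(threshold, sign, x):
    """Threshold rule on a 1-D input; returns a label in {-1, +1}."""
    return sign if x >= threshold else -sign


def train_stump(xs, ys, weights):
    """Weak learner: the threshold rule with the lowest weighted error."""
    best = None
    for threshold in xs:
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(threshold, sign, x) != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best


def adaboost(xs, ys, n_rounds):
    """Return a list of (alpha, threshold, sign) weighted stumps."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        err, threshold, sign = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)        # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, sign))
        # Re-weight the examples: misclassified ones gain weight.
        weights = [w * math.exp(-alpha * y * stump_predict(threshold, sign, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(t, s, x) for a, t, s in ensemble)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]          # not separable by any single threshold
ensemble = adaboost(xs, ys, n_rounds=3)
print([predict(ensemble, x) for x in xs])   # [1, 1, -1, -1, 1, 1]
```

The only thing the wrapper needs from the inner learner is the ability to
train under a given weighting of the examples, which matches the requirement
stated above.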


Using External Hyper-Parameter Optimization Software
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TODO: use-case - show how we could use the optimizer from
      http://www.cs.ubc.ca/labs/beta/Projects/ParamILS/


Implementation Details / API
----------------------------

Learner
~~~~~~~
    An object that allows us to explore the graph discussed above.  Specifically, it represents
    an explored node in that graph.

    def active_instructions(self):
        """ Return a list/set of Instruction instances (see below) that the Learner is prepared
        to handle.
        """

    def copy(self):
        """ Return a shallow copy.  Learners should be serializable. """

    def deepcopy(self):
        """ Return a deep copy.  Learners should be serializable. """
1026
38f799f8b6cd
v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1002
diff
changeset
|
197 |
38f799f8b6cd
v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1002
diff
changeset
|
198 |
1043
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
199 To make the implementation easier, I found it was helpful to introduce a string-valued |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
200 `fsa_state` member attribute and associate methods to these states. That made it |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
201 syntactically easy to build relatively complex finite-state transition graphs to describe |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
202 which instructions were active at which times in the life-cycle of a learner. |
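As a rough sketch of the `fsa_state` idea, a learner could keep a table from state name to the methods that are active in that state. The class names, state names, and the `transitions` table below are assumptions made for illustration; only `fsa_state` and `active_instructions()` come from the text above.

```python
import copy


class Learner:
    """Hypothetical base class: one explored node in the graph."""

    # Maps each fsa_state string to the method names active in that state.
    transitions = {}

    def __init__(self, fsa_state):
        self.fsa_state = fsa_state

    def active_instructions(self):
        """Return the bound methods that are legal in the current state."""
        names = self.transitions.get(self.fsa_state, ())
        return [getattr(self, name) for name in names]


class ToyLearner(Learner):
    # Illustrative life-cycle: new -> ready -> done.
    transitions = {
        "new": ("initialize",),
        "ready": ("train_step", "finish"),
        "done": (),
    }

    def __init__(self):
        super().__init__("new")
        self.steps = 0

    def initialize(self):
        self.fsa_state = "ready"

    def train_step(self):
        self.steps += 1

    def finish(self):
        self.fsa_state = "done"


learner = ToyLearner()
learner.initialize()
snapshot = copy.deepcopy(learner)   # keep the old node before moving on
learner.train_step()
learner.finish()
active_names = [m.__name__ for m in snapshot.active_instructions()]
```

The snapshot still reports the instructions of the "ready" state, while the original learner has moved on to "done" and reports none.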
1026
38f799f8b6cd
v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1002
diff
changeset
|
203 |
38f799f8b6cd
v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1002
diff
changeset
|
204 |
1043
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
205 Instruction |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
206 ~~~~~~~~~~~ |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
207 An object that represents a potential edge in the graph discussed above. It is an |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
208 operation that a learner can perform. |
1026
38f799f8b6cd
v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1002
diff
changeset
|
209 |
1043
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
210 arg_types |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
211 """a list of Type objects (see below) indicating what args are required by execute""" |
1026
38f799f8b6cd
v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1002
diff
changeset
|
212 |
1043
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
213 def execute(learner, args, kwargs): |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
214 """ Perform some operation on the learner (follow an edge in the graph discussed above) |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
215 and modify the learner in-place. Calling execute 'moves' the learner from one node in |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
216 the graph along an edge. To have the old learner as well, it must be copied prior to |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
217 calling execute(). |
1026
38f799f8b6cd
v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1002
diff
changeset
|
218 """ |
38f799f8b6cd
v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1002
diff
changeset
|
219 |
1043
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
220 def expense(learner, args, kwargs, resource_type='CPUtime'): |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
221 """ Return an estimated cost of performing this instruction (calling execute), in time, |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
222 space, number of computers, disk requirements, etc. |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
223 """ |
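A minimal sketch of how `execute` and `expense` might fit together, assuming an Instruction simply wraps a function that mutates the learner. The constructor arguments, the flat `cost_per_call` model, and the `Counter` stand-in learner are all hypothetical.

```python
import copy


class Instruction:
    """Hypothetical edge: wraps a function that mutates a learner in place."""

    def __init__(self, func, arg_types=(), cost_per_call=1.0):
        self.func = func
        self.arg_types = list(arg_types)
        self.cost_per_call = cost_per_call  # assumed flat cost model

    def execute(self, learner, args=(), kwargs=None):
        # Modifies `learner` in place, i.e. follows the edge.
        return self.func(learner, *args, **(kwargs or {}))

    def expense(self, learner, args=(), kwargs=None, resource_type="CPUtime"):
        # Only a CPU-time estimate is modelled in this sketch.
        if resource_type != "CPUtime":
            raise NotImplementedError(resource_type)
        return self.cost_per_call


class Counter:
    """Stand-in learner with a single piece of state."""

    def __init__(self):
        self.n_updates = 0


def _update(learner, n):
    learner.n_updates += n


update = Instruction(_update, cost_per_call=0.5)

node = Counter()
old_node = copy.deepcopy(node)      # copy first to keep the old node
update.execute(node, args=(3,))
```

Because `execute` works in place, `old_node` is the only remaining handle on the node that was left behind.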
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
224 |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
225 Type |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
226 ~~~~ |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
227 An object that describes a parameter domain for a call to Instruction.execute. |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
228 It is not necessary that a Type specifies exactly which arguments are legal, but it should |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
229 `include` all legal arguments, and exclude as many illegal ones as possible. |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
230 |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
231 def includes(value): |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
232 """return True if value is a legal argument""" |
1026
38f799f8b6cd
v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1002
diff
changeset
|
233 |
38f799f8b6cd
v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1002
diff
changeset
|
234 |
1043
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
235 To make things a bit more practical, there are some Type subclasses like Int, Float, Str, |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
236 ImageDataset, SgdOptimizer, that include additional attributes (e.g. min, max, default) so |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
237 that automatic graph exploration algorithms can generate legal arguments with reasonable |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
238 efficiency. |
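For illustration, a `Float` subclass along these lines could look as follows. The `sample` helper is an assumption, not part of the API described above. It draws uniformly between `min` and `max`, which is the simplest way an exploration algorithm could use those attributes.

```python
import random


class Type:
    """Hypothetical base: a superset of the legal argument values."""

    def includes(self, value):
        raise NotImplementedError


class Float(Type):
    def __init__(self, min, max, default):
        self.min, self.max, self.default = min, max, default

    def includes(self, value):
        return isinstance(value, float) and self.min <= value <= self.max

    def sample(self, rng):
        # Assumed helper: draw uniformly within [min, max].
        return rng.uniform(self.min, self.max)


log_lr = Float(min=-8.0, max=-1.0, default=-4.0)
rng = random.Random(0)
candidates = [log_lr.default] + [log_lr.sample(rng) for _ in range(5)]
```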
1026
38f799f8b6cd
v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1002
diff
changeset
|
239 |
38f799f8b6cd
v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1002
diff
changeset
|
240 |
38f799f8b6cd
v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1002
diff
changeset
|
241 |
1043
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
242 The proxy pattern is a powerful way to combine learners. Especially when proxy Learner |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
243 instances also introduce Proxy Instruction classes. |
1026
38f799f8b6cd
v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1002
diff
changeset
|
244 |
1043
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
245 For example, it is straightforward to implement a hyper-learner by implementing a Learner with |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
246 another learner (sub-learner) as a member attribute. The hyper-learner makes some |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
247 modifications to the instruction_set() return value of the sub-learner, typically to introduce |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
248 more powerful instructions and hide simpler ones. |
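A small sketch of such a proxy, under the assumption that `instruction_set()` returns a dict from instruction name to callable (the text does not fix the return type). `SubLearner`, `HyperLearner`, and `train_with` are illustrative names.

```python
class SubLearner:
    """Stand-in sub-learner exposing two simple instructions."""

    def __init__(self):
        self.lr = None
        self.epochs = 0

    def set_lr(self, lr):
        self.lr = lr

    def train_epoch(self):
        self.epochs += 1

    def instruction_set(self):
        return {"set_lr": self.set_lr, "train_epoch": self.train_epoch}


class HyperLearner:
    """Proxy that hides the simple instructions behind a stronger one."""

    def __init__(self, sub_learner):
        self.sub_learner = sub_learner

    def train_with(self, lr, n_epochs):
        sub = self.sub_learner.instruction_set()
        sub["set_lr"](lr)
        for _ in range(n_epochs):
            sub["train_epoch"]()

    def instruction_set(self):
        # Start from the sub-learner's set, hide the low-level entries,
        # and introduce the composite instruction.
        visible = dict(self.sub_learner.instruction_set())
        visible.pop("set_lr")
        visible.pop("train_epoch")
        visible["train_with"] = self.train_with
        return visible


hyper = HyperLearner(SubLearner())
hyper.instruction_set()["train_with"](0.01, n_epochs=3)
```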
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
249 |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
250 It is less straightforward, but consistent with the design to implement a Learner that |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
251 encompasses job management. Such a learner would retain the semantics of the |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
252 instruction_set of the sub-learner, but would replace the Instruction objects themselves with |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
253 Instructions that arranged for remote procedure calls (e.g. jobman, multiprocessing, bqtools, |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
254 etc.). Such a learner would replace synchronous instructions (return on completion) with |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
255 asynchronous ones (return after scheduling) and the active instruction set would also change |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
256 asynchronously, but neither of these things is inconsistent with the Learner API. |
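To make the synchronous/asynchronous distinction concrete, here is a sketch that uses the standard library's `concurrent.futures` as a stand-in for a real job manager (it is not jobman or bqtools). The proxy keeps the sub-learner's instruction names but each call returns a `Future` as soon as the job is scheduled. All class names are hypothetical, and `instruction_set()` is again assumed to return a dict.

```python
from concurrent.futures import ThreadPoolExecutor


class SlowLearner:
    def __init__(self):
        self.epochs = 0

    def train_epoch(self):
        self.epochs += 1
        return self.epochs

    def instruction_set(self):
        return {"train_epoch": self.train_epoch}


class ScheduledLearner:
    """Proxy with the same instruction names; calls return after scheduling."""

    def __init__(self, sub_learner, executor):
        self.sub_learner = sub_learner
        self.executor = executor

    def instruction_set(self):
        def schedule(func):
            # Return a Future immediately instead of the result.
            return lambda *args, **kwargs: self.executor.submit(func, *args, **kwargs)

        return {name: schedule(func)
                for name, func in self.sub_learner.instruction_set().items()}


with ThreadPoolExecutor(max_workers=1) as executor:
    proxy = ScheduledLearner(SlowLearner(), executor)
    future = proxy.instruction_set()["train_epoch"]()
    result = future.result()    # block until the scheduled job completes
```

A single worker is used so that scheduled instructions on the same sub-learner run one at a time.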
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
257 |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
258 |
1058
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
259 TODO - Experiment API? |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
260 ~~~~~~~~~~~~~~~~~~~~~~ |
1043
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
261 |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
262 I feel like something is missing from the API - and that is an interface to the graph structure |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
263 discussed above. The nodes in this graph are natural places to store meta-information for |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
264 visualization, statistics-gathering etc. But none of the APIs above corresponds to the graph |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
265 itself. In other words, there is no API through which to attach information to nodes. It is |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
266 not good to say that the Learner instance *is* the node because (a) learner instances change |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
267 during graph exploration and (b) learner instances are big, and we don't want to have to keep a |
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
268 whole saved model just to attach meta-info e.g. validation score. Choosing this API spills |
1058
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
269 over into other committees, so we should get their feedback about how to resolve |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
270 it. Maybe we need an 'Experiment' API to stand for this graph? |
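One possible shape for such an 'Experiment' object, sketched purely as a discussion aid: it records node ids, the edges between them, and per-node meta-information, without holding any Learner instance. Every name here is hypothetical, and the validation score is a made-up placeholder value.

```python
class Experiment:
    """Hypothetical record of the explored graph, separate from any Learner."""

    def __init__(self):
        self.meta = {0: {}}      # node id -> meta-information (0 is the root)
        self.edges = []          # (parent id, instruction name, child id)

    def add_edge(self, parent, instruction_name):
        child = len(self.meta)
        self.meta[child] = {}
        self.edges.append((parent, instruction_name, child))
        return child


exp = Experiment()
n1 = exp.add_edge(0, "set_log_lr(-4)")
n2 = exp.add_edge(n1, "train_epoch")
exp.meta[n2]["validation_score"] = 0.12   # placeholder value for illustration
```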
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
271 |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
272 |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
273 TODO: Validation & Monitoring Costs |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
274 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
1044
3b1fd599bafd
my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1043
diff
changeset
|
275 |
1058
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
276 Even if we do have the Experiment API as a structure to hang validation and |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
277 monitoring results, what should be the mechanism for extracting those results? |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
278 The Learner API is not right because extracting a monitoring cost doesn't change |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
279 the model, doesn't change the legal instructions/edges etc. Maybe we should use |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
280 a similar mechanism to Instruction, called something like Measurement? Any node |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
281 / learner can report the list of instructions (for moving) and the list of |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
282 measurements (and the cost of computing them too). |
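A sketch of what a Measurement might look like if it mirrored Instruction but stayed read-only. The `measure` and `measurements` names, the flat cost, and the L1-norm example are assumptions for illustration.

```python
class Measurement:
    """Hypothetical read-only counterpart to Instruction."""

    def __init__(self, name, func, cost=1.0):
        self.name = name
        self.func = func
        self.cost = cost

    def measure(self, learner):
        # Must not modify the learner or its legal instructions.
        return self.func(learner)

    def expense(self, learner, resource_type="CPUtime"):
        return self.cost


class ToyLearner:
    def __init__(self, weights):
        self.weights = weights

    def measurements(self):
        return [Measurement("l1_norm",
                            lambda l: sum(abs(w) for w in l.weights),
                            cost=0.1)]


learner = ToyLearner([0.5, -1.5, 2.0])
report = {m.name: m.measure(learner) for m in learner.measurements()}
```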
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
283 |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
284 |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
285 TODO - Parameter Distributions |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
286 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
1055
bc3f7834db83
added a comment/question about Type
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
1053
diff
changeset
|
287 |
bc3f7834db83
added a comment/question about Type
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
1053
diff
changeset
|
288 YB asks: it seems to me that what we really need from "Type" is not just |
bc3f7834db83
added a comment/question about Type
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
1053
diff
changeset
|
289 testing that a value is legal, but more practically a function that specifies the |
bc3f7834db83
added a comment/question about Type
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
1053
diff
changeset
|
290 prior distribution for the hyper-parameter, i.e., how to sample from it, |
bc3f7834db83
added a comment/question about Type
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
1053
diff
changeset
|
291 and possibly some representation of it that could be used to infer |
bc3f7834db83
added a comment/question about Type
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
1053
diff
changeset
|
292 a posterior (such as an unnormalized log-density or log-probability). |
bc3f7834db83
added a comment/question about Type
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
1053
diff
changeset
|
293 Having the min and max and default limits us to the uniform distribution, |
bc3f7834db83
added a comment/question about Type
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
1053
diff
changeset
|
294 which may not always be appropriate. For example sometimes we'd like |
bc3f7834db83
added a comment/question about Type
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
1053
diff
changeset
|
295 Gaussian (-infty to infty) or Exponential (0 to infty) or Poisson (non-negative integers). |
bc3f7834db83
added a comment/question about Type
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
1053
diff
changeset
|
296 For that reason, I think that "Type" is not a very good name. |
bc3f7834db83
added a comment/question about Type
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
1053
diff
changeset
|
297 How about "Prior" or "Density" or something like that? |
bc3f7834db83
added a comment/question about Type
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
1053
diff
changeset
|
298 |
1058
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
299 JB replies: I agree that being able to choose (and update) distributions over |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
300 these values is important. I don't think the Type structure is the right place |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
301 to handle it though. The challenge is to allow those distributions to change |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
302 for a variety of reasons - e.g. the sampling distribution on the capacity |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
303 variables is affected by the size of the dataset, it is also affected by |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
304 previous experience in general as well as experiments on that particular |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
305 dataset. I'm not sure that the 'Type' structure is right to deal with this. |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
306 Also, even with a strategy for handling these distributions, I believe a simple |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
307 mechanism for rejecting insane values might be useful. |
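One way the two concerns could be kept apart, offered only as a sketch: a separate prior object knows how to sample and how to score a value, while a Type-like object is used just to reject insane values. `GaussianPrior`, `FloatRange`, and `sample_legal` are hypothetical names, and the rejection loop is one simple choice among many.

```python
import random


class GaussianPrior:
    """Hypothetical prior: knows how to sample and score, nothing else."""

    def __init__(self, mean, std):
        self.mean, self.std = mean, std

    def sample(self, rng):
        return rng.gauss(self.mean, self.std)

    def unnormalized_log_density(self, value):
        return -0.5 * ((value - self.mean) / self.std) ** 2


class FloatRange:
    """Stand-in for a Type: only rejects clearly illegal values."""

    def __init__(self, min, max):
        self.min, self.max = min, max

    def includes(self, value):
        return self.min <= value <= self.max


def sample_legal(prior, legal, rng, max_tries=1000):
    # Rejection sampling: draw from the prior, keep the first legal value.
    for _ in range(max_tries):
        value = prior.sample(rng)
        if legal.includes(value):
            return value
    raise RuntimeError("prior puts too little mass on legal values")


rng = random.Random(0)
prior = GaussianPrior(mean=-4.0, std=2.0)
legal = FloatRange(min=-8.0, max=-1.0)
draws = [sample_legal(prior, legal, rng) for _ in range(20)]
```

With this split the prior can be replaced or updated (for example per dataset) without touching the legality check.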
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
308 |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
309 So how should we handle it? Hmmm... |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
310 |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
311 |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
312 Comments |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
313 ~~~~~~~~ |
e342de3ae485
v2planning learner - added comments and TODO points
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1055
diff
changeset
|
314 |
1052
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
315 OD asks: (I hope it's ok to leave comments even though I'm not in committee... I'm |
1045
d57bdd9a9980
learner: Left a comment about James' design
Olivier Delalleau <delallea@iro>
parents:
1044
diff
changeset
|
316 interested to see how the learner interface is shaping up so I'll be keeping |
d57bdd9a9980
learner: Left a comment about James' design
Olivier Delalleau <delallea@iro>
parents:
1044
diff
changeset
|
317 an eye on this file) |
d57bdd9a9980
learner: Left a comment about James' design
Olivier Delalleau <delallea@iro>
parents:
1044
diff
changeset
|
318 I'm wondering what's the benefit of such an API compared to simply defining a |
d57bdd9a9980
learner: Left a comment about James' design
Olivier Delalleau <delallea@iro>
parents:
1044
diff
changeset
|
319 new method for each instruction. It seems to me that typically, the 'execute' |
d57bdd9a9980
learner: Left a comment about James' design
Olivier Delalleau <delallea@iro>
parents:
1044
diff
changeset
|
320 method would end up being something like |
d57bdd9a9980
learner: Left a comment about James' design
Olivier Delalleau <delallea@iro>
parents:
1044
diff
changeset
|
321 if instruction == 'do_x': |
d57bdd9a9980
learner: Left a comment about James' design
Olivier Delalleau <delallea@iro>
parents:
1044
diff
changeset
|
322 self.do_x(..) |
d57bdd9a9980
learner: Left a comment about James' design
Olivier Delalleau <delallea@iro>
parents:
1044
diff
changeset
|
323 elif instruction == 'do_y': |
d57bdd9a9980
learner: Left a comment about James' design
Olivier Delalleau <delallea@iro>
parents:
1044
diff
changeset
|
324 self.do_y(..) |
d57bdd9a9980
learner: Left a comment about James' design
Olivier Delalleau <delallea@iro>
parents:
1044
diff
changeset
|
325 ... |
d57bdd9a9980
learner: Left a comment about James' design
Olivier Delalleau <delallea@iro>
parents:
1044
diff
changeset
|
326 so why not directly call do_x / do_y instead? |
d57bdd9a9980
learner: Left a comment about James' design
Olivier Delalleau <delallea@iro>
parents:
1044
diff
changeset
|
327 |
1046
f1732269bce8
comment on Olivier's comment
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1045
diff
changeset
|
328 |
1052
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
329 JB replies: I agree with you, and in the implementation of a Learner I suggest |
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
330 using Python decorators to get the best of both worlds: |
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
331 |
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
332 class NNet(Learner): |
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
333 |
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
334 ... |
1046
f1732269bce8
comment on Olivier's comment
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1045
diff
changeset
|
335 |
1052
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
336 @Instruction.new(arg_types=(Float(min=-8, max=-1, default=-4),)) |
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
337 def set_log_lr(self, log_lr): |
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
338 self.lr.value = numpy.exp(log_lr) |
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
339 |
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
340 ... |
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
341 |
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
342 The Learner base class can implement an instruction_set() that walks through the |
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
343 methods of 'self' and picks out the ones that have corresponding instructions. |
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
344 But anyone can call the method normally. The NNet class can also have methods |
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
345 that are not instructions. |
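A self-contained sketch of how such a decorator and base class could work. It assumes that `Instruction.new` merely tags the method with its `arg_types` and that `instruction_set()` returns a dict of the tagged methods. It also uses `math.exp` and a plain `lr` attribute in place of `numpy.exp` and `self.lr.value`, so that the example runs without extra dependencies.

```python
import inspect
import math


class Float:
    def __init__(self, min, max, default):
        self.min, self.max, self.default = min, max, default


class Instruction:
    @staticmethod
    def new(arg_types=()):
        """Decorator: tag a method as an instruction, leave it callable."""
        def decorate(method):
            method.arg_types = tuple(arg_types)
            return method
        return decorate


class Learner:
    def instruction_set(self):
        # Walk the methods of self; keep those tagged by Instruction.new.
        return {name: method
                for name, method in inspect.getmembers(self, inspect.ismethod)
                if hasattr(method, "arg_types")}


class NNet(Learner):
    def __init__(self):
        self.lr = None

    @Instruction.new(arg_types=(Float(min=-8, max=-1, default=-4),))
    def set_log_lr(self, log_lr):
        self.lr = math.exp(log_lr)

    def n_params(self):          # an ordinary method, not an instruction
        return 0


net = NNet()
net.set_log_lr(-4)               # still callable as a normal method
names = sorted(net.instruction_set())
```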
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
346 |
1053
347 OD replies: Ok thanks. I'm still unsure what is the end goal, but I'll keep |
348 watching while you guys work on it, and hopefully it'll become clearer for me ;) |
1052
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
349 |
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
350 RP asks: James, correct me if I'm wrong, but I think each instruction has an execute |
1046
f1732269bce8
comment on Olivier's comment
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1045
diff
changeset
|
351 command. The job of the learner is to traverse the graph and for each edge |
f1732269bce8
comment on Olivier's comment
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1045
diff
changeset
|
352 that it decides to cross to call the execute of that edge. Maybe James has |
f1732269bce8
comment on Olivier's comment
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1045
diff
changeset
|
353 something else in mind, but this was my understanding. |
f1732269bce8
comment on Olivier's comment
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1045
diff
changeset
|
354 |
1052
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
355 JB replies: close, but let me make a bit of a clarification. The job of a |
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
356 Learner is simply to implement the API of a Learner - to list what edges are |
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
357 available and to be able to cross them if asked. The code *using* the Learner |
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
358 (client) decides which edges to cross. The client may also be a Learner, but |
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
359 maybe not. |
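The division of labour can be sketched with a trivial client: the learner only lists its edges and crosses them when asked, and the client (here a random walk, chosen just for illustration) decides which edge to take. `ToyLearner` and `random_walk` are hypothetical.

```python
import random


class ToyLearner:
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1

    def double(self):
        self.value *= 2

    def active_instructions(self):
        return [self.increment, self.double]


def random_walk(learner, n_steps, rng):
    """Client code: it, not the learner, decides which edge to cross."""
    path = []
    for _ in range(n_steps):
        edge = rng.choice(learner.active_instructions())
        edge()
        path.append(edge.__name__)
    return path


learner = ToyLearner()
path = random_walk(learner, n_steps=5, rng=random.Random(0))
```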
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
360 |
1046
f1732269bce8
comment on Olivier's comment
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1045
diff
changeset
|
361 |
f1732269bce8
comment on Olivier's comment
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1045
diff
changeset
|
362 |
1044
3b1fd599bafd
my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1043
diff
changeset
|
Just another view/spin on the same idea (Razvan)
================================================

My idea is probably just a spin-off from what James wrote. It is an extension
of what I sent to the mailing list some time ago.

Big Picture
-----------

What do we care about?
~~~~~~~~~~~~~~~~~~~~~~

This is the list of the main points that I have in mind:

* Re-usability
* Extensibility
* Simplicity, or easily readable code (connected to re-usability)
* Modularity (connected to extensibility)
* Fast-to-write code (sort of comes out of simplicity)
* Efficient code


Composition
~~~~~~~~~~~

To me this reads as code generated by composing pieces. Imagine this:
you start off with something primitive that I will call a "variable", which
is probably a very unsuitable name. Then you compose those initial
"variables" or transform them through several "functions". Each such
"function" hides some logic that you, as the user, don't care about.
You can have low-level or micro "functions" and high-level or macro
"functions", where a high-level function is just a certain compositional
pattern of low-level "functions". There are several classes of "functions"
and "variables" that can be interchangeable. This is how modularity is
obtained: by changing between functions from a certain class.

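A small sketch of the composition idea, assuming plain Python callables stand
in for "functions" and floats stand in for "variables". The helper ``compose``
and the step names are hypothetical, chosen only for illustration.

```python
from functools import reduce
from typing import Callable

Transform = Callable[[float], float]


def compose(*steps: Transform) -> Transform:
    """Chain transforms left to right into one higher-level transform."""
    return lambda x: reduce(lambda acc, step: step(acc), steps, x)


# Low-level "functions" (hypothetical stand-ins for real processing steps).
def center(x: float) -> float:
    return x - 1.0


def scale(x: float) -> float:
    return x * 2.0


def scale_by_ten(x: float) -> float:
    return x * 10.0


def clip(x: float) -> float:
    return max(0.0, min(x, 5.0))


# A high-level "function" is just a compositional pattern of low-level ones.
preprocess = compose(center, scale, clip)

# Modularity: swap one member of the chain for another from the same class.
preprocess_v2 = compose(center, scale_by_ten, clip)
```

Only the ``scale`` step differs between the two pipelines; the rest of the
chain is reused untouched.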
Now when you want to research something, what you do is first select
the step you want to look into. If you are lucky, you can re-write this
step as a certain decomposition of low-level transformations (there can be
multiple such decompositions). If not, you have to implement such a
decomposition according to your needs. Pick the low-level transformations you want
to change and write new versions that implement your logic.

I think the code will be easy to read, because it is just applying a fixed
set of transformations, one after the other. Whoever writes the code can
decide how explicit to be by switching between high-level
and low-level functions.

I think the code this way is re-usable, because you can just take this chain
of transformations and replace the one you care about, without looking into
the rest.

You get this fractal property of the code. Zooming in, you always get just
a set of functions applied to a set of variables. In the beginning those might
not be there, and you would have to create new "low-level" decompositions,
maybe even new "variables" that carry data between those decompositions.

The thing with variables here is that I don't want these "functions" to have
a state. All the information is passed along through these variables. This
way understanding the graph is easy, and debugging it is also easier (than having
all these hidden states ...).

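To illustrate the "no hidden state" point, here is a sketch in which the
"variable" is an immutable record and the "function" is a pure function from
old variable to new variable. ``SGDState`` and ``sgd_step`` are hypothetical
names, and the update rule is ordinary gradient descent on a single scalar.

```python
from typing import NamedTuple


class SGDState(NamedTuple):
    """A "variable": all learner state lives here (hypothetical fields)."""
    weight: float
    learning_rate: float
    n_updates: int


def sgd_step(state: SGDState, gradient: float) -> SGDState:
    # Stateless "function": reads its inputs and returns a new variable.
    return SGDState(
        weight=state.weight - state.learning_rate * gradient,
        learning_rate=state.learning_rate,
        n_updates=state.n_updates + 1,
    )


s0 = SGDState(weight=1.0, learning_rate=0.5, n_updates=0)
s1 = sgd_step(s0, gradient=1.0)
```

Because ``s0`` is left untouched, every intermediate variable can be inspected
or replayed while debugging.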
Note that while doing so we might (and I strongly think we should) create
a (symbolic) DAG of operations (this is where it becomes what James was saying).
In such a DAG the "variables" will be the nodes and the functions will be the edges.
I think having a DAG is useful in many ways (all of these are things that one
might think about implementing in the far future; I'm not proposing to implement
them unless we want to use them - like the reconstruction):

* there is the possibility of writing optimizations (Theano style)
* there is the possibility of adding global-view utility functions (like
  a reconstruction function for SdA - extremely low level here), or global-view
  diagnostic tools
* the possibility of creating a GUI (where you just create the graph by
  picking transforms and variables from a list), or of working interactively
  and then generating code that will reproduce the graph
* you can view the graph at different granularity levels to understand
  things (global diagnostics)

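A rough sketch of such a symbolic DAG, assuming variables are identified by
name and each applied function is stored as one edge from its input names to
its output name. ``Graph``, ``apply``, ``evaluate`` and ``describe`` are
hypothetical names; this is not Theano's data structure.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Graph:
    """Symbolic DAG: variable names are nodes, applied functions are edges."""
    edges: List[Tuple[Callable[..., Any], Tuple[str, ...], str]] = field(
        default_factory=list
    )

    def apply(self, fn: Callable[..., Any], inputs: Tuple[str, ...], output: str) -> str:
        # Record the edge without computing anything (symbolic construction).
        self.edges.append((fn, inputs, output))
        return output

    def evaluate(self, values: Dict[str, Any], target: str) -> Any:
        # Edges are stored in application order, which is a valid topological
        # order as long as every input was defined before it was used.
        env = dict(values)
        for fn, inputs, output in self.edges:
            env[output] = fn(*(env[name] for name in inputs))
        return env[target]

    def describe(self) -> List[str]:
        # A "global view" utility: list every edge in readable form.
        return [f"{', '.join(i)} -> {fn.__name__} -> {o}" for fn, i, o in self.edges]


def add(a: float, b: float) -> float:
    return a + b


def double(a: float) -> float:
    return 2 * a


g = Graph()
h = g.apply(add, ("x", "b"), "h")
y = g.apply(double, (h,), "y")
result = g.evaluate({"x": 1, "b": 2}, y)
```

Since the whole graph exists before anything runs, tools such as ``describe``
(or an optimizer, or a GUI) can work on it as data.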
We should have a taxonomy of possible classes of functions and possible
classes of variables, but those should not be exclusive. We can work at a high
level for now, and decompose those high-level functions into lower-level ones when
we need to. We can introduce new classes of functions or intermediate
variables between those low-level functions.


Similarities with James' idea
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As I said before, I think this is just another view on what James proposed.
The learner in his case is the module that traverses the graph of these
operations, which makes sense here as well.

The 'execute' command in his API just applies a function to some variables in
my case.

I think the learner keeps track of the graph that is formed in both cases.

His view is a bit more general. I see the graph as fully created by the user,
and the learner just has to go from the start to the end. In his case the
traversal is conditioned on some policies. I think these ideas can be mixed /
united. What I would need in my case to get this functionality is something
similar to the lazy linker for Theano.



1052
84f62533e7a8
v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1046
diff
changeset
|
JB asks: There is definitely a strong theme of graphs in both suggestions,
furthermore graphs that have heavy-duty nodes and light-weight edges. But I
don't necessarily think that we're proposing the same thing. One difference is
that the graph I talked about would be infinite in most cases of interest, so
it's not going to be representable by Theano's data structures (even with lazy
if). Another difference is that the graph I had in mind doesn't feel fractal -
it would be very common for a graph edge to be atomic. A proxy pattern, such as
in a hyper-learner, would create a notion of being able to zoom in, but other
than that, I'm not sure what you mean.
1044
3b1fd599bafd
my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1043
diff
changeset
|
478 |
1056
19033ef1636d
some more details on my approach
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1055
diff
changeset
|
RP replies: I've been thinking about my idea a bit and yes, it might be
quite different from what James has in mind, though there are plenty of common
elements. I might have exaggerated a bit with the zooming in, so in some cases
you will end up with atomic edges, though my hope is that this is not true of
most of the edges.

I think I should go into more detail when answering this question because
I feel I have not explained things sufficiently clearly. Note that in many places
I replaced the word "function" by "transform".

Think of the learner as an object that traverses a DAG of steps created by the
user. On this DAG the learner can potentially do a lot of cool stuff, but we
won't care about that for now. The DAG can be infinite in principle, and what
the learner does is just follow the path described by the user (and here
"described" is not through heuristics as in James' case, but by giving the list
of edges it needs to follow). A potentially cool thing the learner can do is to
regard the path given by the user as a suggestion (or some form of heuristic)
and try to improve it. This would be much closer to what James has in mind,
and I definitely think it is a cool way to go about it.

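A sketch of the simplest case described here, in which the learner just
crosses the edges the user lists, in order. It assumes a state is a dict of
floats and an edge is a (name, function) pair; ``follow_path`` and the two
example transforms are hypothetical, and the "training" is a placeholder.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Edge = Tuple[str, Callable[[State], State]]


def follow_path(start: State, path: List[Edge]) -> Tuple[State, List[str]]:
    """Cross the user-given edges in order; return the final state and a trace."""
    state: State = dict(start)
    trace: List[str] = []
    for name, transform in path:
        state = transform(state)  # each edge maps one state to the next
        trace.append(name)
    return state, trace


def init_weights(state: State) -> State:
    return {**state, "w": 0.0}


def train_one_epoch(state: State) -> State:
    # Placeholder for real training: move w toward 1 by the learning rate.
    return {**state, "w": state["w"] + state["lr"] * (1.0 - state["w"])}


path = [("init", init_weights), ("epoch", train_one_epoch), ("epoch", train_one_epoch)]
final_state, trace = follow_path({"lr": 0.5}, path)
```

Treating ``path`` as a suggestion rather than an instruction would mean
replacing the ``for`` loop with a policy that may reorder, skip or add edges.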
Now this path in the graph is given by the user by composing subgraphs or
adding nodes to the graph. Or (expressing this in a simpler way) by applying
functions to variables. Any such function will introduce an edge (or a subgraph) that
will connect the vertices corresponding to the input variables to the vertices
corresponding to the output variables. The variables store the state of the
learner. These functions are stateless; I think that if you gave them state
you would make this approach really ugly (I might be wrong).
The variables would contain the information required by the function, like the
number of layers, how many cores to run on, cluster configurations, and so on.

Now about the zooming part that James asked about. I might have exaggerated a bit;
it is not that you can zoom in on any part infinitely. You will end up with
things that are atomic. The idea is that any such "transformation" or edge
has the potential to be split up into several "transformations". This offers
(in my view) a way of coping with the time constraints of our project. We can
start by defining a coarse division into segments. For now we can have
a structure transform that turns a list of parameters into a deep
network of some type, then a learner transform that adds SGD + pre-training
on top of the network, then an early stopper on top of that, and then a
run_on_cluster on that. We would probably want something more finely grained
even from the start; this is just to prove my point. When any of us
starts experimenting with a certain sub-step of this process (like the
structure) we will split that transform into several (like ones that create
a layer and so on) that make sense for that case, and then start working on
the low-level transform that we care about (like the layer), introducing new
19033ef1636d
some more details on my approach
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1055
diff
changeset
|
524 versions of it. I think we can not find a universal split that will cover |
19033ef1636d
some more details on my approach
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1055
diff
changeset
|
525 all of our cases, so I think we should allow different such splits. The one |
19033ef1636d
some more details on my approach
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1055
diff
changeset
|
526 who researches should look at what low-level transforms are available and use |
19033ef1636d
some more details on my approach
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1055
diff
changeset
|
527 those if they make sense, if not he would have to create a different split. |
19033ef1636d
some more details on my approach
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1055
diff
changeset
|
528 Creating a different split might involve a lot of work and taking care of |
19033ef1636d
some more details on my approach
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1055
diff
changeset
|
529 several issues so it should be done with care. |
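The idea of splitting a coarse transform and replacing only one of its parts can be sketched as follows. This is an illustration under assumptions: ``compose``, ``make_layers``, ``add_cost`` and the ``n_aux`` key are made-up names, and the state is a plain dict rather than real variables.

```python
from typing import Callable

Step = Callable[[dict], dict]


def compose(*steps: Step) -> Step:
    """Chain stateless steps into one coarse transform."""
    def run(state: dict) -> dict:
        for step in steps:
            state = step(state)
        return state
    return run


def make_layers(state: dict) -> dict:
    sizes = state["layers"]
    return {**state, "shapes": list(zip(sizes[:-1], sizes[1:]))}


def add_cost(state: dict) -> dict:
    return {**state, "cost": "cross_entropy"}


# Coarse view: a single "structure" transform.
structure = compose(make_layers, add_cost)


# Zoomed-in view: only the sub-step under study gets a new version.
def make_layers_with_aux(state: dict) -> dict:
    base = make_layers(state)
    first_in, first_out = base["shapes"][0]
    shapes = [(first_in + state["n_aux"], first_out)] + base["shapes"][1:]
    return {**base, "shapes": shapes}


structure_aux = compose(make_layers_with_aux, add_cost)

coarse = structure({"layers": [524, 500, 500, 27]})
zoomed = structure_aux({"layers": [524, 500, 500, 27], "n_aux": 76})
```

Here ``add_cost`` is reused untouched in both versions; only the layer-building step was split out and replaced.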

I'll give the example that got me thinking this way. Let's say we want
to do the SdA with auxiliary inputs that encourage separation of the features
in the hidden layer, which Yoshua was talking about (I had an attempt
at it some time ago for speech, but I never ended up finishing that project).

You start with something like:

    learner = Learner()
    # This will create the learner that will traverse our graph. We might
    # want it to be a function ``execute``; I just randomly picked this option.
    # I have no preference on this detail for now; this is mostly work in progress.

    data = someSpeechData(path = 'some path')
    # This is a transform that will generate, from the string representing the
    # path, a dataset variable (that will contain all the information you need
    # to access the data). This will probably be the object the datasets
    # committee will provide. Note that you might need to provide more
    # information than the path, but you can easily see how to do that. All of
    # these start from simple variables like path, batch size and so on, and
    # return a complex heavy-duty variable (node).


    model = earlyStopping(
        pretrain(SdA(layers = [524, 500, 500, 27], noise = [0.1, 0.1]),
                 data, epochs = 10),
        data)
    # This is a composition of transforms. The SdA transform starts from the
    # info about layers and corruption/noise for each layer and constructs an
    # SdA. This is a high-level transform, so it will take care of defining all
    # details, like pre-training, defining the cost and so on. Note that it may
    # require some more parameters; you can assume that for anything else there
    # is a default value that the SdA will use. earlyStopping is yet another
    # transform that takes a model (that we know how to train) and some data,
    # and does early stopping on it. For brevity I did not provide all the
    # information required, like patience and so on. The SdA only knows how to
    # do a step of training. The same holds for pretrain. It will loop over the
    # layers of the SdA and will train each one.

    steps = cluster(model, getPropertiesAndRanges(model), n_jobs = 20,
                    cluster_info = getClusterInfo())
    # This will launch the wanted jobs. getPropertiesAndRanges will get from a
    # model all the knobs that need to be turned, and their ranges, and will
    # uniformly sample from them in each job. getClusterInfo will return a
    # variable containing information about the cluster (I added this for
    # simplicity; it should probably be replaced with something like username,
    # password, clusterpath or whatever).

    learner.execute(steps)
    # As an option, each of these output variables could contain the entire
    # graph up to that point. We could also do this in a different way; this is
    # ad hoc at the moment.
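One way to read the listing above is that calling a transform only records a node, and nothing runs until the learner traverses the graph. The sketch below is an assumption about how that could look, not the intended implementation: ``Node`` and the ``transform`` decorator are invented here, and the bodies of ``SdA``, ``pretrain`` and ``earlyStopping`` are placeholders that just log what they would do.

```python
class Node:
    """A deferred call: nothing runs until a Learner executes it."""

    def __init__(self, fn, args, kwargs):
        self.fn, self.args, self.kwargs = fn, args, kwargs


def transform(fn):
    """Calling the decorated transform records a Node instead of running fn."""
    def build(*args, **kwargs):
        return Node(fn, args, kwargs)
    build.__name__ = fn.__name__
    return build


class Learner:
    def execute(self, node):
        # Depth-first traversal: evaluate the inputs, then the node itself.
        if not isinstance(node, Node):
            return node
        args = [self.execute(a) for a in node.args]
        kwargs = {k: self.execute(v) for k, v in node.kwargs.items()}
        return node.fn(*args, **kwargs)


@transform
def SdA(layers, noise):
    return {"layers": layers, "noise": noise, "log": ["SdA"]}


@transform
def pretrain(model, data, epochs):
    entry = "pretrain x%d on %s" % (epochs, data)
    return {**model, "log": model["log"] + [entry]}


@transform
def earlyStopping(model, data):
    return {**model, "log": model["log"] + ["earlyStopping on %s" % data]}


data = "speech"  # stands in for the dataset variable
model = earlyStopping(
    pretrain(SdA(layers=[524, 500, 500, 27], noise=[0.1, 0.1]), data, epochs=10),
    data,
)
result = Learner().execute(model)
```

With this reading, ``model`` is still just a description of the work; a ``cluster`` transform could wrap it in the same way before anything is executed.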

Now this is a coarse vanilla SdA, which is not what we wanted. We do not have a
way of incorporating our auxiliary information in it. So what we have to do
is split/change the SdA transform. We would re-write it as:

    arch = SdA(layers = [524, 500, 500, 27], noise = [0.1, 0.1])
    model = earlyStopping(pretrain(arch, data, epochs = 10), data)
    ...

And then re-write things like:

    arch = SGD(cross_entropy(logreg(DAAlayer([DAAlayer([524, 500], 0.1), 500], 0.1))))

We would re-write the DAAlayer as:

    layer0 = DAAlayer([524, 500], 0.1)
    layer1 = cross_entropy(reconstruct(tanh(dotW_b(layer0, 500)), noise = 0.1))

At this level of detail, we can start inserting our new stuff as follows:

    input = empty_layer(600)
    # empty_layer is a wrapper; if I were to write dotW_b(200, 500), which means
    # go from a layer of 200 units to one of 500 by multiplying with a matrix
    # and adding a bias, what I would mean is dotW_b(empty_layer(200), 500).
    # An implementation of empty_layer could be just theano.tensor.vector()
    # to which we add the size tag (we will need it later).

    hidden0_mfcc = dotW_b(input[0:524], 100)
    hidden0_noise = dotW_b(input[0:560], 50)
    hidden0_speakerID = dotW_b(join(input[0:524], input[560:600]), 50)
    hidden0 = tanh(join(hidden0_mfcc, hidden0_noise, hidden0_speakerID))
    layer0 = cross_entropy(reconstruct(hidden0, noise = 0.1))

and so on. Hopefully you got what I mean by splitting a transform, or zooming
in. When doing all this we did not change anything about the early stopping or
about launching jobs on the cluster. In the same manner, if one would like to
look into how jobs are sent to the cluster, one could just expand that part.
Note that if we wanted to do something else we might have split the DAA
differently.
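To show why the size tag on ``empty_layer`` matters, here is a pure-Python stand-in that tracks only sizes and parameter shapes. It is a sketch under assumptions: the ``Layer`` class is hypothetical, no Theano is involved, and ``tanh``, ``reconstruct`` and ``cross_entropy`` are left out because they do not change sizes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Layer:
    """Symbolic layer: a size tag plus the parameter shapes created so far."""
    size: int
    params: Tuple[Tuple[str, Tuple[int, ...]], ...] = ()

    def __getitem__(self, s: slice) -> "Layer":
        # Slicing keeps the size tag up to date.
        start, stop, step = s.indices(self.size)
        return Layer(len(range(start, stop, step)), self.params)


def empty_layer(size: int) -> Layer:
    return Layer(size)


def join(*layers: Layer) -> Layer:
    params = tuple(p for layer in layers for p in layer.params)
    return Layer(sum(layer.size for layer in layers), params)


def dotW_b(layer: Layer, n_out: int) -> Layer:
    # The size tag lets the weight shape be inferred from the input layer.
    new = (("W", (layer.size, n_out)), ("b", (n_out,)))
    return Layer(n_out, layer.params + new)


inp = empty_layer(600)
hidden0_mfcc = dotW_b(inp[0:524], 100)
hidden0_noise = dotW_b(inp[0:560], 50)
hidden0_speakerID = dotW_b(join(inp[0:524], inp[560:600]), 50)
hidden0 = join(hidden0_mfcc, hidden0_noise, hidden0_speakerID)
```

Only the output size is passed to ``dotW_b``; the input size comes from the tag, which is why ``dotW_b(200, 500)`` can be shorthand for ``dotW_b(empty_layer(200), 500)``.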

The key to this approach is to identify low-level units that can be
shared by 90% of our architectures, and the splits that make the most sense
from a functional point of view and that cover the main points where people
will want to change things. This will ensure that almost all the time we have
the low-level bits that we want to write our code into, and most of the
time we will only work on one of those bits. There will definitely be cases when
whatever we have will not be sufficient or convenient. In that case some
effort has to be invested by the user to create a different decomposition of
the problem into the elements they need.

I've been thinking about this a bit, and it definitely works for deep
networks and theano (the approach was inspired by theano). From what James
said, I think that other stuff might be possible to incorporate, at least as
atomic transforms if not in any other way.

TODO: one has to give some thought to these low-level transforms, to find a
suitable set of them (and of variables), so that we end up re-using things
most of the time rather than creating new ones.

NOTES: there are some other implementation details missing about what these
state variables should contain. I did not want to clutter this with the tricks
that could be used to get this transparent interface. I have a few of them in
mind though.
There are a lot of hardcoded values in this example. Usually each transform
that takes an input should "know" which of these inputs are tunable and mark
them as such. The order of the inputs in this example is important as well.
This can be easily solved at the expense of a few more lines of code that
I did not want to write.
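As a rough illustration of the last two points, inputs could be passed by name and wrapped in a marker when they are tunable. Everything here is an assumption made for the sketch: ``Tunable``, ``get_properties_and_ranges`` and ``sample_job`` are invented names, and the ranges are arbitrary.

```python
import random
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Tunable:
    """Marks an input as a knob with a default and a (low, high) range."""
    default: float
    low: float
    high: float


def get_properties_and_ranges(inputs: Dict[str, object]) -> Dict[str, Tuple[float, float]]:
    """Collect the knobs that a transform marked as tunable."""
    return {k: (v.low, v.high) for k, v in inputs.items() if isinstance(v, Tunable)}


def sample_job(inputs: Dict[str, object], rng: random.Random) -> Dict[str, object]:
    """Draw one job configuration, sampling each knob uniformly in its range."""
    job = dict(inputs)
    for name, (low, high) in get_properties_and_ranges(inputs).items():
        job[name] = rng.uniform(low, high)
    return job


# Named inputs: their order no longer matters.
sda_inputs = {
    "layers": [524, 500, 500, 27],               # fixed
    "noise": Tunable(0.1, 0.0, 0.5),             # tunable
    "learning_rate": Tunable(0.01, 0.001, 0.1),  # tunable
}

rng = random.Random(0)
jobs = [sample_job(sda_inputs, rng) for _ in range(20)]
```

Keeping the marker on the input itself means a cluster transform can discover the knobs without a separate list of hardcoded values.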