pylearn: doc/v2_planning/learner.txt @ 1057:baf1988db557
v2planning optimization - added API
author: James Bergstra <bergstrj@iro.umontreal.ca>
date: Thu, 09 Sep 2010 11:32:42 -0400
parents: bc3f7834db83
children: 19033ef1636d e342de3ae485

Committee: AB, PL, GM, IG, RP, NB, PV
Leader: ?

Discussion of Function Specification for Learner Types
======================================================

In its most abstract form, a learner is an object with the
following semantics:

* A learner has named hyper-parameters that control how it learns (these can be
  viewed as options of the constructor, or might be set directly by a user).

* A learner also has an internal state that depends on what it has learned.

* A learner reads and produces data, so the definition of learner is
  intimately linked to the definition of dataset (and task).

* A learner has one or more 'train' or 'adapt' functions by which
  it is given a sample of data (typically either the whole training set, or
  a mini-batch, which contains as a special case a single 'example'). Learners
  interface with datasets in order to obtain data. These functions cause the
  learner to change its internal state and to take advantage, to some extent,
  of the data provided. The 'train' function should take charge of
  completely exploiting the dataset, as specified by the hyper-parameters,
  so that it would typically be called only once. An 'adapt' function
  is meant for learners that can operate in an 'online' setting where
  data continually arrive and the control loop (when to stop) is to
  be managed outside of it. For most intents and purposes, the
  'train' function could also handle the 'online' case by providing
  the controlled iterations over the dataset (which would then be
  seen as a stream of examples).

  * learner.train(dataset)
  * learner.adapt(data)

* Different types of learners can then exploit their internal state
  in order to perform various computations after training is completed,
  or in the middle of training, e.g.,

  * y=learner.predict(x)
    for learners that see (x,y) pairs during training and predict y given x,
    or for learners that see only x's and learn a transformation of them (i.e. feature extraction).
    Here and below, x and y are tensor-like objects whose first index iterates
    over particular examples in a batch or minibatch of examples.

  * p=learner.probability(examples)
    p=learner.log_probability(examples)
    for learners that can estimate probability density or probability mass functions.
    Note that each example could be a pair (x,y) for learners that expect each example
    to represent such a pair. The second form is provided in case the example
    is high-dimensional and computations in the log-domain are numerically preferable.
    The first dimension of examples or of x and y is an index over a minibatch or a dataset.

  * f=learner.free_energy(x)
    for learners that can estimate a free energy, i.e. a negative log unnormalized
    probability (p(x) is proportional to exp(-f)); the output has the same length as the input.

  * c=learner.costs(examples)
    returns a matrix of costs (one row per example, i.e., again the output has the same length
    as the input), the first column of which represents the cost whose expectation
    we wish to minimize over new samples from the unknown underlying data distribution.

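A minimal sketch of how these methods could fit together. It assumes a hypothetical `SGDLinearRegressor`, which is a 1-D linear model fit by stochastic gradient descent and not an existing pylearn class. Its constructor arguments are the hyper-parameters, and `w` and `b` are its internal state. `train()` owns the whole loop, while `adapt()` leaves the loop to the caller. `costs()` returns one row per example, and the first column is the training objective.

```python
class SGDLinearRegressor:
    """Hypothetical learner: 1-D linear model y ~ w*x + b fit by plain SGD."""

    def __init__(self, learning_rate=0.05, n_epochs=200):
        # Hyper-parameters (constructor options).
        self.learning_rate = learning_rate
        self.n_epochs = n_epochs
        # Internal state, changed only by adapt() / train().
        self.w = 0.0
        self.b = 0.0

    def adapt(self, data):
        """One SGD pass over a minibatch of (x, y) pairs; the caller decides when to stop."""
        for x, y in data:
            err = (self.w * x + self.b) - y  # derivative of 0.5 * err**2 w.r.t. the prediction
            self.w -= self.learning_rate * err * x
            self.b -= self.learning_rate * err

    def train(self, dataset):
        """Exploit the whole dataset; the stopping rule is the n_epochs hyper-parameter."""
        for _ in range(self.n_epochs):
            self.adapt(dataset)

    def predict(self, x):
        return [self.w * xi + self.b for xi in x]

    def costs(self, examples):
        """One row per example: [0.5 * squared error (the training objective), absolute error]."""
        preds = self.predict([x for x, _ in examples])
        return [[0.5 * (p - y) ** 2, abs(p - y)] for p, (_, y) in zip(preds, examples)]


data = [(x, 2.0 * x + 1.0) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]

batch_learner = SGDLinearRegressor()
batch_learner.train(data)          # typically called only once

online_learner = SGDLinearRegressor()
for _ in range(200):               # control loop managed outside the learner
    online_learner.adapt(data)

c = batch_learner.costs(data)      # 5 rows, 2 columns
```

The outer loop performs the same updates that `train()` performs internally, so both learners end in exactly the same state.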
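The sketch below illustrates why `log_probability` is worth providing alongside `probability`. It uses a hypothetical diagonal-Gaussian density learner written in plain Python. Nested lists stand in for tensors, and the first index iterates over examples. In 2000 dimensions the log-density of a point is an ordinary finite number, while its density underflows to 0.0.

```python
import math


class DiagonalGaussian:
    """Hypothetical density learner: one independent Gaussian per input dimension."""

    def train(self, dataset):
        n, d = len(dataset), len(dataset[0])
        self.mean = [sum(row[j] for row in dataset) / n for j in range(d)]
        # A small constant keeps every variance strictly positive.
        self.var = [sum((row[j] - self.mean[j]) ** 2 for row in dataset) / n + 1e-6
                    for j in range(d)]

    def log_probability(self, examples):
        # One log-density per example; the first index iterates over examples.
        return [sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
                    for x, m, v in zip(row, self.mean, self.var))
                for row in examples]

    def probability(self, examples):
        return [math.exp(lp) for lp in self.log_probability(examples)]


d = 2000
g = DiagonalGaussian()
g.train([[0.0] * d, [1.0] * d])   # per-dimension mean 0.5, variance about 0.25
x = [[3.0] * d]
log_p = g.log_probability(x)[0]   # finite, roughly -2.5e4
p = g.probability(x)[0]           # underflows to 0.0
```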
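The sketch below works through the free-energy convention on a tiny binary RBM with made-up parameters. The `BinaryRBM` class is hypothetical. With visible bias b, hidden bias c and weights W, summing out the binary hidden units gives F(v) = -b.v - sum_j log(1 + exp(c_j + W_j.v)). The code checks this formula against the brute-force value -log sum_h exp(-E(v, h)). It then normalizes exp(-F) by Z, which is tractable here only because there are just 8 visible configurations.

```python
import itertools
import math


def softplus(a):
    """log(1 + exp(a)), computed without overflow."""
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))


class BinaryRBM:
    """Hypothetical tiny binary RBM with fixed parameters (training not shown).
    W[j][i] connects hidden unit j to visible unit i."""

    def __init__(self, W, b, c):
        self.W, self.b, self.c = W, b, c

    def energy(self, v, h):
        # E(v, h) = -b.v - c.h - h.W.v
        return (-sum(bi * vi for bi, vi in zip(self.b, v))
                - sum(cj * hj for cj, hj in zip(self.c, h))
                - sum(hj * sum(wji * vi for wji, vi in zip(Wj, v))
                      for hj, Wj in zip(h, self.W)))

    def free_energy(self, x):
        # One value per example: F(v) = -b.v - sum_j softplus(c_j + W_j.v)
        return [-sum(bi * vi for bi, vi in zip(self.b, v))
                - sum(softplus(cj + sum(wji * vi for wji, vi in zip(Wj, v)))
                      for cj, Wj in zip(self.c, self.W))
                for v in x]


rbm = BinaryRBM(W=[[0.5, -1.0, 0.3], [1.2, 0.4, -0.7]],
                b=[0.1, -0.2, 0.05], c=[-0.3, 0.2])
visibles = [list(v) for v in itertools.product([0, 1], repeat=3)]
hiddens = list(itertools.product([0, 1], repeat=2))

F = rbm.free_energy(visibles)
# Brute force: F(v) = -log sum_h exp(-E(v, h))
brute = [-math.log(sum(math.exp(-rbm.energy(v, h)) for h in hiddens))
         for v in visibles]
# exp(-F) is unnormalized; dividing by Z (tractable only for tiny models) gives p(v).
Z = sum(math.exp(-f) for f in F)
p = [math.exp(-f) / Z for f in F]
```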
Some learners may be able to handle x's and y's that contain missing values.

* For convenience, some of these operations could be bundled, e.g.

  * [prediction,costs] = learner.predict_and_adapt((x,y))

* Some learners could include in their internal state not only what they
  have learned but also some information about recently seen examples that
  conditions the expected distribution of upcoming examples. In that case, they
  might be used, e.g. in an online setting, as follows::

      for (x,y) in data_stream:
          [prediction,costs]=learner.predict((x,y))
          accumulate_statistics(prediction,costs)

* In some cases, each example is itself a (possibly variable-size) sequence
  or other variable-size object (e.g. an image, or a video).

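A minimal sketch of the online loop for a learner whose internal state depends on recently seen examples. It assumes a hypothetical `RecentMeanPredictor` whose state is an exponential moving average of recently seen targets, and a toy `accumulate_statistics` that sums costs. Its `predict((x, y))` makes the prediction before folding y into the state, so each cost is an honest one-step-ahead error.

```python
class RecentMeanPredictor:
    """Hypothetical learner whose state summarizes recently seen examples
    (an exponential moving average of past targets)."""

    def __init__(self, decay=0.5):
        self.decay = decay   # hyper-parameter
        self.recent = None   # state conditioned on recently seen examples

    def predict(self, example):
        x, y = example       # this toy predictor ignores x
        prediction = 0.0 if self.recent is None else self.recent
        costs = [(prediction - y) ** 2]
        # Only now fold y into the state, so the prediction never saw it.
        if self.recent is None:
            self.recent = y
        else:
            self.recent = self.decay * self.recent + (1 - self.decay) * y
        return [prediction, costs]


stats = {"n": 0, "total_cost": 0.0}


def accumulate_statistics(prediction, costs):
    stats["n"] += 1
    stats["total_cost"] += costs[0]


learner = RecentMeanPredictor()
data_stream = [(None, 1.0), (None, 1.0), (None, 3.0), (None, 3.0)]
for (x, y) in data_stream:
    [prediction, costs] = learner.predict((x, y))
    accumulate_statistics(prediction, costs)
```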
James's idea for Learner Interface
===================================

Theory:
-------

Think about the unfolding of a learning algorithm as exploring a path in a vast
directed graph.

There are some source nodes, which are potential initial conditions for the
learning algorithm.

At any node, there are a number of outgoing labeled edges that represent
distinct directions of exploration, e.g. "allocate a model with N hidden units",
"set the L1 weight decay on such-and-such units to 0.1", "adapt for T
iterations", or "refresh the GPU dataset memory with the next batch of data".

Not all nodes have the same outgoing edge labels. The dataset, model, and
optimization algorithm implementations may each have their own
hyper-parameters, with various restrictions on what values they can take and
when they can be changed.

Every move in this graph explores it further and incurs some storage and
computational expense.

Learners typically engage in goal-directed exploration of this graph, for
example to find the node with the best validation-set performance given a
certain computational budget. Often we are interested mainly in the best node
found.

Methods such as predict(), log_probability() and free_energy() correspond to costs
that we can measure at any particular node (at some computational expense) to see
how we are doing in our exploration.

Many semantically distinct components come into the definition of this graph:
the model (e.g. DAA), the dataset (e.g. an online one), and the inference and
learning strategy. I'm not sure what to call this graph other than an 'experiment
graph'... so I'll go with that for now.

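The sketch below makes the 'experiment graph' idea concrete under several assumptions. Nodes are immutable snapshots (sorted tuples of hyper-parameter/state pairs). Edges are labeled moves returned by a hypothetical `outgoing_edges`. `validation_cost` is a made-up stand-in for a cost measured at a node. The goal-directed policy is best-first search limited by a budget of node expansions. None of these names come from pylearn.

```python
import heapq

# Hypothetical experiment graph. A node is an immutable snapshot of the
# experiment (a sorted tuple of (name, value) pairs); an edge is a labeled move.


def outgoing_edges(node):
    """Return (label, child) pairs; which labels exist depends on the node."""
    state = dict(node)
    if "n_hidden" not in state:
        children = [("allocate a model with %d hidden units" % n,
                     {"n_hidden": n, "iters": 0}) for n in (10, 50, 100)]
    elif state["iters"] < 30:
        children = [("adapt for 10 iterations",
                     dict(state, iters=state["iters"] + 10))]
    else:
        children = []  # nothing left to try from this node
    return [(label, tuple(sorted(s.items()))) for label, s in children]


def validation_cost(node):
    """Made-up stand-in for a cost measured at a node (lower is better)."""
    state = dict(node)
    if "n_hidden" not in state:
        return float("inf")
    return abs(state["n_hidden"] - 50) / 50.0 + 1.0 / (1 + state["iters"])


def explore(source, budget):
    """Best-first exploration: always expand the cheapest node seen so far,
    stop after `budget` expansions, and return (cost, node) of the best one."""
    frontier = [(validation_cost(source), source)]
    best = frontier[0]
    for _ in range(budget):
        if not frontier:
            break
        cost, node = heapq.heappop(frontier)
        best = min(best, (cost, node))
        for _label, child in outgoing_edges(node):
            heapq.heappush(frontier, (validation_cost(child), child))
    return best


cost, node = explore((), budget=5)
```

With a budget of 5 expansions, the search first generates the three "allocate" children of the source node. It then follows the most promising child along "adapt" edges and returns the node with 50 hidden units after 30 iterations.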

Use Cases
----------

Early stopping
~~~~~~~~~~~~~~

Early stopping can be implemented as a learner that progresses along a
particular kind of edge (e.g. "train more") until a stopping criterion (in terms
of a cost computed from nodes along the path) is met.

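A minimal sketch of such an early-stopping learner. It assumes the criterion is "no improvement in validation cost for `patience` consecutive steps", which is one common choice; the text above does not fix one. `train_more` and `validation_cost` are hypothetical callables that stand for the edge and the node cost.

```python
def early_stopping(state, train_more, validation_cost, max_steps=100, patience=3):
    """Follow the "train more" edge until the validation cost has failed to
    improve for `patience` consecutive steps, or `max_steps` steps were taken.

    Returns (best_state, best_cost, steps_taken).
    """
    best_state, best_cost = state, validation_cost(state)
    bad_steps = steps = 0
    while steps < max_steps and bad_steps < patience:
        state = train_more(state)       # one move along the "train more" edge
        steps += 1
        cost = validation_cost(state)   # cost measured at the new node
        if cost < best_cost:
            best_state, best_cost, bad_steps = state, cost, 0
        else:
            bad_steps += 1
    return best_state, best_cost, steps


# Toy usage: the "state" is just an epoch counter and the validation cost is
# U-shaped with its minimum at epoch 5 (made-up numbers).
result = early_stopping(0, lambda t: t + 1, lambda t: (t - 5) ** 2)
```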

Grid Search
~~~~~~~~~~~

Grid search is a learner policy that can be implemented in an experiment graph
where all paths have the form::

    ( "set param 0 to X",
      "set param 1 to Y",
      ... ,
      "set param N to Z",
      adapt,
      [early stop...],
      test)

It would explore all paths of this form and then return the best node.

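A sketch of the grid-search policy. It assumes a hypothetical `run_path(params)` callable that follows the "adapt / [early stop] / test" part of a path for the given hyper-parameter settings and returns the cost measured at the final node. `itertools.product` enumerates every combination of "set param" choices.

```python
import itertools


def grid_search(grid, run_path):
    """Explore every path of the form ("set param 0 to X", ..., adapt, test).

    grid: dict mapping hyper-parameter name -> list of candidate values.
    run_path: callable(params) -> cost of the node reached by that path.
    Returns (best_cost, best_params).
    """
    names = sorted(grid)
    best = None
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        cost = run_path(params)
        if best is None or cost < best[0]:
            best = (cost, params)
    return best


def toy_run_path(params):
    # Made-up stand-in for: set params, adapt (with early stopping), test.
    return (params["learning_rate"] - 0.1) ** 2 + (params["n_hidden"] - 50) ** 2 / 1e4


grid = {"learning_rate": [0.01, 0.1, 1.0], "n_hidden": [10, 50, 100]}
best_cost, best_params = grid_search(grid, toy_run_path)
```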

Stagewise learning of DBNs combined with early stopping and grid search
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This would be a learner that is effective for experiment graphs that reflect the
greedy stagewise optimization of DBNs.

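A sketch of how the three policies could compose for greedy stagewise training, with strong simplifications. Each layer's only hyper-parameter is its width. `make_layer_cost` is a made-up stand-in for training a layer (e.g. an RBM) on top of the frozen ones and measuring a validation cost after each epoch. Early stopping uses a patience of 2. All names are hypothetical.

```python
def early_stop(layer_cost, patience=2, max_epochs=50):
    """Train one layer until its validation cost stops improving.
    layer_cost(epoch) stands in for the cost after `epoch` epochs."""
    best_epoch, best_cost, bad, epoch = 0, layer_cost(0), 0, 0
    while epoch < max_epochs and bad < patience:
        epoch += 1
        cost = layer_cost(epoch)
        if cost < best_cost:
            best_epoch, best_cost, bad = epoch, cost, 0
        else:
            bad += 1
    return best_epoch, best_cost


def stagewise(n_layers, widths, make_layer_cost):
    """Greedy stagewise construction: for each new layer, grid-search its width,
    train each candidate with early stopping, keep the best, freeze it, repeat."""
    frozen = []  # (width, epochs) of each frozen layer, bottom to top
    for _ in range(n_layers):
        best = None
        for width in widths:
            epochs, cost = early_stop(make_layer_cost(tuple(frozen), width))
            if best is None or cost < best[0]:
                best = (cost, width, epochs)
        frozen.append((best[1], best[2]))
    return frozen


def make_layer_cost(frozen, width):
    # Made-up costs: each depth has a preferred width, and the validation cost
    # is U-shaped in the number of epochs with its minimum at epoch 4.
    preferred = [100, 50][len(frozen)]
    return lambda epoch: abs(width - preferred) / 100.0 + (epoch - 4) ** 2 / 100.0


layers = stagewise(2, [50, 100, 200], make_layer_cost)
```

The result lists, for each layer from the bottom up, the chosen width and the number of epochs selected by early stopping.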

Boosting
~~~~~~~~

Given an ExperimentGraph that permits re-weighting of examples, it is
straightforward to write a meta-ExperimentGraph around it that implements AdaBoost.
A meta-meta-ExperimentGraph around that one, which does early stopping, would
complete the picture and make a useful boosting implementation.

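A sketch of the AdaBoost layer, assuming a base learner that accepts per-example weights. Here the base learner is a hypothetical weighted decision stump on 1-D inputs (`train_stump`). `adaboost` is the wrapper that re-weights examples between rounds using the standard discrete AdaBoost update with labels in {-1, +1}. The outer early-stopping layer is left out.

```python
import math


def train_stump(xs, ys, weights):
    """Weighted base learner: the threshold stump with the lowest weighted error.
    Returns (threshold, polarity, weighted_error)."""
    best = None
    for thr in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if (polarity if x >= thr else -polarity) != y)
            if best is None or err < best[2]:
                best = (thr, polarity, err)
    return best


def stump_predict(stump, x):
    thr, polarity, _ = stump
    return polarity if x >= thr else -polarity


def adaboost(xs, ys, n_rounds):
    """Discrete AdaBoost (labels in {-1, +1}) wrapped around train_stump."""
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(n_rounds):
        stump = train_stump(xs, ys, weights)
        err = max(stump[2], 1e-12)       # guard against a perfect stump
        if err >= 0.5:                   # base learner no better than chance
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified examples, down-weight the others, renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def ensemble_predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]  # no single threshold separates these labels
ensemble = adaboost(xs, ys, n_rounds=3)
predictions = [ensemble_predict(ensemble, x) for x in xs]
```

No single stump can fit labels that go positive, then negative, then positive again. Three boosting rounds combine stumps that classify all six training points correctly.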

Using External Hyper-Parameter Optimization Software
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TODO (use case): show how we could use the optimizer from
http://www.cs.ubc.ca/labs/beta/Projects/ParamILS/

Implementation Details / API
----------------------------

Learner
~~~~~~~

An object that allows us to explore the graph discussed above. Specifically, it represents
an explored node in that graph.

1043
3f528656855b
v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
1041
diff
changeset
|
    def active_instructions():
        """ Return a list/set of Instruction instances (see below) that the Learner is prepared
        to handle.
        """

    def copy(), deepcopy()
        """ Learners should be serializable """

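A minimal sketch of the two Learner methods above, assuming instructions are kept in a plain set and that "serializable" is checked through the standard `copy` module (`copy.deepcopy` falls back on the same `__reduce_ex__` protocol that `pickle` uses). The class name `ToyLearner` and its `weights` attribute are hypothetical, for illustration only.

```python
import copy


class ToyLearner(object):
    """Hypothetical learner standing for one explored node of the graph."""

    def __init__(self):
        self.weights = [0.0, 0.0]   # model state (illustrative)
        self._instructions = set()  # edges available from this node

    def active_instructions(self):
        # Return a fresh set so callers cannot mutate the learner's bookkeeping.
        return set(self._instructions)

    def copy(self):
        # Shallow copy: mutable state such as `weights` is shared with the original.
        return copy.copy(self)

    def deepcopy(self):
        # Deep copy: an independent learner that can be explored separately.
        return copy.deepcopy(self)


learner = ToyLearner()
learner._instructions.add('train')
clone = learner.deepcopy()
clone.weights[0] = 1.0    # does not affect `learner`
shallow = learner.copy()
shallow.weights[1] = 5.0  # does affect `learner`
```

The distinction matters for graph exploration: only the deep copy gives a second node that can be moved along an edge without disturbing the first.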
To make the implementation easier, I found it helpful to introduce a string-valued
`fsa_state` member attribute and to associate methods with these states. That made it
syntactically easy to build relatively complex finite-state transition graphs describing
which instructions were active at which times in the life-cycle of a learner.

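One way the `fsa_state` idea could look, as a sketch: each method is tagged with the states in which it is active, and `active_instructions()` filters on the current state. The `active_in` decorator, the state names, and the `SgdLearner` class are hypothetical, and here an instruction is represented simply by its method name.

```python
def active_in(*states):
    """Hypothetical decorator: record the fsa_state values in which a method is active."""
    def decorate(method):
        method.fsa_states = frozenset(states)
        return method
    return decorate


class SgdLearner(object):
    def __init__(self):
        self.fsa_state = 'uninitialized'
        self.n_updates = 0

    def active_instructions(self):
        # Collect the names of methods whose tagged states include the current one.
        active = set()
        for name in dir(type(self)):
            states = getattr(getattr(type(self), name), 'fsa_states', None)
            if states is not None and self.fsa_state in states:
                active.add(name)
        return active

    @active_in('uninitialized')
    def initialize(self):
        self.fsa_state = 'training'

    @active_in('training')
    def sgd_step(self):
        self.n_updates += 1

    @active_in('training')
    def stop(self):
        self.fsa_state = 'done'
```

Each transition is just an assignment to `fsa_state`, so the whole transition graph can be read off the decorators.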
Instruction
~~~~~~~~~~~
An object that represents a potential edge in the graph discussed above. It is an
operation that a learner can perform.

    arg_types
        """a list of Type objects (see below) indicating what args are required by execute"""

    def execute(learner, args, kwargs):
        """ Perform some operation on the learner (follow an edge in the graph discussed above)
        and modify the learner in-place. Calling execute 'moves' the learner from one node in
        the graph along an edge. To keep the old learner as well, it must be copied prior to
        calling execute().
        """

    def expense(learner, args, kwargs, resource_type='CPUtime'):
        """ Return an estimated cost of performing this instruction (calling execute), in time,
        space, number of computers, disk requirements, etc.
        """

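A minimal sketch of an Instruction built around a plain function, assuming `execute` forwards `args`/`kwargs` to that function and `expense` returns a caller-supplied estimate only for the default 'CPUtime' resource. `FunctionInstruction`, the `cost_fn` parameter and the toy `Counter` learner are hypothetical names; for brevity `arg_types` holds built-in Python types standing in for the Type objects described below.

```python
class FunctionInstruction(object):
    """Hypothetical Instruction wrapping a function f(learner, *args, **kwargs)."""

    def __init__(self, fn, arg_types=(), cost_fn=None):
        self.fn = fn
        self.arg_types = tuple(arg_types)  # one entry per positional argument
        self.cost_fn = cost_fn

    def execute(self, learner, args, kwargs):
        # Follow the edge: modify the learner in place, as the API above requires.
        if len(args) != len(self.arg_types):
            raise TypeError('expected %d args, got %d' % (len(self.arg_types), len(args)))
        for value, t in zip(args, self.arg_types):
            if not isinstance(value, t):
                raise TypeError('%r is not a %s' % (value, t.__name__))
        return self.fn(learner, *args, **kwargs)

    def expense(self, learner, args, kwargs, resource_type='CPUtime'):
        # Only the default resource type is estimated in this sketch.
        if resource_type != 'CPUtime' or self.cost_fn is None:
            return None
        return self.cost_fn(learner, *args, **kwargs)


class Counter(object):
    def __init__(self):
        self.n_steps = 0


def take_steps(learner, n):
    learner.n_steps += n


step = FunctionInstruction(take_steps, arg_types=(int,),
                           cost_fn=lambda learner, n: 0.5 * n)
```

A client could call `step.expense(...)` before `step.execute(...)` to decide whether an edge is worth crossing.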
Type
~~~~
An object that describes a parameter domain for a call to Instruction.execute.
A Type need not specify exactly which arguments are legal, but it should
*include* all legal arguments and exclude as many illegal ones as possible.

    def includes(value):
        """return True if value is a legal argument"""

To make things a bit more practical, there are some Type subclasses (e.g. Int, Float, Str,
ImageDataset, SgdOptimizer) that include additional attributes (e.g. min, max, default) so
that automatic graph exploration algorithms can generate legal arguments with reasonable
efficiency.

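A sketch of how `includes` and the min/max/default attributes might fit together for Int and Float. The `sample` method, which draws uniformly between min and max, is an assumption added to show how an exploration algorithm could use these attributes to generate legal arguments; it is not part of the API above.

```python
import random


class Type(object):
    """Describes a parameter domain; includes() may accept a superset of legal values."""

    def includes(self, value):
        raise NotImplementedError


class Float(Type):
    def __init__(self, min, max, default):
        self.min, self.max, self.default = float(min), float(max), float(default)

    def includes(self, value):
        return isinstance(value, (int, float)) and self.min <= value <= self.max

    def sample(self, rng):
        # Hypothetical helper: uniform draw in [min, max].
        return rng.uniform(self.min, self.max)


class Int(Type):
    def __init__(self, min, max, default):
        self.min, self.max, self.default = int(min), int(max), int(default)

    def includes(self, value):
        return isinstance(value, int) and self.min <= value <= self.max

    def sample(self, rng):
        # randint is inclusive at both ends.
        return rng.randint(self.min, self.max)


log_lr = Float(min=-8, max=-1, default=-4)
n_hidden = Int(min=10, max=1000, default=100)
rng = random.Random(0)
draws = [log_lr.sample(rng) for _ in range(5)]
```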
The proxy pattern is a powerful way to combine learners, especially when proxy Learner
instances also introduce proxy Instruction classes.

For example, it is straightforward to implement a hyper-learner as a Learner that holds
another learner (the sub-learner) as a member attribute. The hyper-learner modifies
the instruction_set() return value of the sub-learner, typically to introduce
more powerful instructions and hide simpler ones.

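A minimal sketch of such a hyper-learner, assuming instructions are represented as plain callables in a dict keyed by name rather than as full Instruction objects. The `SubLearner` and `EpochProxy` classes and the `sgd_step` / `train_epochs` names are hypothetical; the point is only that the proxy hides one of the sub-learner's instructions and exposes a more powerful one built from it.

```python
class SubLearner(object):
    """Hypothetical sub-learner exposing one fine-grained instruction."""

    def __init__(self):
        self.n_updates = 0

    def instruction_set(self):
        return {'sgd_step': self.sgd_step}

    def sgd_step(self):
        self.n_updates += 1


class EpochProxy(object):
    """Hyper-learner: wraps a sub-learner and rewrites its instruction set."""

    def __init__(self, sub_learner, steps_per_epoch):
        self.sub_learner = sub_learner
        self.steps_per_epoch = steps_per_epoch

    def instruction_set(self):
        instructions = dict(self.sub_learner.instruction_set())
        step = instructions.pop('sgd_step')  # hide the simpler instruction

        def train_epochs(n_epochs):
            # More powerful instruction, implemented with the hidden one.
            for _ in range(n_epochs * self.steps_per_epoch):
                step()

        instructions['train_epochs'] = train_epochs
        return instructions


proxy = EpochProxy(SubLearner(), steps_per_epoch=100)
proxy.instruction_set()['train_epochs'](3)
```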
It is less straightforward, but still consistent with the design, to implement a Learner that
encompasses job management. Such a learner would retain the semantics of the
instruction_set of the sub-learner, but would replace the Instruction objects themselves with
Instructions that arrange for remote procedure calls (e.g. jobman, multiprocessing, bqtools,
etc.). Such a learner would replace synchronous instructions (return on completion) with
asynchronous ones (return after scheduling), and the active instruction set would also change
asynchronously, but neither of these things is inconsistent with the Learner API.

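To illustrate the synchronous-to-asynchronous replacement, here is a sketch that uses the standard library's `concurrent.futures` thread pool as a stand-in for jobman, multiprocessing or bqtools. `AsyncInstruction`, `Model` and `train` are hypothetical names. `execute` returns a `Future` right after scheduling instead of returning on completion; a real job-management learner would also need to copy or lock the learner while the job runs, which this sketch ignores.

```python
from concurrent.futures import ThreadPoolExecutor


class AsyncInstruction(object):
    """Hypothetical proxy Instruction: schedules the wrapped call instead of running it."""

    def __init__(self, fn, executor):
        self.fn = fn              # synchronous function f(learner, *args, **kwargs)
        self.executor = executor  # stands in for jobman / bqtools / etc.

    def execute(self, learner, args, kwargs):
        # Return immediately with a Future; the edge is crossed when the job finishes.
        return self.executor.submit(self.fn, learner, *args, **kwargs)


class Model(object):
    def __init__(self):
        self.n_epochs = 0


def train(learner, n_epochs):
    learner.n_epochs += n_epochs
    return learner.n_epochs


with ThreadPoolExecutor(max_workers=2) as pool:
    model = Model()
    future = AsyncInstruction(train, pool).execute(model, (5,), {})
    result = future.result()  # block only when the caller needs the outcome
```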
TODO
~~~~

I feel like something is missing from the API: an interface to the graph structure
discussed above. The nodes in this graph are natural places to store meta-information for
visualization, statistics-gathering, etc. But none of the APIs above corresponds to the graph
itself. In other words, there is no API through which to attach information to nodes. It is
not good to say that the Learner instance *is* the node because (a) learner instances change
during graph exploration and (b) learner instances are big, and we don't want to have to keep a
whole saved model just to attach meta-info, e.g. a validation score. The choice of this API spills
over into other committees, so we should get their feedback about how to resolve it.

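One possible shape for the missing node interface, offered only as a sketch: identify each node by the sequence of (instruction name, arguments) edges that leads to it from a freshly constructed learner, and keep meta-information in a lightweight record keyed by that path, so no learner has to be retained. `ExplorationGraph` and `Node` are hypothetical names, and identifying nodes by path assumes instructions are deterministic.

```python
class Node(object):
    """Lightweight graph node: holds meta-info, never the (large) learner itself."""

    def __init__(self, path):
        self.path = path  # tuple of (instruction_name, args) edges from the root
        self.info = {}    # e.g. {'valid_score': 0.12}


class ExplorationGraph(object):
    def __init__(self):
        self.nodes = {(): Node(())}  # the root is the freshly constructed learner

    def child(self, node, instruction_name, args=()):
        # Look up (or create) the node reached by crossing one more edge.
        path = node.path + ((instruction_name, tuple(args)),)
        if path not in self.nodes:
            self.nodes[path] = Node(path)
        return self.nodes[path]

    def children(self, node):
        depth = len(node.path)
        return [n for p, n in self.nodes.items()
                if len(p) == depth + 1 and p[:depth] == node.path]


graph = ExplorationGraph()
root = graph.nodes[()]
a = graph.child(root, 'set_log_lr', (-4,))
b = graph.child(a, 'train_epochs', (10,))
b.info['valid_score'] = 0.12
```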
Comments
~~~~~~~~

YB asks: it seems to me that what we really need from "Type" is not just
testing that a value is legal, but, more practically, a function that specifies the
prior distribution for the hyper-parameter, i.e., how to sample from it,
and possibly some representation of it that could be used to infer
a posterior (such as an unnormalized log-density or log-probability).
Having the min and max and default limits us to the uniform distribution,
which may not always be appropriate. For example, sometimes we'd like a
Gaussian (-infty to infty), an Exponential (0 to infty) or a Poisson (non-negative integers).
For that reason, I think that "Type" is not a very good name.
How about "Prior" or "Density" or something like that?

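A sketch of what YB's suggestion could look like, assuming a `Prior` exposes `sample(rng)` and an unnormalized `log_density(value)`; both method names are hypothetical. Gaussian and Exponential use only the standard library (`random.Random.gauss` and `random.Random.expovariate`); a Poisson prior would follow the same pattern with a log-probability mass function.

```python
import math
import random


class Prior(object):
    """Hypothetical replacement for Type: a distribution over legal values."""

    def sample(self, rng):
        raise NotImplementedError

    def log_density(self, value):
        """Unnormalized log-density; -inf marks an illegal value."""
        raise NotImplementedError

    def includes(self, value):
        # The legality test of the original Type API falls out for free.
        return self.log_density(value) > -math.inf


class Gaussian(Prior):
    def __init__(self, mu, sigma):
        self.mu, self.sigma = float(mu), float(sigma)

    def sample(self, rng):
        return rng.gauss(self.mu, self.sigma)

    def log_density(self, value):
        return -0.5 * ((value - self.mu) / self.sigma) ** 2


class Exponential(Prior):
    def __init__(self, rate):
        self.rate = float(rate)

    def sample(self, rng):
        return rng.expovariate(self.rate)

    def log_density(self, value):
        return -self.rate * value if value >= 0 else -math.inf


log_lr = Gaussian(mu=-4, sigma=1.5)
weight_decay = Exponential(rate=100.0)
rng = random.Random(1)
```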
OD asks: (I hope it's OK to leave comments even though I'm not on the committee... I'm
interested to see how the learner interface is shaping up, so I'll be keeping
an eye on this file)
I'm wondering what the benefit of such an API is compared to simply defining a
new method for each instruction. It seems to me that typically, the 'execute'
method would end up being something like

    if instruction == 'do_x':
        self.do_x(...)
    elif instruction == 'do_y':
        self.do_y(...)
    ...

so why not directly call do_x / do_y instead?

JB replies: I agree with you, and in the implementation of a Learner I suggest
using Python decorators to get the best of both worlds:

    import numpy

    class NNet(Learner):

        ...

        @Instruction.new(arg_types=(Float(min=-8, max=-1, default=-4),))
        def set_log_lr(self, log_lr):
            self.lr.value = numpy.exp(log_lr)

        ...

The Learner base class can implement an instruction_set() that walks through the
methods of 'self' and picks out the ones that have corresponding instructions.
But anyone can call the method normally. The NNet class can also have methods
that are not instructions.

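A self-contained sketch of how `Instruction.new` and the method-walking `instruction_set()` could be implemented: the decorator stores an Instruction on the function object, and the base class collects every bound method that carries one. The attribute name `_instruction`, the stripped-down `Float`, and the use of `math.exp` on a plain attribute instead of `numpy.exp` on a shared variable's `.value` are assumptions made to keep the example short.

```python
import math


class Float(object):
    def __init__(self, min, max, default):
        self.min, self.max, self.default = min, max, default


class Instruction(object):
    def __init__(self, fn, arg_types):
        self.fn = fn
        self.arg_types = arg_types

    def execute(self, learner, args, kwargs):
        return self.fn(learner, *args, **kwargs)

    @classmethod
    def new(cls, arg_types=()):
        # Decorator factory: tag the method but return it unchanged,
        # so it can still be called normally.
        def decorate(fn):
            fn._instruction = cls(fn, arg_types)
            return fn
        return decorate


class Learner(object):
    def instruction_set(self):
        # Walk the attributes of self and keep methods tagged by Instruction.new.
        found = {}
        for name in dir(self):
            instruction = getattr(getattr(self, name), '_instruction', None)
            if instruction is not None:
                found[name] = instruction
        return found


class NNet(Learner):
    def __init__(self):
        self.lr = 0.01

    @Instruction.new(arg_types=(Float(min=-8, max=-1, default=-4),))
    def set_log_lr(self, log_lr):
        self.lr = math.exp(log_lr)

    def n_params(self):
        # An ordinary method: not an instruction.
        return 0
```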
OD replies: OK, thanks. I'm still unsure what the end goal is, but I'll keep
watching while you guys work on it, and hopefully it'll become clearer for me ;)

RP asks: James, correct me if I'm wrong, but I think each instruction has an execute
command. The job of the learner is to traverse the graph and, for each edge
that it decides to cross, to call the execute of that edge. Maybe James has
something else in mind, but this was my understanding.

JB replies: close, but let me clarify a bit. The job of a
Learner is simply to implement the API of a Learner - to list what edges are
available and to be able to cross them if asked. The code *using* the Learner
(the client) decides which edges to cross. The client may also be a Learner, but
maybe not.

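To make this division of labour concrete, a sketch of one possible client: a random walk that repeatedly asks the learner which edges are available and crosses one of them. The `Toy` learner, its method names, and the random policy are hypothetical; a hyper-parameter optimizer would replace the random choice with something smarter, but the learner side would not change.

```python
import random


class Toy(object):
    """Hypothetical learner: it only lists edges and crosses them when asked."""

    def __init__(self):
        self.fsa_state = 'start'
        self.log = []

    def active_instructions(self):
        # Edges available from the current node, by method name.
        return {'start': ['init'],
                'training': ['train', 'stop'],
                'done': []}[self.fsa_state]

    def init(self):
        self.log.append('init')
        self.fsa_state = 'training'

    def train(self):
        self.log.append('train')

    def stop(self):
        self.log.append('stop')
        self.fsa_state = 'done'


def random_walk_client(learner, rng, max_steps=100):
    # The client, not the learner, decides which edge to cross next.
    for _ in range(max_steps):
        options = learner.active_instructions()
        if not options:
            break
        getattr(learner, rng.choice(options))()
    return learner


toy = random_walk_client(Toy(), random.Random(0))
```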
Just another view/spin on the same idea (Razvan)
================================================

My idea is probably just a spin-off of what James wrote. It is an extension
of what I sent to the mailing list some time ago.

Big Picture
-----------

What do we care about?
~~~~~~~~~~~~~~~~~~~~~~~

This is the list of the main points that I have in mind:

* Re-usability
* Extensibility
* Simplicity or easily readable code (connected to re-usability)
* Modularity (connected to extensibility)
* Fast to write code (sort of comes out of simplicity)
* Efficient code

Composition
~~~~~~~~~~~

359 To me this reads as code generated by composing pieces. Imagine this : |
3b1fd599bafd
my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1043
diff
changeset
|
360 you start of with something primitive that I will call a "variable", which |
3b1fd599bafd
my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1043
diff
changeset
|
361 probably is a very unsuitable name. And then you compose those intial |
3b1fd599bafd
my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1043
diff
changeset
|
362 "variables" or transform them through several "functions". Each such |
3b1fd599bafd
my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1043
diff
changeset
|
363 "function" hides some logic, that you as the user don't care about. |
3b1fd599bafd
my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1043
diff
changeset
|
364 You can have low-level or micro "functions" and high-level or macro |
3b1fd599bafd
my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1043
diff
changeset
|
365 "functions", where a high-level function is just a certain compositional |
3b1fd599bafd
my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1043
diff
changeset
|
366 pattern of low-level "functions". There are several classes of "functions" |
3b1fd599bafd
my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1043
diff
changeset
|
367 and "variables" that can be inter-changable. This is how modularity is |
3b1fd599bafd
my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1043
diff
changeset
|
368 obtained, by chainging between functions from a certain class. |
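A minimal sketch of the composition idea, assuming a "variable" is just a Python list of floats and a "function" is any pure callable from one variable to another. `compose`, `center`, `scale`, `squash`, `preprocess` and `encode` are hypothetical names chosen for illustration, not an existing pylearn API.

```python
import math
from functools import reduce
from typing import Callable, List

# Hypothetical types: a "variable" is a list of floats, a "function" maps one to another.
Vector = List[float]
Transform = Callable[[Vector], Vector]


def compose(*steps: Transform) -> Transform:
    """Build a high-level "function" as a fixed chain of low-level ones."""
    def composed(x: Vector) -> Vector:
        return reduce(lambda acc, step: step(acc), steps, x)
    return composed


# Low-level (micro) "functions": each hides a small piece of logic.
def center(x: Vector) -> Vector:
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def scale(x: Vector) -> Vector:
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x]


def squash(x: Vector) -> Vector:
    return [math.tanh(v) for v in x]


# High-level (macro) "functions": nothing more than compositional patterns.
preprocess = compose(center, scale)
encode = compose(preprocess, squash)
```

Swapping `center` for another member of the same class only requires building a new `compose(...)` chain; no other piece has to change.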

Now when you want to research something, you first select the step you want
to look into. If you are lucky, you can rewrite this step as a certain
decomposition of low-level transformations (there can be multiple such
decompositions). If not, you have to implement such a decomposition according
to your needs. Pick the low-level transformations you want to change and
write new versions that implement your logic.
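The research workflow above could look roughly like this, assuming a decomposition is stored as an ordered list of named low-level steps. `run`, `replace_step` and the three step functions are hypothetical; the point is only that one step is replaced while the rest of the chain is reused untouched.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Step = Tuple[str, Callable[[Vector], Vector]]


def run(pipeline: List[Step], x: Vector) -> Vector:
    """Apply each named low-level transformation, one after the other."""
    for _name, fn in pipeline:
        x = fn(x)
    return x


def replace_step(pipeline: List[Step], name: str,
                 new_fn: Callable[[Vector], Vector]) -> List[Step]:
    """Return a new pipeline with exactly one step swapped; the rest is untouched."""
    if name not in [n for n, _ in pipeline]:
        raise KeyError(name)
    return [(n, new_fn if n == name else fn) for n, fn in pipeline]


def shift_to_zero(x: Vector) -> Vector:
    lo = min(x)
    return [v - lo for v in x]


def divide_by_max(x: Vector) -> Vector:
    hi = max(x)
    return [v / hi for v in x] if hi else list(x)


def divide_by_sum(x: Vector) -> Vector:
    # Hypothetical "new version" a researcher might write for the rescale step.
    total = sum(x)
    return [v / total for v in x] if total else list(x)


# Hypothetical decomposition of a "normalize" step into two low-level pieces.
baseline: List[Step] = [("shift", shift_to_zero), ("rescale", divide_by_max)]
variant = replace_step(baseline, "rescale", divide_by_sum)
```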

I think the code will be easy to read, because it just applies a fixed set of
transformations, one after the other. Whoever writes the code can decide how
explicit to be by switching between high-level and low-level functions.

I think code written this way is reusable, because you can take this chain
of transformations and replace the one you care about without looking into
the rest.

You get a fractal property of the code: zooming in, you always find just a
set of functions applied to a set of variables. In the beginning those
decompositions might not be there, and you would have to create new
"low-level" decompositions, maybe even new "variables" that carry data
between those decompositions.
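The "fractal" view can be made concrete with a composite "function" that remembers its parts, so one can zoom in a chosen number of levels. This is a sketch under the assumption that every function maps a single float to a float; `Fn` and `expand` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Fn:
    """A "function": either a primitive (impl) or a composition of parts."""
    name: str
    impl: Optional[Callable[[float], float]] = None  # set for primitives only
    parts: List["Fn"] = field(default_factory=list)  # set for composites only

    def __call__(self, x: float) -> float:
        if self.impl is not None:
            return self.impl(x)
        for part in self.parts:
            x = part(x)
        return x

    def expand(self, depth: int) -> List[str]:
        """Names of the functions seen after zooming in `depth` levels."""
        if depth == 0 or not self.parts:
            return [self.name]
        return [n for p in self.parts for n in p.expand(depth - 1)]


# Hypothetical example: a two-level decomposition.
double = Fn("double", lambda x: 2 * x)
inc = Fn("inc", lambda x: x + 1)
square = Fn("square", lambda x: x * x)
affine = Fn("affine", parts=[double, inc])
model = Fn("model", parts=[affine, square])
```

At depth 0 `model` is a single opaque step; at depth 2 it is just `double`, `inc` and `square` applied in sequence.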

The point of the variables here is that I don't want these "functions" to
have state. All the information is passed along through the variables. This
way understanding the graph is easy, and debugging it is also easier than
with all these hidden states.
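One way to read "no hidden state" is that every update returns a new variable instead of mutating the function that produced it. The sketch below assumes a toy objective (w - 3)^2 and a hypothetical `TrainState` tuple; it is not a proposed pylearn interface.

```python
from typing import Callable, List, NamedTuple


class TrainState(NamedTuple):
    """Every piece of information travels in this explicit "variable"."""
    w: float
    step: int


def sgd_step(state: TrainState, grad_fn: Callable[[float], float], lr: float) -> TrainState:
    """A stateless "function": returns a new state instead of mutating one."""
    return TrainState(w=state.w - lr * grad_fn(state.w), step=state.step + 1)


def grad(w: float) -> float:
    # Gradient of the hypothetical toy objective (w - 3)^2.
    return 2.0 * (w - 3.0)


# Because nothing is hidden, every intermediate state can simply be kept.
history: List[TrainState] = [TrainState(w=0.0, step=0)]
for _ in range(20):
    history.append(sgd_step(history[-1], grad, lr=0.25))
```

Keeping `history` costs nothing extra here, and any step can be inspected or replayed in isolation, which is the debugging benefit described above.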

Note that while doing so we might (and I strongly think we should) create a
(symbolic) DAG of operations (this is where it becomes what James was
saying). In such a DAG the "variables" will be the nodes and the functions
will be the edges. I think having a DAG is useful in many ways (all of these
are things one might think about implementing in the far future; I'm not
proposing to implement them unless we want to use them, like the
reconstruction):

* there is the possibility of writing optimizations (Theano style)
* there is the possibility of adding global-view utility functions (like a
  reconstruction function for SdA, extremely low level here), or global-view
  diagnostic tools
* there is the possibility of creating a GUI (where you just create the graph
  by picking transforms and variables from a list), or of working
  interactively and then generating code that will reproduce the graph
* you can view the graph at different granularity levels to understand
  things (global diagnostics)
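A minimal symbolic DAG, assuming variables are the nodes and each function application is recorded as an edge. Because an application can take several input variables, each edge is strictly a hyperedge, loosely in the spirit of Theano's Variable/Apply graph. `Var`, `Apply`, `apply`, `toposort` and `evaluate` are hypothetical names written for this sketch, not Theano's classes; `toposort` stands in for the kind of global-view diagnostic mentioned in the list.

```python
import math
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can key dicts and sets
class Var:
    """A node of the symbolic DAG: a "variable"."""
    name: str
    owner: Optional["Apply"] = None  # the application that produced it (None for inputs)


@dataclass(eq=False)
class Apply:
    """An edge of the DAG: one "function" applied to some input variables."""
    fn_name: str
    fn: Callable[..., float]
    inputs: Tuple[Var, ...]


def apply(fn_name: str, fn: Callable[..., float], *inputs: Var) -> Var:
    """Record the application symbolically; nothing is computed yet."""
    return Var(name=fn_name + "_out", owner=Apply(fn_name, fn, inputs))


def toposort(output: Var) -> List[Apply]:
    """Global-view utility: the applications needed for `output`, inputs first."""
    order: List[Apply] = []
    seen = set()

    def visit(v: Var) -> None:
        node = v.owner
        if node is None or node in seen:
            return
        seen.add(node)
        for inp in node.inputs:
            visit(inp)
        order.append(node)

    visit(output)
    return order


def evaluate(v: Var, givens: Dict[str, float],
             cache: Optional[Dict[Var, float]] = None) -> float:
    """Compute `v` from named input values; shared sub-graphs are computed once."""
    cache = {} if cache is None else cache
    if v.owner is None:
        return givens[v.name]
    if v not in cache:
        args = [evaluate(i, givens, cache) for i in v.owner.inputs]
        cache[v] = v.owner.fn(*args)
    return cache[v]


# A small diamond-shaped graph: h feeds two branches that are merged again.
x, W = Var("x"), Var("W")
h = apply("mul", operator.mul, W, x)
out = apply("add", operator.add, apply("tanh", math.tanh, h), apply("neg", operator.neg, h))
```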

We should have a taxonomy of possible classes of functions and possible
classes of variables, but those should not be exclusive. We can work at a
high level for now, and decompose those high-level functions into lower-level
ones when we need to. We can introduce new classes of functions, or
intermediate variables between those low-level functions.
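One entry of such a taxonomy could be expressed as a `typing.Protocol`: any implementation with the same call signature belongs to the class and can be plugged in. The `Corruption` class and its two members (masking noise and Gaussian noise, as one might use for denoising autoencoders) are hypothetical examples, not an agreed-upon list.

```python
import random
from typing import List, Protocol

Vector = List[float]


class Corruption(Protocol):
    """Hypothetical class of "functions": corrupt an input vector."""

    def __call__(self, x: Vector, rng: random.Random) -> Vector: ...


class MaskingNoise:
    def __init__(self, p: float):
        self.p = p  # probability of zeroing each entry

    def __call__(self, x: Vector, rng: random.Random) -> Vector:
        return [0.0 if rng.random() < self.p else v for v in x]


class GaussianNoise:
    def __init__(self, sigma: float):
        self.sigma = sigma  # standard deviation of the additive noise

    def __call__(self, x: Vector, rng: random.Random) -> Vector:
        return [v + rng.gauss(0.0, self.sigma) for v in x]


def denoising_input(x: Vector, corrupt: Corruption, seed: int = 0) -> Vector:
    """Any member of the Corruption class can be used here interchangeably."""
    return corrupt(x, random.Random(seed))
```

The random generator is passed in explicitly, so the members themselves keep no hidden state beyond their hyper-parameters.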


Similarities with James' idea
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As I said before, I think this is just another view on what James proposed.
The learner in his case is the module that traverses the graph of these
operations, which makes sense here as well.

The 'execute' command in his API corresponds, in my case, to applying a
function to some variables.

I think that in both cases the learner keeps track of the graph that is
formed.

His view is a bit more general. I see the graph as fully created by the user,
and the learner just has to go from the start to the end. In his case the
traversal is conditioned on some policies. I think these ideas can be mixed
or united. To get this functionality in my case, I would want something
similar to the lazy linker for Theano.
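Policy-conditioned traversal can be approximated with lazily evaluated nodes, loosely in the spirit of Theano's lazy `ifelse` evaluation (this is not Theano code). `Lazy`, the local `ifelse` helper and the node names are hypothetical; the `trace` list only records which nodes actually ran.

```python
from typing import Callable, List, Optional


class Lazy:
    """A graph node whose value is computed only when it is first requested."""

    def __init__(self, name: str, compute: Callable[[], float], trace: List[str]):
        self.name = name
        self._compute = compute
        self._trace = trace
        self._value: Optional[float] = None
        self._done = False

    def value(self) -> float:
        if not self._done:
            self._trace.append(self.name)  # record which nodes actually ran
            self._value = self._compute()
            self._done = True
        return self._value


def ifelse(name: str, cond: Lazy, then: Lazy, other: Lazy, trace: List[str]) -> Lazy:
    """Only the branch selected by `cond` is ever evaluated."""
    return Lazy(name, lambda: then.value() if cond.value() else other.value(), trace)


# Hypothetical policy-driven step: keep training while validation error is high.
trace: List[str] = []
valid_err = Lazy("valid_err", lambda: 0.2, trace)
keep_going = Lazy("keep_going", lambda: float(valid_err.value() > 0.1), trace)
train = Lazy("train_epoch", lambda: 1.0, trace)
stop = Lazy("save_and_stop", lambda: 0.0, trace)
next_step = ifelse("next_step", keep_going, train, stop, trace)
```

Here the graph is still fully built by the user, but which part of it runs is decided at evaluation time by the `keep_going` node, which plays the role of a policy.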


JB asks: There is definitely a strong theme of graphs in both suggestions,
furthermore graphs that have heavy-duty nodes and light-weight edges. But I
don't necessarily think that we're proposing the same thing. One difference is
that the graph I talked about would be infinite in most cases of interest, so
it's not going to be representable by Theano's data structures (even with lazy
if). Another difference is that the graph I had in mind doesn't feel fractal:
it would be very common for a graph edge to be atomic. A proxy pattern, such
as in a hyper-learner, would create a notion of being able to zoom in, but
other than that, I'm not sure what you mean.

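JB's point about an infinite graph can be illustrated by defining the graph implicitly: a fixed set of atomic edges leaves every state, and a policy chooses which edge to follow, so only the path actually taken is ever materialized. `State`, `EDGES`, `traverse` and `policy` are hypothetical, and the error model (each epoch multiplies the error by 1 - lr) is a toy assumption.

```python
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple


class State(NamedTuple):
    epoch: int
    lr: float
    error: float


# Hypothetical atomic edges leaving any state; the graph they induce is unbounded.
EDGES: Dict[str, Callable[[State], State]] = {
    "train_epoch": lambda s: State(s.epoch + 1, s.lr, s.error * (1.0 - s.lr)),
    "halve_lr": lambda s: State(s.epoch, s.lr / 2.0, s.error),
}


def traverse(start: State, policy: Callable[[State], Optional[str]],
             max_steps: int = 1000) -> List[Tuple[str, State]]:
    """Materialize only the path the policy takes through the implicit graph."""
    path: List[Tuple[str, State]] = []
    s = start
    for _ in range(max_steps):
        name = policy(s)
        if name is None:  # the policy decides to stop here
            break
        s = EDGES[name](s)
        path.append((name, s))
    return path


def policy(s: State) -> Optional[str]:
    # Hypothetical policy: stop when good enough, halve lr on even epochs.
    if s.error < 0.2:
        return None
    if s.epoch > 0 and s.epoch % 2 == 0 and s.lr > 0.25:
        return "halve_lr"
    return "train_epoch"


path = traverse(State(epoch=0, lr=0.5, error=1.0), policy)
```

Each edge here is atomic (a single named action), and the graph as a whole is never stored, which is the contrast JB draws with a finite, fully built DAG.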