annotate doc/v2_planning/learner.txt @ 1057:baf1988db557

v2planning optimization - added API
author James Bergstra <bergstrj@iro.umontreal.ca>
date Thu, 09 Sep 2010 11:32:42 -0400
parents bc3f7834db83
children 19033ef1636d e342de3ae485
1041
38cc6e075d9b PV added to learner committee
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 1038
diff changeset
Committee: AB, PL, GM, IG, RP, NB, PV
Leader: ?
1002
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset

Discussion of Function Specification for Learner Types
======================================================

In its most abstract form, a learner is an object with the
following semantics:

* A learner has named hyper-parameters that control how it learns (these can be viewed
  as options of the constructor, or might be set directly by a user).

* A learner also has an internal state that depends on what it has learned.

* A learner reads and produces data, so the definition of learner is
  intimately linked to the definition of dataset (and task).

* A learner has one or more 'train' or 'adapt' functions by which
  it is given a sample of data (typically either the whole training set, or
  a mini-batch, of which a single 'example' is a special case). Learners
  interface with datasets in order to obtain data. These functions cause the
  learner to change its internal state and take advantage, to some extent,
  of the data provided. The 'train' function should take charge of
  completely exploiting the dataset, as specified by the hyper-parameters,
  so that it would typically be called only once. An 'adapt' function
  is meant for learners that can operate in an 'online' setting where
  data continually arrive and the control loop (when to stop) is to
  be managed outside of it. For most intents and purposes, the
  'train' function could also handle the 'online' case by providing
  the controlled iterations over the dataset (which would then be
  seen as a stream of examples).

    * learner.train(dataset)
    * learner.adapt(data)

* Different types of learners can then exploit their internal state
  in order to perform various computations after training is completed,
  or in the middle of training, e.g.,

    * y=learner.predict(x)
      for learners that see (x,y) pairs during training and predict y given x,
      or for learners that see only x's and learn a transformation of them (i.e. feature extraction).
      Here and below, x and y are tensor-like objects whose first index iterates
      over particular examples in a batch or minibatch of examples.

    * p=learner.probability(examples)
      p=learner.log_probability(examples)
      for learners that can estimate probability density or probability functions.
      Note that each example could be a pair (x,y) for learners that expect each example
      to represent such a pair. The second form is provided in case the example
      is high-dimensional and computations in the log-domain are numerically preferable.
      The first dimension of examples or of x and y is an index over a minibatch or a dataset.

    * p=learner.free_energy(x)
      for learners that can estimate a log unnormalized probability (by the usual convention,
      the free energy is its negative); the output has the same length as the input.

    * c=learner.costs(examples)
      returns a matrix of costs (one row per example, i.e., again the output has the same length
      as the input), the first column of which represents the cost whose expectation
      we wish to minimize over new samples from the unknown underlying data distribution.


  Some learners may be able to handle x's and y's that contain missing values.

* For convenience, some of these operations could be bundled, e.g.

    * [prediction,costs] = learner.predict_and_adapt((x,y))

* Some learners could include in their internal state not only what they
  have learned but also some information about recently seen examples that conditions
  the expected distribution of upcoming examples. In that case, they might
  be used, e.g. in an online setting, as follows:

      for (x,y) in data_stream:
          [prediction,costs]=learner.predict((x,y))
          accumulate_statistics(prediction,costs)

* In some cases, each example is itself a (possibly variable-size) sequence
  or other variable-size object (e.g. an image, or a video).
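
The semantics listed above can be illustrated with a toy learner. This is only a minimal sketch under stated assumptions: the class name `MeanLearner`, the hyper-parameter `n_epochs`, and the use of plain Python lists in place of tensor-like objects are all hypothetical choices made for illustration, not part of the proposal.

```python
class MeanLearner:
    """Toy learner (hypothetical): predicts the mean of the targets seen so far."""

    def __init__(self, n_epochs=1):
        self.n_epochs = n_epochs   # named hyper-parameter
        self.count = 0             # internal state that depends on what was learned
        self.total = 0.0

    def adapt(self, data):
        """Update the internal state from one minibatch of (x, y) pairs."""
        for _x, y in data:
            self.count += 1
            self.total += y

    def train(self, dataset):
        """Exploit the whole dataset, as dictated by the hyper-parameters."""
        for _ in range(self.n_epochs):
            self.adapt(dataset)

    def predict(self, x):
        """Return one prediction per example; the first index runs over examples."""
        mean = self.total / self.count if self.count else 0.0
        return [mean for _ in x]

    def costs(self, examples):
        """Return one row of costs per example; column 0 is the squared error."""
        preds = self.predict([x for x, _y in examples])
        return [[(p - y) ** 2] for p, (_x, y) in zip(preds, examples)]


learner = MeanLearner()
learner.train([(0, 1.0), (1, 3.0)])
print(learner.predict([5, 6, 7]))   # one prediction per input example
print(learner.costs([(2, 4.0)]))    # one row of costs per example
```

Here 'train' is called once on the whole dataset, while 'adapt' could equally be called from an external loop over a stream of minibatches.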
1026
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset

James's idea for Learner Interface
===================================

Theory:
-------

Think about the unfolding of a learning algorithm as exploring a path in a vast
directed graph.

There are some source nodes, which are potential initial conditions for the
learning algorithm.

At any node, there are a number of outgoing labeled edges that represent
distinct directions of exploration, such as "allocate a model with N hidden units",
"set the l1 weight decay on such-and-such units to 0.1", "adapt for T
iterations", or "refresh the GPU dataset memory with the next batch of data".

Not all nodes have the same outgoing edge labels. The dataset, model, and
optimization algorithm implementations may each have their own
hyper-parameters, with various restrictions on what values they can take and
when they can be changed.

Every move along an edge incurs some storage and computational expense, and
extends the explored part of the graph.

Learners typically engage in goal-directed exploration of this graph - for
example to find the node with the best validation-set performance given a
certain computational budget. We are often interested in the best node
found.

The predict(), log_probability(), free_energy(), etc. functions correspond to costs that we
can measure at any particular node (at some computational expense) to see how we
are doing in our exploration.

Many semantically distinct components come into the definition of this graph:
the model (e.g. DAA), the dataset (e.g. an online one), and the inference and
learning strategy. I'm not sure what to call this graph other than an 'experiment
graph'... so I'll go with that for now.
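
To make the graph picture concrete, here is a rough sketch of a tiny experiment graph. Everything in it is an assumption made for illustration: the `Node` fields, the particular edge labels, and the rule that allocation edges disappear once a model exists are hypothetical, chosen only to show that different nodes can offer different outgoing edges.

```python
from collections import namedtuple

# A node is a snapshot of the experiment; both fields are hypothetical.
Node = namedtuple("Node", ["n_hidden", "iterations"])

SOURCE = Node(n_hidden=None, iterations=0)   # a source node: nothing allocated yet


def outgoing_edges(node):
    """Map each edge label available at `node` to the node it leads to."""
    if node.n_hidden is None:
        # Before a model exists, the only available moves are allocations.
        return {
            "allocate a model with %d hidden units" % n: node._replace(n_hidden=n)
            for n in (10, 100)
        }
    # Once a model exists it can be adapted further, but not re-allocated.
    return {"adapt for 5 iterations": node._replace(iterations=node.iterations + 5)}


def follow(node, labels):
    """Walk a path through the graph by following edge labels in order."""
    for label in labels:
        node = outgoing_edges(node)[label]
    return node


end = follow(SOURCE, ["allocate a model with 10 hidden units",
                      "adapt for 5 iterations",
                      "adapt for 5 iterations"])
print(end)
```

A real implementation would attach costs (storage, computation) to each move; the sketch leaves that out to keep the node/edge structure visible.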


Use Cases
----------

Early stopping
~~~~~~~~~~~~~~

Early stopping can be implemented as a learner that progresses along a
particular kind of edge (e.g. "train more") until a stopping criterion (in terms
of a cost computed from nodes along the path) is met.
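
As a rough sketch of this idea, the function below follows a single kind of edge until a patience-based criterion is met. The names `early_stop`, `train_more`, `validation_cost` and `patience` are hypothetical, and the patience rule is just one possible stopping criterion; the toy graph at the bottom uses an integer step counter as the node and a made-up cost curve.

```python
def early_stop(node, train_more, validation_cost, patience=2):
    """Follow "train more" edges until the cost stops improving.

    Returns the best node seen along the path, together with its cost.
    """
    best_node, best_cost = node, validation_cost(node)
    bad_steps = 0
    while bad_steps < patience:
        node = train_more(node)            # move along one "train more" edge
        cost = validation_cost(node)       # measure a cost at the new node
        if cost < best_cost:
            best_node, best_cost, bad_steps = node, cost, 0
        else:
            bad_steps += 1
    return best_node, best_cost


# Toy graph: a node is the number of training steps taken so far, and the
# validation cost is a made-up curve with its minimum at step 3.
best, cost = early_stop(0, lambda n: n + 1, lambda n: (n - 3) ** 2)
print(best, cost)
```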


Grid Search
~~~~~~~~~~~

Grid search is a learner policy that can be implemented in an experiment graph
where all paths have the form:

    ( "set param 0 to X",
      "set param 1 to Y",
      ... ,
      "set param N to Z",
      adapt,
      [early stop...],
      test)

It would explore all paths of this form and then return the best node.
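
A minimal sketch of such a policy, assuming the "set param" edges can be enumerated up front: `grid_search`, `run_path` and the toy cost surface are hypothetical names, and `run_path` merely stands in for the (adapt, [early stop...], test) tail of each path.

```python
import itertools


def grid_search(grid, run_path):
    """Explore every path of "set param" edges and return the best result.

    `grid` maps parameter names to candidate values; `run_path` takes one
    parameter assignment and returns the test cost reached at the end.
    """
    names = sorted(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))   # one path: set param 0, 1, ..., N
        cost = run_path(params)
        if best is None or cost < best[0]:
            best = (cost, params)
    return best


# Made-up cost surface standing in for training and testing a model.
def toy_run(params):
    return (params["lr"] - 0.1) ** 2 + abs(params["n_hidden"] - 100)


best_cost, best_params = grid_search(
    {"lr": [0.01, 0.1, 1.0], "n_hidden": [10, 100]}, toy_run)
print(best_cost, best_params)
```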


Stagewise learning of DBNs combined with early stopping and grid search
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This would be a learner that is effective for experiment graphs that reflect the
greedy-stagewise optimization of DBNs.


Boosting
~~~~~~~~

Given an ExperimentGraph that permits re-weighting of examples, it is
straightforward to write a meta-ExperimentGraph around it that implements AdaBoost.
A meta-meta-ExperimentGraph around that one, which does early stopping, would complete
the picture and make a useful boosting implementation.

1043
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
Using External Hyper-Parameter Optimization Software
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TODO: use-case - show how we could use the optimizer from
http://www.cs.ubc.ca/labs/beta/Projects/ParamILS/


Implementation Details / API
----------------------------

Learner
~~~~~~~
An object that allows us to explore the graph discussed above. Specifically, it represents
an explored node in that graph.

    def active_instructions(self):
        """ Return a list/set of Instruction instances (see below) that the Learner is prepared
        to handle.
        """

    def copy(self):
        """ Return a shallow copy. Learners should be serializable. """

    def deepcopy(self):
        """ Return a deep copy. Learners should be serializable. """


1043
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
199 To make the implementation easier, I found it was helpful to introduce a string-valued
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
200 `fsa_state` member attribute and associate methods with these states. That made it
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
201 syntactically easy to build relatively complex finite-state transition graphs to describe
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
202 which instructions were active at which times in the life-cycle of a learner.
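A minimal sketch of how a string-valued `fsa_state` could drive active_instructions(). The
class name, the state names and the `_active_by_state` table are hypothetical, and plain bound
methods stand in for Instruction instances; this is one possible reading of the idea, not the
implementation referred to above.

```python
import copy


class ToyLearner:
    """Hypothetical learner whose active instructions depend on `fsa_state`."""

    # Hypothetical table: state name -> names of the methods that are legal there.
    _active_by_state = {
        "new": ("init_params",),
        "ready": ("train_step", "finish"),
        "done": (),
    }

    def __init__(self):
        self.fsa_state = "new"
        self.n_steps = 0

    def active_instructions(self):
        """Return the bound methods that may be called in the current state."""
        names = self._active_by_state[self.fsa_state]
        return [getattr(self, name) for name in names]

    def init_params(self):
        self.fsa_state = "ready"

    def train_step(self):
        self.n_steps += 1

    def finish(self):
        self.fsa_state = "done"


learner = ToyLearner()
names_new = [m.__name__ for m in learner.active_instructions()]
learner.init_params()
snapshot = copy.deepcopy(learner)  # keep the old node before moving on
learner.train_step()
names_ready = [m.__name__ for m in learner.active_instructions()]
```

The deepcopy taken before train_step() is what lets a client keep the earlier node while the
learner itself moves along an edge.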
1026
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
203
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
204
1043
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
205 Instruction
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
206 ~~~~~~~~~~~
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
207 An object that represents a potential edge in the graph discussed above. It is an
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
208 operation that a learner can perform.
1026
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
209
1043
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
210 arg_types
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
211 """a list of Type objects (see below) indicating what args are required by execute"""
1026
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
212
1043
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
213 def execute(learner, args, kwargs):
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
214 """ Perform some operation on the learner (follow an edge in the graph discussed above)
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
215 and modify the learner in-place. Calling execute 'moves' the learner from one node in
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
216 the graph along an edge. To have the old learner as well, it must be copied prior to
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
217 calling execute().
1026
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
218 """
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
219
1043
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
220 def expense(learner, args, kwargs, resource_type='CPUtime'):
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
221 """ Return an estimated cost of performing this instruction (calling execute), in time,
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
222 space, number of computers, disk requirement, etc.
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
223 """
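A sketch of what an Instruction following this interface might look like. `TrainEpochs`, the
dictionary used as a stand-in learner and the fixed seconds-per-epoch cost model are all
assumptions made for illustration.

```python
import copy


class Instruction:
    """Hypothetical base class: one potential edge in the learner graph."""

    arg_types = ()  # one Type-like object per positional argument

    def execute(self, learner, args, kwargs):
        raise NotImplementedError

    def expense(self, learner, args, kwargs, resource_type="CPUtime"):
        raise NotImplementedError


class TrainEpochs(Instruction):
    """Hypothetical instruction: run a number of training epochs."""

    def execute(self, learner, args, kwargs):
        (n_epochs,) = args
        learner["epochs_done"] += n_epochs  # modifies the learner in place

    def expense(self, learner, args, kwargs, resource_type="CPUtime"):
        (n_epochs,) = args
        if resource_type == "CPUtime":
            # Assumed cost model: a fixed number of seconds per epoch.
            return n_epochs * learner["seconds_per_epoch"]
        raise ValueError("no estimate for resource type %r" % resource_type)


learner = {"epochs_done": 0, "seconds_per_epoch": 2.5}  # stand-in for a Learner
instr = TrainEpochs()
estimate = instr.expense(learner, (4,), {})
before = copy.deepcopy(learner)  # copy first to keep the old node
instr.execute(learner, (4,), {})
```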
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
224
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
225 Type
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
226 ~~~~
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
227 An object that describes a parameter domain for a call to Instruction.execute.
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
228 It is not necessary that a Type specifies exactly which arguments are legal, but it should
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
229 `include` all legal arguments, and exclude as many illegal ones as possible.
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
230
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
231 def includes(value):
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
232 """return True if value is a legal argument"""
1026
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
233
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
234
1043
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
235 To make things a bit more practical, there are some Type subclasses like Int, Float, Str,
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
236 ImageDataset, SgdOptimizer, that include additional attributes (e.g. min, max, default) so
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
237 that automatic graph exploration algorithms can generate legal arguments with reasonable
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
238 efficiency.
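A sketch of a Type base class and a Float subclass. The `sample` helper is an assumption (it
is not part of the API listed above); it shows how min and max could let an exploration
algorithm generate legal arguments.

```python
import random


class Type:
    """Describes a parameter domain for Instruction.execute."""

    def includes(self, value):
        """Return True if value is a legal argument."""
        raise NotImplementedError


class Float(Type):
    def __init__(self, min, max, default):
        self.min = min
        self.max = max
        self.default = default

    def includes(self, value):
        return isinstance(value, (int, float)) and self.min <= value <= self.max

    def sample(self, rng):
        """Hypothetical helper: draw a value uniformly from [min, max]."""
        return rng.uniform(self.min, self.max)


log_lr_type = Float(min=-8.0, max=-1.0, default=-4.0)
candidate = log_lr_type.sample(random.Random(0))
```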
1026
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
239
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
240
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
241
1043
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
242 The proxy pattern is a powerful way to combine learners, especially when proxy Learner
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
243 instances also introduce Proxy Instruction classes.
1026
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
244
1043
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
245 For example, it is straightforward to implement a hyper-learner by implementing a Learner with
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
246 another learner (sub-learner) as a member attribute. The hyper-learner makes some
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
247 modifications to the instruction_set() return value of the sub-learner, typically to introduce
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
248 more powerful instructions and hide simpler ones.
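A sketch of the hyper-learner idea, assuming instruction_set() returns a dict from names to
callables. The classes and instruction names are hypothetical.

```python
class SubLearner:
    def __init__(self):
        self.lr = None
        self.epochs_done = 0

    def set_lr(self, lr):
        self.lr = lr

    def train_epoch(self):
        self.epochs_done += 1

    def instruction_set(self):
        return {"set_lr": self.set_lr, "train_epoch": self.train_epoch}


class HyperLearner:
    """Proxy that hides `set_lr` and adds a more powerful `train_with_lr`."""

    def __init__(self, sub_learner):
        self.sub_learner = sub_learner

    def train_with_lr(self, lr, n_epochs):
        self.sub_learner.set_lr(lr)
        for _ in range(n_epochs):
            self.sub_learner.train_epoch()

    def instruction_set(self):
        instructions = dict(self.sub_learner.instruction_set())
        del instructions["set_lr"]  # hide the simpler instruction
        instructions["train_with_lr"] = self.train_with_lr  # add the stronger one
        return instructions


hyper = HyperLearner(SubLearner())
hyper.instruction_set()["train_with_lr"](0.1, 3)
```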
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
249
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
250 It is less straightforward, but consistent with the design, to implement a Learner that
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
251 encompasses job management. Such a learner would retain the semantics of the
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
252 instruction_set of the sub-learner, but would replace the Instruction objects themselves with
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
253 Instructions that arranged for remote procedure calls (e.g. jobman, multiprocessing, bqtools,
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
254 etc.). Such a learner would replace synchronous instructions (return on completion) with
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
255 asynchronous ones (return after scheduling) and the active instruction set would also change
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
256 asynchronously, but neither of these things is inconsistent with the Learner API.
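A sketch of a proxy instruction that returns after scheduling instead of on completion. A
standard-library thread pool stands in for a real job manager (jobman, bqtools, etc. are not
used here), and `AsyncInstruction` and `train_epochs` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


class AsyncInstruction:
    """Hypothetical proxy: same operation, but execute() returns after scheduling."""

    def __init__(self, instruction, executor):
        self.instruction = instruction  # any callable taking (learner, *args)
        self.executor = executor

    def execute(self, learner, args, kwargs):
        # Returns a Future immediately; the wrapped call runs in the pool.
        return self.executor.submit(self.instruction, learner, *args, **kwargs)


def train_epochs(learner, n_epochs):
    learner["epochs_done"] += n_epochs
    return learner["epochs_done"]


learner = {"epochs_done": 0}  # stand-in for a Learner
with ThreadPoolExecutor(max_workers=1) as pool:
    future = AsyncInstruction(train_epochs, pool).execute(learner, (3,), {})
    result = future.result()  # block until the scheduled work has finished
```

A client that does not call result() right away sees the learner change at some later point,
which is the sense in which the active instruction set can change asynchronously.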
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
257
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
258
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
259 TODO
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
260 ~~~~
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
261
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
262 I feel like something is missing from the API - and that is an interface to the graph structure
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
263 discussed above. The nodes in this graph are natural places to store meta-information for
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
264 visualization, statistics-gathering etc. But none of the APIs above corresponds to the graph
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
265 itself. In other words, there is no API through which to attach information to nodes. It is
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
266 not good to say that the Learner instance *is* the node because (a) learner instances change
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
267 during graph exploration and (b) learner instances are big, and we don't want to have to keep a
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
268 whole saved model just to attach meta-info e.g. validation score. Choosing this API spills
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
269 over into other committees, so we should get their feedback about how to resolve it.
1044
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
270
1052
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
271 Comments
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
272 ~~~~~~~~
1055
bc3f7834db83 added a comment/question about Type
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 1053
diff changeset
273
bc3f7834db83 added a comment/question about Type
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 1053
diff changeset
274 YB asks: it seems to me that what we really need from "Type" is not just
bc3f7834db83 added a comment/question about Type
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 1053
diff changeset
275 testing that a value is legal, but more practically a function that specifies the
bc3f7834db83 added a comment/question about Type
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 1053
diff changeset
276 prior distribution for the hyper-parameter, i.e., how to sample from it,
bc3f7834db83 added a comment/question about Type
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 1053
diff changeset
277 and possibly some representation of it that could be used to infer
bc3f7834db83 added a comment/question about Type
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 1053
diff changeset
278 a posterior (such as an unnormalized log-density or log-probability).
bc3f7834db83 added a comment/question about Type
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 1053
diff changeset
279 Having the min and max and default limits us to the uniform distribution,
bc3f7834db83 added a comment/question about Type
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 1053
diff changeset
280 which may not always be appropriate. For example sometimes we'd like
bc3f7834db83 added a comment/question about Type
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 1053
diff changeset
281 Gaussian (-infty to infty) or Exponential (0 to infty) or Poisson (non-negative integers).
bc3f7834db83 added a comment/question about Type
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 1053
diff changeset
282 For that reason, I think that "Type" is not a very good name.
bc3f7834db83 added a comment/question about Type
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 1053
diff changeset
283 How about "Prior" or "Density" or something like that?
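A sketch of what such a "Prior" could offer: a way to sample and a log-density. The class and
method names are assumptions, and the densities here happen to be normalized although the
comment above only asks for an unnormalized one.

```python
import math
import random


class Prior:
    """Hypothetical replacement for Type: a distribution over legal values."""

    def sample(self, rng):
        raise NotImplementedError

    def log_density(self, value):
        raise NotImplementedError


class Gaussian(Prior):
    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def sample(self, rng):
        return rng.gauss(self.mean, self.std)

    def log_density(self, value):
        z = (value - self.mean) / self.std
        return -0.5 * z * z - math.log(self.std) - 0.5 * math.log(2 * math.pi)


class Exponential(Prior):
    def __init__(self, rate):
        self.rate = rate

    def sample(self, rng):
        return rng.expovariate(self.rate)

    def log_density(self, value):
        if value < 0:
            return -math.inf  # outside the support (0 to infinity)
        return math.log(self.rate) - self.rate * value


rng = random.Random(0)
log_lr = Gaussian(mean=-4.0, std=1.5).sample(rng)
weight_decay = Exponential(rate=2.0).sample(rng)
```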
bc3f7834db83 added a comment/question about Type
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 1053
diff changeset
284
1052
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
285 OD asks: (I hope it's ok to leave comments even though I'm not in committee... I'm
1045
d57bdd9a9980 learner: Left a comment about James' design
Olivier Delalleau <delallea@iro>
parents: 1044
diff changeset
286 interested to see how the learner interface is shaping up so I'll be keeping
d57bdd9a9980 learner: Left a comment about James' design
Olivier Delalleau <delallea@iro>
parents: 1044
diff changeset
287 an eye on this file)
d57bdd9a9980 learner: Left a comment about James' design
Olivier Delalleau <delallea@iro>
parents: 1044
diff changeset
288 I'm wondering what's the benefit of such an API compared to simply defining a
d57bdd9a9980 learner: Left a comment about James' design
Olivier Delalleau <delallea@iro>
parents: 1044
diff changeset
289 new method for each instruction. It seems to me that typically, the 'execute'
d57bdd9a9980 learner: Left a comment about James' design
Olivier Delalleau <delallea@iro>
parents: 1044
diff changeset
290 method would end up being something like
d57bdd9a9980 learner: Left a comment about James' design
Olivier Delalleau <delallea@iro>
parents: 1044
diff changeset
291 if instruction == 'do_x':
d57bdd9a9980 learner: Left a comment about James' design
Olivier Delalleau <delallea@iro>
parents: 1044
diff changeset
292     self.do_x(..)
d57bdd9a9980 learner: Left a comment about James' design
Olivier Delalleau <delallea@iro>
parents: 1044
diff changeset
293 elif instruction == 'do_y':
d57bdd9a9980 learner: Left a comment about James' design
Olivier Delalleau <delallea@iro>
parents: 1044
diff changeset
294     self.do_y(..)
d57bdd9a9980 learner: Left a comment about James' design
Olivier Delalleau <delallea@iro>
parents: 1044
diff changeset
295 ...
d57bdd9a9980 learner: Left a comment about James' design
Olivier Delalleau <delallea@iro>
parents: 1044
diff changeset
296 so why not directly call do_x / do_y instead?
d57bdd9a9980 learner: Left a comment about James' design
Olivier Delalleau <delallea@iro>
parents: 1044
diff changeset
297
1046
f1732269bce8 comment on Olivier's comment
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1045
diff changeset
298
1052
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
299 JB replies: I agree with you, and in the implementation of a Learner I suggest
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
300 using Python decorators to get the best of both worlds:
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
301
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
302 class NNet(Learner):
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
303
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
304 ...
1046
f1732269bce8 comment on Olivier's comment
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1045
diff changeset
305
1052
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
306     @Instruction.new(arg_types=(Float(min=-8, max=-1, default=-4),))
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
307     def set_log_lr(self, log_lr):
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
308         self.lr.value = numpy.exp(log_lr)
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
309
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
310 ...
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
311
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
312 The Learner base class can implement an instruction_set() that walks through the
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
313 methods of 'self' and pick out the ones that have corresponding instructions.
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
314 But anyone can call the method normally. The NNet class can also have methods
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
315 that are not instructions.
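A sketch of how the decorator and the method-walking instruction_set() could fit together.
The `is_instruction` marker, the minimal `Float` class and the use of math.exp on a plain
attribute (instead of numpy and `self.lr.value`) are assumptions made to keep the example
self-contained.

```python
import inspect
import math


class Float:
    """Minimal stand-in for the Float Type subclass."""

    def __init__(self, min, max, default):
        self.min = min
        self.max = max
        self.default = default


class Instruction:
    @staticmethod
    def new(arg_types=()):
        """Decorator: mark a method as an instruction and record its arg_types."""
        def decorate(method):
            method.arg_types = tuple(arg_types)
            method.is_instruction = True
            return method
        return decorate


class Learner:
    def instruction_set(self):
        """Return the bound methods of self that were marked as instructions."""
        found = {}
        for name, member in inspect.getmembers(self, inspect.ismethod):
            if getattr(member, "is_instruction", False):
                found[name] = member
        return found


class NNet(Learner):
    def __init__(self):
        self.lr = None

    @Instruction.new(arg_types=(Float(min=-8, max=-1, default=-4),))
    def set_log_lr(self, log_lr):
        self.lr = math.exp(log_lr)

    def n_params(self):  # an ordinary method, not an instruction
        return 0


net = NNet()
net.set_log_lr(0.0)  # the method can still be called normally
found_names = list(net.instruction_set())
```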
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
316
1053
390166ace9e5 learner: Reply to James
Olivier Delalleau <delallea@iro>
parents: 1052
diff changeset
317 OD replies: Ok thanks. I'm still unsure what is the end goal, but I'll keep
390166ace9e5 learner: Reply to James
Olivier Delalleau <delallea@iro>
parents: 1052
diff changeset
318 watching while you guys work on it, and hopefully it'll become clearer for me ;)
1052
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
319
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
320 RP asks: James correct me if I'm wrong, but I think each instruction has an execute
1046
f1732269bce8 comment on Olivier's comment
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1045
diff changeset
321 command. The job of the learner is to traverse the graph and for each edge
f1732269bce8 comment on Olivier's comment
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1045
diff changeset
322 that it decides to cross to call the execute of that edge. Maybe James has
f1732269bce8 comment on Olivier's comment
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1045
diff changeset
323 something else in mind, but this was my understanding.
f1732269bce8 comment on Olivier's comment
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1045
diff changeset
324
1052
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
325 JB replies: close, but let me make a bit of a clarification. The job of a
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
326 Learner is simply to implement the API of a Learner - to list what edges are
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
327 available and to be able to cross them if asked. The code *using* the Learner
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
328 (client) decides which edges to cross. The client may also be a Learner, but
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
329 maybe not.
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
330
1046
f1732269bce8 comment on Olivier's comment
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1045
diff changeset
331
f1732269bce8 comment on Olivier's comment
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1045
diff changeset
332
1044
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
333 Just another view/spin on the same idea (Razvan)
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
334 ================================================
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
335
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
336
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
337 My idea is probably just a spin off from what James wrote. It is an extension
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
338 of what I sent on the mailing list some time ago.
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
339
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
340 Big Picture
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
341 -----------
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
342
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
343 What do we care about ?
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
344 ~~~~~~~~~~~~~~~~~~~~~~~
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
345
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
346 This is the list of the main points that I have in mind :
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
347
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
348 * Re-usability
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
349 * Extensibility
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
350 * Simplicity or easily readable code ( connected to re-usability )
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
351 * Modularity ( connected to extensibility )
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
352 * Fast to write code ( - sort of comes out of simplicity)
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
353 * Efficient code
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
354
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
355
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
356 Composition
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
357 ~~~~~~~~~~~
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
358
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
359 To me this reads as code generated by composing pieces. Imagine this :
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
360 you start off with something primitive that I will call a "variable", which
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
361 probably is a very unsuitable name. And then you compose those initial
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
362 "variables" or transform them through several "functions". Each such
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
363 "function" hides some logic, that you as the user don't care about.
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
364 You can have low-level or micro "functions" and high-level or macro
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
365 "functions", where a high-level function is just a certain compositional
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
366 pattern of low-level "functions". There are several classes of "functions"
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
367 and "variables" that can be interchangeable. This is how modularity is
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
368 obtained: by changing between functions from a certain class.
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
369
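A minimal sketch of the idea above, in plain Python. All names here (compose,
center, scale_to_unit_max, clip_to_unit) are hypothetical and only for
illustration; nothing in this proposal fixes an API. The "variable" is just a
list of floats, the low-level "functions" are ordinary functions, and a
high-level "function" is a composition of them. Swapping one function for
another of the same class (list in, list out) gives a new pipeline.

```python
from functools import reduce


def compose(*functions):
    """Build a high-level function by applying low-level functions in order."""
    def composed(variable):
        return reduce(lambda value, fn: fn(value), functions, variable)
    return composed


# Hypothetical low-level "functions": each maps a list of floats to a new list.
def center(xs):
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


def scale_to_unit_max(xs):
    biggest = max(abs(x) for x in xs) or 1.0  # avoid dividing by zero
    return [x / biggest for x in xs]


def clip_to_unit(xs):
    return [min(1.0, max(-1.0, x)) for x in xs]


# Two high-level "functions" that differ only in one interchangeable step.
preprocess = compose(center, scale_to_unit_max)
preprocess_clipped = compose(center, clip_to_unit)

print(preprocess([1.0, 2.0, 3.0]))
print(preprocess_clipped([0.0, 4.0]))
```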
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
370 Now when you want to research something, what you do is first select
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
371 the step you want to look into. If you are lucky you can re-write this
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
372 step as a certain decomposition of low-level transformations (there can be
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
373 multiple such decompositions). If not, you have to implement such a
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
374 decomposition according to your needs. Pick the low-level transformations you want
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
375 to change and write new versions that implement your logic.
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
376
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
377 I think the code will be easy to read, because it is just applying a fixed
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
378 set of transformations, one after the other. The one who writes the code can
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
379 decide how explicit he wants to write things by switching between high-level
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
380 and low-level functions.
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
381
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
382 I think the code this way is re-usable, because you can just take this chain
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
383 of transformations and replace the one you care about, without looking into
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
384 the rest.
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
385
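To make the re-use argument concrete, here is a small sketch, assuming a chain
is stored as a list of (name, function) pairs. The helpers run_chain and
replace_step and the step names are made up for this example. The point is
only that an experiment replaces one named step and leaves the rest of the
chain untouched.

```python
def run_chain(steps, variable):
    """Apply each (name, function) step in order to the variable."""
    for _name, fn in steps:
        variable = fn(variable)
    return variable


def replace_step(steps, name, new_fn):
    """Return a new chain where the step called `name` uses `new_fn`."""
    if name not in [step_name for step_name, _fn in steps]:
        raise KeyError("no step named %r in the chain" % name)
    return [(step_name, new_fn if step_name == name else fn)
            for step_name, fn in steps]


# Hypothetical baseline chain: build data, transform it, summarize it.
baseline = [
    ("load", lambda n: list(range(n))),
    ("transform", lambda xs: [2 * x for x in xs]),
    ("summarize", sum),
]

# The research question only concerns the "transform" step.
experiment = replace_step(baseline, "transform", lambda xs: [x * x for x in xs])

print(run_chain(baseline, 4))
print(run_chain(experiment, 4))
```

The baseline list is not modified by replace_step, so both versions can be run
side by side.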
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
386 You get this fractal property of the code. Zooming in, you always get just
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
387 a set of functions applied to a set of variables. In the beginning those might
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
388 not be there, and you would have to create new "low level" decompositions,
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
389 maybe even new "variables" that carry data between those decompositions.
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
390
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
391 The thing with variables here is that I don't want these "functions" to have
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
392 a state. All the information is passed along through these variables. This
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
393 way understanding the graph is easy, and debugging it is also easier (than having
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
394 all these hidden states...).
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
395
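As an illustration of what "no state in the functions" could look like, here
is a sketch of a momentum update written as a pure function. The name
momentum_step and its arguments are assumptions for this example. The velocity,
which an object-oriented optimizer would typically hide as internal state, is
an explicit input and output "variable".

```python
def momentum_step(params, velocity, grads, learning_rate, momentum):
    """Return (new_params, new_velocity) without modifying any input."""
    new_velocity = [momentum * v - learning_rate * g
                    for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


params = [1.0, 2.0]
velocity = [0.0, 0.0]
grads = [2.0, -4.0]

# Every piece of information needed for the next step is visible here.
params_1, velocity_1 = momentum_step(params, velocity, grads, 0.5, 0.5)
params_2, velocity_2 = momentum_step(params_1, velocity_1, grads, 0.5, 0.5)

print(params_1, velocity_1)
print(params_2, velocity_2)
```

Because nothing is hidden, any intermediate step can be replayed or inspected
just by calling the function again with the same variables.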
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
396 Note that while doing so we might (and I strongly think we should) create
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
397 a (symbolic) DAG of operations. (This is where it becomes what James was saying.)
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
398 In such a DAG the "variables" will be the nodes and the functions will be the edges.
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
399 I think having a DAG is useful in many ways (all these are things that one
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
400 might think about implementing in the far future; I'm not proposing to implement
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
401 them unless we want to use them - like the reconstruction):
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
402 * there exists the possibility of writing optimizations (Theano style)
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
403 * there exists the possibility of adding global view utility functions (like
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
404 a reconstruction function for SdA - extremely low level here), or global
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
405 view diagnostic tools
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
406 * the possibility of creating a GUI (where you just create the graph by
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
407 picking transforms and variables from a list) or working interactively
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
408 and then generating code that will reproduce the graph
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
409 * you can view the graph at different granularity levels to understand
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
410 things (global diagnostics)
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
411
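A rough sketch of such a symbolic DAG, in plain Python rather than Theano. The
class Variable and the helpers apply_function, evaluate and ancestors are
hypothetical names invented for this example. Each derived variable records the
function and the input variables that produced it, so a "global view" utility
can walk the whole graph without running anything.

```python
class Variable:
    """A node in the graph; derived nodes remember how they were produced."""

    def __init__(self, name, fn=None, inputs=()):
        self.name = name
        self.fn = fn              # None for input variables
        self.inputs = tuple(inputs)


def apply_function(name, fn, *inputs):
    """Create a new variable as the symbolic result of fn(*inputs)."""
    return Variable(name, fn, inputs)


def evaluate(target, feed):
    """Compute the value of `target` given values for the input variables."""
    cache = dict(feed)  # variable name -> value

    def visit(var):
        if var.name in cache:
            return cache[var.name]
        if var.fn is None:
            raise KeyError("no value supplied for input %r" % var.name)
        value = var.fn(*[visit(parent) for parent in var.inputs])
        cache[var.name] = value
        return value

    return visit(target)


def ancestors(target):
    """Global-view utility: names of all variables `target` depends on."""
    seen = []

    def visit(var):
        for parent in var.inputs:
            visit(parent)
        if var.name not in seen:
            seen.append(var.name)

    visit(target)
    return seen


x = Variable("x")
w = Variable("w")
h = apply_function("h", lambda a, b: a * b, x, w)
y = apply_function("y", lambda a: a + 1, h)

print(ancestors(y))
print(evaluate(y, {"x": 3, "w": 2}))
```

Note that a function with several inputs is really a hyper-edge; this sketch
stores it on the output node, which is one of several possible encodings.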
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
412 We should have a taxonomy of possible classes of functions and possible
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
413 classes of variables, but those should not be exclusive. We can work at a high
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
414 level for now, and decompose those high level functions to lower level when
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
415 we need to. We can introduce new classes of functions or intermediate
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
416 variables between those low level functions.
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
417
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
418
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
419 Similarities with James' idea
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
420 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
421
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
422 As I said before, this is, I think, just another view on what James proposed.
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
423 The learner in his case is the module that traverses the graph of these
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
424 operations, which makes sense here as well.
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
425
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
426 The 'execute' command in his API is just applying a function to some variables in
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
427 my case.
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
428
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
429 The learner keeps track of the graph that is formed, I think, in both cases.
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
430
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
431 His view is a bit more general. I see the graph as fully created by the user,
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
432 and the learner just has to go from the start to the end. In his case the
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
433 traversal is conditioned on some policies. I think these ideas can be mixed /
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
434 united. What I would see in my case to have this functionality is something
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
435 similar to the lazy linker for Theano.
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
436
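One way the two views might be combined, sketched under my own assumptions:
the graph is produced lazily (a Python generator stands in for something like
a lazy linker), and a policy decides how far the learner traverses it. The
names training_states, traverse and the halving "improve" rule are made up for
the example and are not part of either proposal.

```python
import itertools


def training_states(initial_error, improve):
    """Yield (step, error) pairs forever; each one is computed on demand."""
    error = initial_error
    for step in itertools.count():
        yield step, error
        error = improve(error)


def traverse(states, keep_going):
    """Walk the states until the policy `keep_going` says to stop."""
    visited = []
    for step, error in states:
        visited.append((step, error))
        if not keep_going(step, error):
            break
    return visited


# Hypothetical policy: continue while the error is still above 1.0.
path = traverse(training_states(8.0, lambda e: e / 2),
                lambda step, error: error > 1.0)
print(path)
```

Only the part of the (potentially unbounded) sequence that the policy asks for
is ever computed.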
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
437
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
438
1052
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
439 JB asks: There is definitely a strong theme of graphs in both suggestions,
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
440 furthermore graphs that have heavy-duty nodes and light-weight edges. But I
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
441 don't necessarily think that we're proposing the same thing. One difference is
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
442 that the graph I talked about would be infinite in most cases of interest, so
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
443 it's not going to be representable by Theano's data structures (even with lazy
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
444 if). Another difference is that the graph I had in mind doesn't feel fractal -
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
445 it would be very common for a graph edge to be atomic. A proxy pattern, such as
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
446 in a hyper-learner, would create a notion of being able to zoom in, but other
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
447 than that, I'm not sure what you mean.
1044
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
448