annotate doc/v2_planning/learner.txt @ 1044:3b1fd599bafd

my first draft of my own views which are close to be just a reformulation of what James proposes
author Razvan Pascanu <r.pascanu@gmail.com>
date Wed, 08 Sep 2010 12:55:30 -0400
parents 3f528656855b
children d57bdd9a9980
rev   line source
1041
38cc6e075d9b PV added to learner committee
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 1038
diff changeset
1
38cc6e075d9b PV added to learner committee
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 1038
diff changeset
2 Committee: AB, PL, GM, IG, RP, NB, PV
38cc6e075d9b PV added to learner committee
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 1038
diff changeset
3 Leader: ?
1002
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
4
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
5 Discussion of Function Specification for Learner Types
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
6 ======================================================
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
7
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
8 In its most abstract form, a learner is an object with the
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
9 following semantics:
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
10
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
11 * A learner has named hyper-parameters that control how it learns (these can be viewed
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
12 as options of the constructor, or might be set directly by a user)
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
13
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
14 * A learner also has an internal state that depends on what it has learned.
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
15
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
16 * A learner reads and produces data, so the definition of learner is
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
17 intimately linked to the definition of dataset (and task).
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
18
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
19 * A learner has one or more 'train' or 'adapt' functions by which
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
20 it is given a sample of data (typically either the whole training set, or
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
21 a mini-batch, which contains as a special case a single 'example'). Learners
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
22 interface with datasets in order to obtain data. These functions cause the
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
23 learner to change its internal state and take advantage to some extent
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
24 of the data provided. The 'train' function should take charge of
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
25 completely exploiting the dataset, as specified per the hyper-parameters,
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
26 so that it would typically be called only once. An 'adapt' function
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
27 is meant for learners that can operate in an 'online' setting where
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
28 data continually arrive and the control loop (when to stop) is to
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
29 be managed outside of it. For most intents and purposes, the
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
30 'train' function could also handle the 'online' case by providing
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
31 the controlled iterations over the dataset (which would then be
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
32 seen as a stream of examples).
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
33 * learner.train(dataset)
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
34 * learner.adapt(data)
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
35
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
36 * Different types of learners can then exploit their internal state
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
37 in order to perform various computations after training is completed,
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
38 or in the middle of training, e.g.,
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
39
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
40 * y=learner.predict(x)
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
41 for learners that see (x,y) pairs during training and predict y given x,
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
42 or for learners that see only x's and learn a transformation of it (i.e. feature extraction).
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
43 Here and below, x and y are tensor-like objects whose first index iterates
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
44 over particular examples in a batch or minibatch of examples.
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
45
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
46 * p=learner.probability(examples)
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
47 p=learner.log_probability(examples)
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
48 for learners that can estimate probability density or probability functions.
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
49 Note that an example could be a pair (x,y) for learners that expect each example
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
50 to represent such a pair. The second form is provided in case the example
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
51 is high-dimensional and computations in the log-domain are numerically preferable.
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
52 The first dimension of examples or of x and y is an index over a minibatch or a dataset.
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
53
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
54 * p=learner.free_energy(x)
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
55 for learners that can estimate an unnormalized log-probability (the free energy is conventionally its negative); the output has the same length as the input.
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
56
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
57 * c=learner.costs(examples)
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
58 returns a matrix of costs (one row per example, i.e., again the output has the same length
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
59 as the input), the first column of which represents the cost whose expectation
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
60 we wish to minimize over new samples from the unknown underlying data distribution.
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
61
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
62
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
63 Some learners may be able to handle x's and y's that contain missing values.
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
64
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
65 * For convenience, some of these operations could be bundled, e.g.
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
66
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
67 * [prediction,costs] = learner.predict_and_adapt((x,y))
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
68
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
69 * Some learners could include in their internal state not only what they
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
70 have learned but some information about recently seen examples that conditions
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
71 the expected distribution of upcoming examples. In that case, they might
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
72 be used, e.g. in an online setting as follows:
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
73 for (x,y) in data_stream:
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
74     [prediction,costs]=learner.predict((x,y))
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
75     accumulate_statistics(prediction,costs)
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
76
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
77 * In some cases, each example is itself a (possibly variable-size) sequence
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
78 or other variable-size object (e.g. an image, or a video)
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
79
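
The semantics listed above can be sketched in a few lines of Python. This is only an illustration, not the proposed interface: the class name `LinearLearner`, its hyper-parameters `learning_rate` and `n_epochs`, and the choice of 1-D least-squares regression trained by SGD are all assumptions made for the example.

```python
class LinearLearner:
    """Hypothetical learner: 1-D least-squares regression trained by SGD."""

    def __init__(self, learning_rate=0.1, n_epochs=200):
        # Named hyper-parameters, given as constructor options.
        self.learning_rate = learning_rate
        self.n_epochs = n_epochs
        # Internal state: depends on what has been learned so far.
        self.w = 0.0
        self.b = 0.0

    def adapt(self, data):
        """Online update on a minibatch of (x, y) pairs; the caller owns the loop."""
        for x, y in data:
            err = (self.w * x + self.b) - y
            self.w -= self.learning_rate * err * x
            self.b -= self.learning_rate * err

    def train(self, dataset):
        """Exploit the dataset completely, as specified by the hyper-parameters."""
        for _ in range(self.n_epochs):
            self.adapt(dataset)

    def predict(self, x):
        """One prediction per example; the first index runs over examples."""
        return [self.w * xi + self.b for xi in x]

    def costs(self, examples):
        """One row per example; column 0 is the cost whose expectation we minimize."""
        preds = self.predict([x for x, _ in examples])
        return [[(p - y) ** 2] for p, (_, y) in zip(preds, examples)]


learner = LinearLearner()
learner.train([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])  # samples of y = 2x + 1
prediction = learner.predict([3.0])
```

Here `train` is called once and owns its own loop, while `adapt` leaves the control loop to the caller, which is the distinction drawn in the text.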
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
80
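
To see why a log_probability form is useful for high-dimensional examples, here is a small sketch. The isotropic Gaussian model and the `mean` and `std` arguments are assumptions chosen only to make the example self-contained; they are not part of the proposal.

```python
import math


def log_probability(examples, mean=0.0, std=1.0):
    """Per-example log-density under an isotropic Gaussian (illustrative model)."""
    const = -0.5 * math.log(2.0 * math.pi * std ** 2)
    return [sum(const - 0.5 * ((v - mean) / std) ** 2 for v in x)
            for x in examples]


def probability(examples, mean=0.0, std=1.0):
    """Same quantity in the linear domain; underflows for large dimensions."""
    return [math.exp(lp) for lp in log_probability(examples, mean, std)]


# First index iterates over examples: one 2000-dimensional, one 2-dimensional.
examples = [[0.0] * 2000, [0.0] * 2]
p = probability(examples)
log_p = log_probability(examples)
```

For the 2000-dimensional example the density underflows to zero in double precision, while its logarithm remains an ordinary finite number that can still be compared or summed.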
1026
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
81
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
82
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
83
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
84
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
85
1002
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
86
1026
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
87
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
88
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
89 James's idea for Learner Interface
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
90 ===================================
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
91
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
92 Theory:
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
93 -------
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
94
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
95 Think about the unfolding of a learning algorithm as exploring a path in a vast
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
96 directed graph.
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
97
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
98 There are some source nodes, which are potential initial conditions for the
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
99 learning algorithm.
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
100
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
101 At any node, there are a number of outgoing labeled edges that represent
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
102 distinct directions of exploration: like "allocate a model with N hidden units",
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
103 or "set the l1 weight decay on such-and-such units to 0.1" or "adapt for T
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
104 iterations" or "refresh the GPU dataset memory with the next batch of data".
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
105
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
106 Not all nodes have the same outgoing edge labels. The dataset, model, and
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
107 optimization algorithm implementations may each have their various
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
108 hyper-parameters with various restrictions on what values they can take, and
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
109 when they can be changed.
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
110
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
111 Every move in this graph incurs some storage and computational expense, and
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
112 explores the graph.
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
113
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
114 Learners typically engage in goal-directed exploration of this graph - for
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
115 example to find the node with the best validation-set performance given a
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
116 certain computational budget. We might often be interested in the best node
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
117 found.
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
118
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
119 The predict(), log_probability(), free_energy(), etc. correspond to costs that we
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
120 can measure at any particular node (at some computational expense) to see how we
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
121 are doing in our exploration.
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
122
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
123 Many semantically distinct components come into the definition of this graph:
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
124 the model (e.g. DAA), the dataset (e.g. an online one), the inference and
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
125 learning strategy. I'm not sure what to call this graph other than an 'experiment
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
126 graph'... so I'll go with that for now.
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
127
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
128
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
129
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
130
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
131
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
132 Use Cases
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
133 ----------
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
134
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
135 Early stopping
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
136 ~~~~~~~~~~~~~~
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
137
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
138 Early stopping can be implemented as a learner that progresses along a
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
139 particular kind of edge (e.g. "train more") until a stopping criterion (in terms
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
140 of a cost computed from nodes along the path) is met.
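
A rough sketch of that early-stopping policy follows. The names `early_stop`, `validation_cost` and `patience` are made up for illustration, and `ScriptedLearner` is a stand-in whose validation cost follows a fixed curve so the example runs on its own; the only thing assumed of a real learner is an `adapt` method and support for `copy.deepcopy`.

```python
import copy


class ScriptedLearner:
    """Stand-in learner: its validation cost just follows a fixed curve."""

    def __init__(self, curve):
        self.curve = curve
        self.steps = 0

    def adapt(self, batch):
        self.steps += 1


def early_stop(learner, batches, validation_cost, patience=3):
    """Follow "train more" edges until the cost has not improved for `patience` steps."""
    best = copy.deepcopy(learner)          # best node seen so far
    best_cost = validation_cost(learner)
    stale = 0
    for batch in batches:
        learner.adapt(batch)               # move along one "train more" edge
        cost = validation_cost(learner)    # measure the cost at the new node
        if cost < best_cost:
            best, best_cost, stale = copy.deepcopy(learner), cost, 0
        else:
            stale += 1
            if stale >= patience:          # stopping criterion met
                break
    return best, best_cost


current = ScriptedLearner([5.0, 4.0, 3.0, 3.5, 3.6, 3.7, 1.0])
best, best_cost = early_stop(current, range(6), lambda l: l.curve[l.steps])
```

Keeping a deep copy of the best node is one possible design; a learner with expensive state might prefer to record only the path that led to it.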
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
141
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
142
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
143 Grid Search
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
144 ~~~~~~~~~~~
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
145
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
146 Grid search is a learner policy that can be implemented in an experiment graph
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
147 where all paths have the form:
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
148
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
149 ( "set param 0 to X",
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
150 "set param 1 to Y",
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
151 ... ,
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
152 "set param N to Z",
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
153 adapt,
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
154 [early stop...],
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
155 test)
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
156
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
157 It would explore all paths of this form and then return the best node.
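
As a minimal sketch of such a policy, assuming a learner factory that takes hyper-parameters as keyword arguments and a learner with a `train` method: `grid_search`, `make_learner` and `test_cost` are hypothetical names, and `ShrunkMean` is a toy learner included only so the example is self-contained.

```python
import itertools


class ShrunkMean:
    """Toy learner: predicts shrink * mean(training data) + offset."""

    def __init__(self, shrink, offset):
        self.shrink, self.offset = shrink, offset
        self.value = None

    def train(self, data):
        self.value = self.shrink * sum(data) / len(data) + self.offset


def grid_search(make_learner, grid, train_set, test_cost):
    """Explore every path of the grid; return (cost, params, learner) of the best node."""
    names = sorted(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))   # "set param i to ..." edges
        learner = make_learner(**params)
        learner.train(train_set)            # adapt (early stopping would go here)
        cost = test_cost(learner)           # test
        if best is None or cost < best[0]:
            best = (cost, params, learner)
    return best


grid = {"shrink": [0.0, 0.5, 1.0], "offset": [-1.0, 0.0, 1.0]}
best_cost, best_params, best_learner = grid_search(
    ShrunkMean, grid, [2.0, 4.0], lambda l: (l.value - 4.0) ** 2)
```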
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
158
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
159
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
160 Stagewise learning of DBNs combined with early stopping and grid search
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
161 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
162
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
163 This would be a learner that is effective for experiment graphs that reflect the
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
164 greedy-stagewise optimization of DBNs.
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
165
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
166
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
167 Boosting
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
168 ~~~~~~~~
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
169
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
170 Given an ExperimentGraph that permits re-weighting of examples, it is
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
171 straightforward to write a meta-ExperimentGraph around it that implements AdaBoost.
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
172 A meta-meta-ExperimentGraph around that one that does early stopping would complete
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
173 the picture and make a useful boosting implementation.
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
174
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
175
1043
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
176 Using External Hyper-Parameter Optimization Software
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
177 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
178 TODO: use-case - show how we could use the optimizer from
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
179 http://www.cs.ubc.ca/labs/beta/Projects/ParamILS/
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
180
1026
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
181
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
182 Implementation Details / API
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
183 ----------------------------
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
184
1043
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
185 Learner
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
186 ~~~~~~~
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
187 An object that allows us to explore the graph discussed above. Specifically, it represents
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
188 an explored node in that graph.
1026
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
189
1043
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
190 def active_instructions()
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
191 """ Return a list/set of Instruction instances (see below) that the Learner is prepared
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
192 to handle.
1026
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
193 """
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
194
1043
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
195 def copy(), deepcopy()
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
196 """ Learners should be serializable """
1026
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
197
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
198
1043
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
199 To make the implementation easier, I found it was helpful to introduce a string-valued
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
200 `fsa_state` member attribute and associate methods with these states. That made it
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
201 syntactically easy to build relatively complex finite-state transition graphs to describe
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
202 which instructions were active at which times in the life-cycle of a learner.
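A minimal sketch of the `fsa_state` idea, under assumptions: the state names, the transition table, the `follow` method and the toy `Instruction` class are all hypothetical and only illustrate how a string-valued state can decide which instructions are active.

```python
class Instruction:
    """Hypothetical minimal instruction: a named operation on a learner."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "Instruction(%r)" % self.name


class ToyLearner:
    """Illustrative learner whose active instructions depend on fsa_state."""

    # Assumed transition table: state -> {instruction name: next state}.
    transitions = {
        "new": {"set_hyperparameters": "configured"},
        "configured": {"train": "trained"},
        "trained": {"train": "trained", "test": "trained"},
    }

    def __init__(self):
        self.fsa_state = "new"

    def active_instructions(self):
        # Only the instructions legal in the current state are offered.
        return [Instruction(n) for n in sorted(self.transitions[self.fsa_state])]

    def follow(self, name):
        # Move along one edge of the finite-state graph.
        self.fsa_state = self.transitions[self.fsa_state][name]


learner = ToyLearner()
names_new = [i.name for i in learner.active_instructions()]
learner.follow("set_hyperparameters")
learner.follow("train")
names_trained = [i.name for i in learner.active_instructions()]
```

Keeping the life-cycle in one table makes it easy to see, for any state, which instructions a graph-exploration algorithm may try next.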
1026
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
203
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
204
1043
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
205 Instruction
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
206 ~~~~~~~~~~~
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
207 An object that represents a potential edge in the graph discussed above. It is an
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
208 operation that a learner can perform.
1026
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
209
1043
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
210 arg_types
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
211 """a list of Type objects (see below) indicating what args are required by execute"""
1026
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
212
1043
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
213 def execute(learner, args, kwargs):
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
214 """ Perform some operation on the learner (follow an edge in the graph discussed above)
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
215 and modify the learner in-place. Calling execute 'moves' the learner from one node in
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
216 the graph along an edge. To have the old learner as well, it must be copied prior to
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
217 calling execute().
1026
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
218 """
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
219
1043
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
220 def expense(learner, args, kwargs, resource_type='CPUtime'):
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
221 """ Return an estimated cost of performing this instruction (calling execute), in time,
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
222 space, number of computers, disk requirement, etc.
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
223 """
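The in-place semantics of execute() can be sketched as follows. This is an illustration only: `CountingLearner`, `TrainSteps` and the one-unit-per-step cost model are assumptions, not part of the proposal.

```python
import copy


class CountingLearner:
    """Hypothetical learner that only records how many updates it has seen."""

    def __init__(self):
        self.n_updates = 0


class TrainSteps:
    """Illustrative Instruction: run a number of training steps."""

    def execute(self, learner, args, kwargs):
        # Modifies the learner in place, i.e. moves it along an edge.
        (n_steps,) = args
        learner.n_updates += n_steps

    def expense(self, learner, args, kwargs, resource_type="CPUtime"):
        # Made-up cost model: one unit of CPU time per step.
        (n_steps,) = args
        if resource_type == "CPUtime":
            return float(n_steps)
        raise NotImplementedError(resource_type)


before = CountingLearner()
after = copy.deepcopy(before)   # keep the old node before following the edge
instr = TrainSteps()
estimated = instr.expense(after, (10,), {})
instr.execute(after, (10,), {})
```

Because execute() mutates its argument, the copy is the only way to keep both ends of the edge.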
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
224
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
225 Type
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
226 ~~~~
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
227 An object that describes a parameter domain for a call to Instruction.execute.
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
228 It is not necessary that a Type specifies exactly which arguments are legal, but it should
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
229 `include` all legal arguments, and exclude as many illegal ones as possible.
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
230
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
231 def includes(value):
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
232 """return True if value is a legal argument"""
1026
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
233
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
234
1043
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
235 To make things a bit more practical, there are some Type subclasses like Int, Float, Str,
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
236 ImageDataset, SgdOptimizer, that include additional attributes (e.g. min, max, default) so
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
237 that automatic graph exploration algorithms can generate legal arguments with reasonable
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
238 efficiency.
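As a sketch of what such a subclass might look like (the constructor signature and the `sample` helper are assumptions made for illustration):

```python
import random


class Type:
    """Base class: a parameter domain for Instruction.execute."""

    def includes(self, value):
        raise NotImplementedError


class Int(Type):
    """Illustrative integer domain with min, max and default attributes."""

    def __init__(self, min, max, default=None):
        self.min, self.max = min, max
        self.default = min if default is None else default

    def includes(self, value):
        # bool is a subclass of int in Python, so reject it explicitly.
        return (isinstance(value, int) and not isinstance(value, bool)
                and self.min <= value <= self.max)

    def sample(self, rng):
        # Hypothetical helper for automatic exploration: draw a legal value.
        return rng.randint(self.min, self.max)


n_hidden = Int(min=1, max=1000, default=100)
rng = random.Random(0)
candidate = n_hidden.sample(rng)
```

With min and max available, an exploration algorithm can draw candidates that are legal by construction instead of guessing and calling includes() repeatedly.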
1026
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
239
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
240
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
241
1043
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
242 The proxy pattern is a powerful way to combine learners, especially when proxy Learner
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
243 instances also introduce Proxy Instruction classes.
1026
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
244
1043
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
245 For example, it is straightforward to implement a hyper-learner by implementing a Learner with
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
246 another learner (sub-learner) as a member attribute. The hyper-learner makes some
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
247 modifications to the active_instructions() return value of the sub-learner, typically to introduce
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
248 more powerful instructions and hide simpler ones.
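A minimal sketch of such a hyper-learner, assuming instructions can be identified by name. The instruction names and the `hidden` / `added` attributes are hypothetical, and the sketch uses active_instructions(), the method name from the Learner section above.

```python
class SubLearner:
    """Hypothetical sub-learner exposing instructions by name."""

    def active_instructions(self):
        return ["set_learning_rate", "train_one_epoch", "validate"]


class HyperLearner:
    """Illustrative proxy: wraps a sub-learner and edits its instruction list."""

    hidden = {"set_learning_rate"}          # simpler instructions to hide
    added = ["grid_search_learning_rate"]   # more powerful ones to introduce

    def __init__(self, sub_learner):
        self.sub_learner = sub_learner

    def active_instructions(self):
        kept = [i for i in self.sub_learner.active_instructions()
                if i not in self.hidden]
        return kept + self.added


instructions = HyperLearner(SubLearner()).active_instructions()
```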
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
249
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
250 It is less straightforward, but consistent with the design to implement a Learner that
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
251 encompasses job management. Such a learner would retain the semantics of the
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
252 instruction_set of the sub-learner, but would replace the Instruction objects themselves with
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
253 Instructions that arranged for remote procedure calls (e.g. jobman, multiprocessing, bqtools,
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
254 etc.). Such a learner would replace synchronous instructions (return on completion) with
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
255 asynchronous ones (return after scheduling) and the active instruction set would also change
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
256 asynchronously, but neither of these things is inconsistent with the Learner API.
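The synchronous/asynchronous swap could look roughly like this. The sketch uses a local thread pool from the standard library as a stand-in for a real job scheduler such as jobman. It is not a remote procedure call, and the `Train` and `Scheduled` classes are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class Train:
    """Hypothetical synchronous instruction: returns on completion."""

    def execute(self, learner, args, kwargs):
        learner["epochs"] += args[0]


class Scheduled:
    """Illustrative proxy instruction: returns after scheduling."""

    def __init__(self, instruction, pool):
        self.instruction, self.pool = instruction, pool

    def execute(self, learner, args, kwargs):
        # Returns a Future immediately; the wrapped work runs in the pool.
        return self.pool.submit(self.instruction.execute, learner, args, kwargs)


learner = {"epochs": 0}
with ThreadPoolExecutor(max_workers=1) as pool:
    future = Scheduled(Train(), pool).execute(learner, (3,), {})
    future.result()   # block until the scheduled work has finished
```

The proxy keeps the execute(learner, args, kwargs) signature, so callers that only schedule work do not need to know which kind of instruction they hold.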
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
257
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
258
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
259 TODO
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
260 ~~~~
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
261
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
262 I feel like something is missing from the API - and that is an interface to the graph structure
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
263 discussed above. The nodes in this graph are natural places to store meta-information for
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
264 visualization, statistics-gathering etc. But none of the APIs above corresponds to the graph
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
265 itself. In other words, there is no API through which to attach information to nodes. It is
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
266 not good to say that the Learner instance *is* the node because (a) learner instances change
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
267 during graph exploration and (b) learner instances are big, and we don't want to have to keep a
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
268 whole saved model just to attach meta-info e.g. validation score. Choosing this API spills
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
269 over into other committees, so we should get their feedback about how to resolve it.
1044
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
270
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
271 Just another view/spin on the same idea (Razvan)
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
272 ================================================
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
273
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
274
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
275 My idea is probably just a spin off from what James wrote. It is an extension
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
276 of what I sent on the mailing list some time ago.
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
277
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
278 Big Picture
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
279 -----------
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
280
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
281 What do we care about ?
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
282 ~~~~~~~~~~~~~~~~~~~~~~~
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
283
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
284 This is the list of the main points that I have in mind :
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
285
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
286 * Re-usability
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
287 * Extensibility
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
288 * Simplicity or easily readable code ( connected to re-usability )
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
289 * Modularity ( connected to extensibility )
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
290 * Fast to write code ( - sort of comes out of simplicity)
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
291 * Efficient code
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
292
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
293
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
294 Composition
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
295 ~~~~~~~~~~~
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
296
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
297 To me this reads as code generated by composing pieces. Imagine this :
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
298 you start off with something primitive that I will call a "variable", which
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
299 is probably a very unsuitable name. And then you compose those initial
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
300 "variables" or transform them through several "functions". Each such
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
301 "function" hides some logic, that you as the user don't care about.
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
302 You can have low-level or micro "functions" and high-level or macro
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
303 "functions", where a high-level function is just a certain compositional
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
304 pattern of low-level "functions". There are several classes of "functions"
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
305 and "variables" that can be interchangeable. This is how modularity is
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
306 obtained, by switching between functions from a certain class.
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
307
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
308 Now when you want to research something, what you do is first select
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
309 the step you want to look into. If you are lucky you can re-write this
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
310 step as a certain decomposition of low-level transformations (there can be
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
311 multiple such decompositions). If not you have to implement such a
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
312 decomposition according to your needs. Pick the low-level transformations you want
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
313 to change and write new versions that implement your logic.
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
314
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
315 I think the code will be easy to read, because it is just applying a fixed
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
316 set of transformations, one after the other. The one who writes the code can
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
317 decide how explicit he wants to write things by switching between high-level
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
318 and low-level functions.
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
319
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
320 I think the code this way is re-usable, because you can just take this chain
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
321 of transformations and replace the one you care about, without looking into
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
322 the rest.
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
323
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
324 You get this fractal property of the code. Zooming in, you always get just
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
325 a set of functions applied to a set of variables. In the beginning those might
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
326 not be there, and you would have to create new "low level" decompositions,
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
327 maybe even new "variables" that get data between those decompositions.
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
328
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
329 The thing with variables here is that I don't want these "functions" to have
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
330 a state. All the information is passed along through these variables. This
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
331 way understanding the graph is easy, debugging it is also easier (than having
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
332 all these hidden states ..)
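A toy illustration of stateless "functions" composed over "variables", where plain Python lists stand in for variables. The function names are made up for the example and do not come from any existing library.

```python
def add_bias(variable, bias):
    """Low-level "function": returns a new variable, keeps no state."""
    return [x + bias for x in variable]


def rectify(variable):
    """Low-level "function": element-wise max(x, 0)."""
    return [max(x, 0.0) for x in variable]


def identity(variable):
    """Drop-in replacement from the same class as rectify."""
    return list(variable)


def layer(variable, bias, nonlinearity=rectify):
    """High-level "function": a fixed composition of low-level ones."""
    return nonlinearity(add_bias(variable, bias))


inputs = [-2.0, 0.5, 3.0]
default_out = layer(inputs, bias=1.0)
swapped_out = layer(inputs, bias=1.0, nonlinearity=identity)
```

Everything a step needs arrives through its arguments, so replacing one low-level piece (here the nonlinearity) does not require reading the rest of the chain.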
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
333
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
334 Note that while doing so we might ( and I strongly think we should) create
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
335 a (symbolic) DAG of operations. ( this is where it becomes what James was saying).
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
336 In such a DAG the "variables" will be the nodes and the functions will be edges.
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
337 I think having a DAG is useful in many ways (all these are things that one
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
338 might think about implementing in a far future, I'm not proposing to implement
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
339 them unless we want to use them - like the reconstruction ):
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
340 * there exists the possibility of writing optimizations ( theano style )
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
341 * there exists the possibility to add global view utility functions ( like
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
342 a reconstruction function for SdA - extremely low level here), or global
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
343 view diagnostic tools
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
344 * the possibility of creating a GUI ( where you just create the Graph by
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
345 picking transforms and variables from a list ) or working interactively
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
346 and then generating code that will reproduce the graph
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
347 * you can view the graph at different granularity levels to understand
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
348 things ( global diagnostics)
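A rough sketch of such a symbolic DAG, with "variables" as nodes and the producing "function" stored as the edge into each node. `Variable`, `apply_function`, `evaluate` and `ancestors` are hypothetical names chosen for the example, not an existing API.

```python
class Variable:
    """Node of the symbolic graph; remembers which function produced it."""

    def __init__(self, name, function=None, inputs=()):
        self.name = name
        self.function = function   # the edge leading into this node, if any
        self.inputs = tuple(inputs)


def apply_function(function, name, *inputs):
    """Record a function application symbolically; nothing is computed yet."""
    return Variable(name, function, inputs)


def evaluate(variable, values):
    """Evaluate a node given concrete values for the input variables."""
    if variable.function is None:
        return values[variable.name]
    args = [evaluate(v, values) for v in variable.inputs]
    return variable.function(*args)


def ancestors(variable):
    """Global view utility: names of every node the variable depends on."""
    names = []
    for v in variable.inputs:
        names.extend(ancestors(v))
        names.append(v.name)
    return names


x = Variable("x")
w = Variable("w")
h = apply_function(lambda a, b: a * b, "h", x, w)
y = apply_function(lambda a: a + 1, "y", h)
result = evaluate(y, {"x": 2, "w": 5})
```

Since the graph exists before anything is computed, utilities such as ancestors() can inspect or rewrite it, which is what optimizations, diagnostics or a GUI would build on.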

We should have a taxonomy of the possible classes of functions and the
possible classes of variables, but these classes should not be exclusive.
We can work at a high level for now, and decompose those high-level
functions into lower-level ones when we need to. We can then introduce new
classes of functions or intermediate variables between those low-level
functions.
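
As a rough illustration of the decomposition idea above, the sketch below
tags functions and variables with a taxonomy class and lets a high-level
function expand into lower-level ones joined by an intermediate variable.
All names here (Variable, Function, kind, parts, expand, train_sda) are
hypothetical and chosen only for this example; nothing in it is an existing
pylearn or Theano API.

```python
# Hypothetical sketch: none of these names come from an existing library.

class Variable:
    """A named node holding data in the graph; `kind` is its taxonomy class."""

    def __init__(self, name, kind="generic"):
        self.name = name
        self.kind = kind


class Function:
    """A transform from input variables to one output variable.

    `kind` is the taxonomy class (e.g. "train" or "transform"). `parts`
    optionally lists the lower-level functions this one decomposes into.
    """

    def __init__(self, name, kind, inputs, output, parts=()):
        self.name = name
        self.kind = kind
        self.inputs = list(inputs)
        self.output = output
        self.parts = list(parts)

    def expand(self, depth):
        """Return the functions visible at the given granularity level."""
        if depth == 0 or not self.parts:
            return [self]
        visible = []
        for part in self.parts:
            visible.extend(part.expand(depth - 1))
        return visible


data = Variable("data", kind="dataset")
hidden = Variable("hidden", kind="representation")  # intermediate variable
model = Variable("model", kind="parameters")

# Low-level view: two steps joined by the intermediate variable `hidden`.
encode = Function("encode", "transform", [data], hidden)
fit = Function("fit", "train", [hidden], model)

# High-level view: one function that hides the intermediate variable.
train_sda = Function("train_sda", "train", [data], model, parts=[encode, fit])

coarse = [f.name for f in train_sda.expand(0)]
fine = [f.name for f in train_sda.expand(1)]
```

Here `coarse` holds only the high-level function, while `fine` exposes the
two lower-level steps; the same structure could back the "different
granularity levels" view mentioned in the list above.
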
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
355
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
356
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
357 Similarities with James' idea
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
358 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As I said before, I think this is just another view of what James proposed.
In his case the learner is the module that traverses the graph of these
operations, which makes sense here as well.

The 'execute' command in his API corresponds, in my case, to simply applying
a function to some variables.

In both cases, I think, the learner keeps track of the graph that is formed.
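
To make the two points above concrete, here is a minimal sketch in which
"executing" a step is just applying a function to named variables, and the
learner records each application as an edge of the graph. The class and
method names (Learner, set, apply) are assumptions made for this example and
are not James' actual API.

```python
# Hypothetical sketch; `Learner.apply` is an illustrative name only.

class Learner:
    """Applies functions to variables and remembers the resulting graph."""

    def __init__(self):
        self.values = {}  # variable name -> computed value
        self.edges = []   # (function name, input names, output name)

    def set(self, name, value):
        """Register an input variable with a known value."""
        self.values[name] = value

    def apply(self, fn, inputs, output):
        """Run `fn` on the named inputs, store the result, record the edge."""
        args = [self.values[name] for name in inputs]
        self.values[output] = fn(*args)
        self.edges.append((fn.__name__, tuple(inputs), output))
        return output


def normalize(xs):
    """Scale a list of numbers by its maximum."""
    top = max(xs)
    return [x / top for x in xs]


def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)


learner = Learner()
learner.set("data", [1.0, 2.0, 4.0])
learner.apply(normalize, ["data"], "scaled")
learner.apply(mean, ["scaled"], "avg")
```

After these calls, `learner.edges` lists the two applications in order, so
the graph can be inspected or replayed without the user building it
separately.
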

His view is a bit more general. I see the graph as fully created by the user,
and the learner just has to go from the start to the end. In his case the
traversal is conditioned on some policies. I think these ideas can be
combined. To get this functionality in my case, I would use something
similar to the lazy linker for Theano.
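
One way the two views might be combined is sketched below: the user still
builds the whole graph, but nodes are computed on demand and a policy decides
which inputs of a node are followed. This only illustrates lazy,
policy-conditioned traversal in general; it is not Theano's lazy linker, and
every name in it (Node, evaluate, follow_all, skip_pretraining) is
hypothetical.

```python
# Hypothetical sketch of a lazily evaluated graph; this is not Theano code.

class Node:
    """A graph node: a function plus the nodes it may take as inputs."""

    def __init__(self, name, fn, inputs=()):
        self.name = name
        self.fn = fn
        self.inputs = list(inputs)


def evaluate(node, policy, cache, trace):
    """Compute `node` on demand, visiting only the inputs the policy selects."""
    if node.name in cache:
        return cache[node.name]
    chosen = policy(node, cache)  # the policy decides which inputs to follow
    args = [evaluate(dep, policy, cache, trace) for dep in chosen]
    cache[node.name] = node.fn(*args)
    trace.append(node.name)       # record the order of actual computation
    return cache[node.name]


def follow_all(node, cache):
    """Fixed traversal: follow every input, from start to end."""
    return node.inputs


raw = Node("raw", lambda: [1, 2, 3])
pretrained = Node("pretrained", lambda xs: [x * 10 for x in xs], [raw])
finetuned = Node("finetuned", lambda xs: sum(xs), [pretrained])
baseline = Node("baseline", lambda xs: sum(xs), [raw])
report = Node("report", lambda *scores: max(scores), [finetuned, baseline])


def skip_pretraining(node, cache):
    """Policy: at the `report` node, follow only the baseline branch."""
    if node.name == "report":
        return [baseline]
    return node.inputs


full_trace, short_trace = [], []
full_result = evaluate(report, follow_all, {}, full_trace)
short_result = evaluate(report, skip_pretraining, {}, short_trace)
```

With `follow_all` every node is computed, which matches the "start to end"
view; with `skip_pretraining` the `pretrained` and `finetuned` nodes are
never evaluated, which is the kind of saving a lazy traversal is meant to
provide.
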