annotate doc/v2_planning/learner.txt @ 1082:f9f72ae84313

dataset: Added a couple points we did not have time to discuss during meeting
author Olivier Delalleau <delallea@iro>
date Fri, 10 Sep 2010 15:36:23 -0400
parents f082a6c0b008
children 7a8dcf87d780
1041
38cc6e075d9b PV added to learner committee
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 1038
diff changeset

Committee: AB, PL, GM, IG, RP, NB, PV
Leader: ?
1002
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset

Discussion of Function Specification for Learner Types
======================================================

In its most abstract form, a learner is an object with the
following semantics:

* A learner has named hyper-parameters that control how it learns (these can be viewed
  as options of the constructor, or might be set directly by a user).

* A learner also has an internal state that depends on what it has learned.

* A learner reads and produces data, so the definition of a learner is
  intimately linked to the definition of a dataset (and of a task).

* A learner has one or more 'train' or 'adapt' functions by which
  it is given a sample of data (typically either the whole training set, or
  a mini-batch, which includes a single 'example' as a special case). Learners
  interface with datasets in order to obtain data. These functions cause the
  learner to change its internal state and to exploit, to some extent,
  the data provided. The 'train' function should take charge of
  completely exploiting the dataset, as specified by the hyper-parameters,
  so that it would typically be called only once. An 'adapt' function
  is meant for learners that can operate in an 'online' setting where
  data continually arrive and the control loop (when to stop) is
  managed outside of the learner. For most intents and purposes, the
  'train' function could also handle the 'online' case by providing
  the controlled iterations over the dataset (which would then be
  seen as a stream of examples).
    * learner.train(dataset)
    * learner.adapt(data)

* Different types of learners can then exploit their internal state
  in order to perform various computations after training is completed,
  or in the middle of training, e.g.,

    * y=learner.predict(x)
      for learners that see (x,y) pairs during training and predict y given x,
      or for learners that see only x's and learn a transformation of them (i.e. feature extraction).
      Here and below, x and y are tensor-like objects whose first index iterates
      over particular examples in a batch or minibatch of examples.

    * p=learner.probability(examples)
      p=learner.log_probability(examples)
      for learners that can estimate probability density or probability mass functions.
      Note that an example could be a pair (x,y) for learners that expect each example
      to represent such a pair. The second form is provided in case the example
      is high-dimensional and computations in the log-domain are numerically preferable.
      The first dimension of examples or of x and y is an index over a minibatch or a dataset.

    * p=learner.free_energy(x)
      for learners that can estimate a log unnormalized probability; the output has the same length as the input.

    * c=learner.costs(examples)
      returns a matrix of costs (one row per example, i.e., again the output has the same length
      as the input), the first column of which represents the cost whose expectation
      we wish to minimize over new samples from the unknown underlying data distribution.
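
A minimal Python sketch of these semantics, assuming plain lists stand in for the tensor-like objects (first index over examples). The class `DiagonalGaussianLearner`, its `batch_size` and `min_variance` hyper-parameters, and the choice of a diagonal Gaussian density are hypothetical, picked only to keep the example short. It shows `train` exploiting a whole dataset in a single call, `adapt` consuming one minibatch while the caller owns the control loop, `costs` returning one row per example with the cost to minimize in column 0, and why `log_probability` is worth offering: the per-dimension terms are summed in the log domain and stay finite, whereas `probability` underflows for high-dimensional examples.

```python
import math


class DiagonalGaussianLearner:
    """Hypothetical toy density learner illustrating the semantics above.

    Named hyper-parameters: batch_size, min_variance.
    Internal state: number of examples seen, per-dimension mean, and
    per-dimension sum of squared deviations (Welford's online algorithm).
    """

    def __init__(self, batch_size=100, min_variance=1e-6):
        self.batch_size = batch_size      # hyper-parameter used by train()
        self.min_variance = min_variance  # hyper-parameter: variance floor
        self.n = 0                        # internal state starts empty
        self.mean = None
        self.m2 = None

    def adapt(self, data):
        """Update the internal state from one minibatch; the caller decides when to stop."""
        for x in data:
            if self.mean is None:
                self.mean = [0.0] * len(x)
                self.m2 = [0.0] * len(x)
            self.n += 1
            for d, value in enumerate(x):
                delta = value - self.mean[d]
                self.mean[d] += delta / self.n
                self.m2[d] += delta * (value - self.mean[d])

    def train(self, dataset):
        """Fully exploit the dataset in one call, minibatch by minibatch."""
        for start in range(0, len(dataset), self.batch_size):
            self.adapt(dataset[start:start + self.batch_size])

    def log_probability(self, examples):
        """One log-density per example, summed over dimensions in the log domain."""
        variances = [m2 / self.n + self.min_variance for m2 in self.m2]
        result = []
        for x in examples:
            total = 0.0
            for value, mu, var in zip(x, self.mean, variances):
                total -= 0.5 * (math.log(2.0 * math.pi * var) + (value - mu) ** 2 / var)
            result.append(total)
        return result

    def probability(self, examples):
        """Linear-domain densities; these underflow to 0.0 for high-dimensional examples."""
        return [math.exp(lp) for lp in self.log_probability(examples)]

    def costs(self, examples):
        """One row per example; column 0 is the negative log-likelihood to be minimized."""
        return [[-lp] for lp in self.log_probability(examples)]


# Usage: a 2000-dimensional dataset of two examples.
dim = 2000
learner = DiagonalGaussianLearner()
learner.train([[0.0] * dim, [2.0] * dim])
x = [[1.0] * dim]
print(learner.log_probability(x))  # finite, about -1837.9
print(learner.probability(x))      # [0.0]: exp() underflows
```

Here `train` simply calls `adapt` once per minibatch, which is one way (not the only one) for the 'train' function to also cover the online case.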


Some learners may be able to handle x's and y's that contain missing values.

* For convenience, some of these operations could be bundled, e.g.

    * [prediction,costs] = learner.predict_and_adapt((x,y))

* Some learners could include in their internal state not only what they
  have learned but also some information about recently seen examples that conditions
  the expected distribution of upcoming examples. In that case, they might
  be used, e.g., in an online setting as follows:
      for (x,y) in data_stream:
          [prediction,costs]=learner.predict((x,y))
          accumulate_statistics(prediction,costs)

* In some cases, each example is itself a (possibly variable-size) sequence
  or other variable-size object (e.g. an image, or a video).
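
A sketch of the online loop above, assuming the bundled `predict_and_adapt` form so that each prediction is made before the learner sees its target. `MovingAverageLearner`, its `window` hyper-parameter, and `mean_online_cost` (which plays the role of `accumulate_statistics`, reduced to a running mean) are hypothetical. The learner's state here is nothing but a window of recent targets, i.e. information about recently seen examples rather than learned parameters.

```python
from collections import deque


class MovingAverageLearner:
    """Hypothetical learner whose internal state is a window of recently seen targets."""

    def __init__(self, window=3):
        self.window = window                # named hyper-parameter
        self.recent = deque(maxlen=window)  # internal state: last `window` targets

    def predict_and_adapt(self, example):
        """Predict y before looking at it, report the costs, then update the state."""
        x, y = example                      # this toy model ignores x
        if self.recent:
            prediction = sum(self.recent) / len(self.recent)
        else:
            prediction = 0.0
        costs = [(prediction - y) ** 2]     # column 0: squared error
        self.recent.append(y)               # adapt: remember the new target
        return prediction, costs


def mean_online_cost(learner, data_stream):
    """The online loop from the text, accumulating the mean of the first cost column."""
    total, count = 0.0, 0
    for (x, y) in data_stream:
        prediction, costs = learner.predict_and_adapt((x, y))
        total += costs[0]
        count += 1
    return total / count


print(mean_online_cost(MovingAverageLearner(), [(None, 1.0)] * 4))  # 0.25
```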
1026
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset


James's idea for Learner Interface
===================================

Theory:
-------

Think of the unfolding of a learning algorithm as the exploration of a path in a vast
directed graph.

There are some source nodes, which are potential initial conditions for the
learning algorithm.

At any node, there are a number of outgoing labeled edges that represent
distinct directions of exploration, such as "allocate a model with N hidden units",
"set the L1 weight decay on such-and-such units to 0.1", "adapt for T
iterations", or "refresh the GPU dataset memory with the next batch of data".

Not all nodes have the same outgoing edge labels. The dataset, model, and
optimization algorithm implementations may each have their own
hyper-parameters, with various restrictions on what values they can take and
when they can be changed.

Every move in this graph explores it further and incurs some storage and
computational expense.

Learners typically engage in goal-directed exploration of this graph, for
example to find the node with the best validation-set performance given a
certain computational budget. We will often be interested in the best node
found.

The predict(), log_probability(), free_energy(), etc. methods correspond to costs that we
can measure at any particular node (at some computational expense) to see how our
exploration is going.

Many semantically distinct components enter into the definition of this graph:
the model (e.g. DAA), the dataset (e.g. an online one), and the inference and
learning strategy. I'm not sure what to call this graph other than an 'experiment
graph', so I'll go with that for now.
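
To make the experiment-graph view concrete, here is a toy sketch. Everything in it is hypothetical: nodes are tuples of settings, the edge labels mimic "allocate a model with N hidden units" and "adapt for T iterations", `validation_cost` is a made-up formula, and best-first search under a move budget is just one possible goal-directed exploration policy, not something this document prescribes.

```python
import heapq
import itertools

# Hypothetical toy experiment graph. A node is an immutable, hashable tuple of
# (name, value) settings; labeled edges are (kind, value) pairs.
SOURCE = (("hidden_units", None), ("iterations", 0))


def outgoing_edges(node):
    """Not every node offers the same edge labels."""
    if dict(node)["hidden_units"] is None:
        return [("allocate", n) for n in (10, 50, 100)]  # "allocate a model with N hidden units"
    return [("adapt", 10)]                                 # "adapt for T=10 iterations"


def follow(node, edge):
    """Move along one labeled edge and return the child node."""
    state = dict(node)
    kind, value = edge
    if kind == "allocate":
        state["hidden_units"] = value
    else:
        state["iterations"] += value
    return tuple(sorted(state.items()))


def validation_cost(node):
    """Made-up stand-in for a cost measured at a node (e.g. validation error)."""
    state = dict(node)
    if state["hidden_units"] is None:
        return float("inf")
    return abs(state["hidden_units"] - 50) / 50 + 1.0 / (1 + state["iterations"])


def explore(source, budget):
    """Best-first exploration; each move costs one unit of budget. Returns the best node found."""
    tie_breaker = itertools.count()  # keeps heapq from ever comparing nodes directly
    frontier = [(validation_cost(source), next(tie_breaker), source)]
    best, best_cost, moves = source, validation_cost(source), 0
    while frontier and moves < budget:
        _, _, node = heapq.heappop(frontier)
        for edge in outgoing_edges(node):
            if moves >= budget:
                break
            child = follow(node, edge)
            moves += 1
            cost = validation_cost(child)
            if cost < best_cost:
                best, best_cost = child, cost
            heapq.heappush(frontier, (cost, next(tie_breaker), child))
    return best


print(dict(explore(SOURCE, budget=5)))  # {'hidden_units': 50, 'iterations': 20}
```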


Use Cases
----------

Early stopping
~~~~~~~~~~~~~~

Early stopping can be implemented as a learner that progresses along a
particular kind of edge (e.g. "train more") until a stopping criterion (in terms
of a cost computed from nodes along the path) is met.
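
A sketch of early stopping as a walk along a single edge type. `train_more` and `cost` are caller-supplied callables standing in for the "train more" edge and a measured cost; the patience rule (stop after `patience` moves without improvement) is an assumed stopping criterion, one common choice among several. Returning the best node seen, rather than the last one, matches the interest in the best node found.

```python
def early_stopping(node, train_more, cost, patience=3, max_steps=1000):
    """Follow the "train more" edge until `cost` has not improved for `patience` moves.

    Returns the best node seen along the path and its cost.
    """
    best_node, best_cost = node, cost(node)
    moves_since_best = 0
    for _ in range(max_steps):
        node = train_more(node)
        current = cost(node)
        if current < best_cost:
            best_node, best_cost = node, current
            moves_since_best = 0
        else:
            moves_since_best += 1
            if moves_since_best >= patience:
                break
    return best_node, best_cost


# Usage with a toy node (an epoch counter) whose cost bottoms out at epoch 5.
print(early_stopping(0, lambda epoch: epoch + 1, lambda epoch: (epoch - 5) ** 2))  # (5, 0)
```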


Grid Search
~~~~~~~~~~~

Grid search is a learner policy that can be implemented in an experiment graph
where all paths have the form:

    ( "set param 0 to X",
      "set param 1 to Y",
      ... ,
      "set param N to Z",
      adapt,
      [early stop...],
      test )

It would explore all paths of this form and then return the best node.
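
A sketch of the grid-search policy, assuming hyper-parameters can be set as attributes (each `setattr` plays the role of a "set param i to v" edge) and that `test` returns a cost where lower is better. `grid_search` and its arguments are hypothetical, and the stand-in learner in the usage lines is just a `types.SimpleNamespace`.

```python
import itertools
import types


def grid_search(make_learner, grid, adapt, test):
    """Walk every path ("set param 0 to X", ..., "set param N to Z", adapt, test).

    `grid` maps hyper-parameter names to candidate values; `test` returns a cost
    (lower is better). Returns (best cost, best settings, best learner).
    """
    names = sorted(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        learner = make_learner()
        for name, value in zip(names, values):  # the "set param i to v" edges
            setattr(learner, name, value)
        adapt(learner)                           # could itself wrap early stopping
        cost = test(learner)
        if best is None or cost < best[0]:
            best = (cost, dict(zip(names, values)), learner)
    return best


# Usage with a stand-in learner and a made-up test cost.
cost, settings, _ = grid_search(
    make_learner=types.SimpleNamespace,
    grid={"lr": [0.01, 0.1, 1.0], "hidden": [10, 50]},
    adapt=lambda learner: None,
    test=lambda learner: (learner.lr - 0.1) ** 2 + (learner.hidden - 50) ** 2,
)
print(settings)  # {'hidden': 50, 'lr': 0.1}
```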


Stagewise learning of DBNs combined with early stopping and grid search
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This would be a learner that is effective for experiment graphs that reflect the
greedy-stagewise optimization of DBNs.


Boosting
~~~~~~~~

Given an ExperimentGraph that permits re-weighting of examples, it is
straightforward to write a meta-ExperimentGraph around it that implements AdaBoost.
A meta-meta-ExperimentGraph around that one, which does early stopping, would complete
the picture and make a useful boosting implementation.
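
A sketch of the boosting idea, swapping in standard discrete AdaBoost with one-dimensional decision stumps as the weak learner that permits re-weighting of examples. It does not model ExperimentGraph objects; `train_stump` and `adaboost` are hypothetical names, and labels are assumed to be -1/+1. The number of rounds is the natural knob for an outer early-stopping layer to control.

```python
import math


def train_stump(xs, ys, weights):
    """Weak learner that accepts example weights: a threshold stump on 1-D inputs.

    Labels are -1/+1. Returns (weighted error, classifier).
    """
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if (polarity if x >= threshold else -polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    error, threshold, polarity = best
    return error, (lambda x: polarity if x >= threshold else -polarity)


def adaboost(xs, ys, rounds):
    """Discrete AdaBoost: repeatedly re-weight the examples and retrain the weak learner."""
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(rounds):
        error, h = train_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1.0 - 1e-10)  # avoid log(0) for a perfect stump
        alpha = 0.5 * math.log((1.0 - error) / error)
        ensemble.append((alpha, h))
        # Misclassified examples (y * h(x) == -1) gain weight; then renormalize.
        weights = [w * math.exp(-alpha * y * h(x)) for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]   # no single stump separates these labels
classify = adaboost(xs, ys, rounds=3)
print([classify(x) for x in xs])  # [1, 1, -1, -1, 1, 1]
```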
1043
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
Using External Hyper-Parameter Optimization Software
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TODO: write a use case showing how we could use the optimizer from
http://www.cs.ubc.ca/labs/beta/Projects/ParamILS/


Implementation Details / API
----------------------------

Learner
~~~~~~~
An object that allows us to explore the graph discussed above. Specifically, it represents
an explored node in that graph.

    def active_instructions()
        """ Return a list/set of Instruction instances (see below) that the Learner is prepared
        to handle.
        """

    def copy(), deepcopy()
        """ Learners should be serializable """
1026
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
197
38f799f8b6cd v2_planning - thoughts on learner
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1002
diff changeset
198
1043
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
199 To make the implementation easier, I found it was helpful to introduce a string-valued
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
200 `fsa_state` member attribute and associate methods to these states. That made it
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
201 syntactically easy to build relatively complex finite-state transition graphs to describe
3f528656855b v2planning learner.txt - updated API recommendation
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1041
diff changeset
202 which instructions were active at which times in the life-cycle of a learner.
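The `fsa_state` idea above could look roughly like the following. This is a minimal sketch, not the implementation referred to in the text. `ToyLearner`, its state names (`'init'`, `'training'`, `'done'`) and the `_ACTIVE` table are hypothetical. The table maps each state to the instructions that are active in it, so `active_instructions()` becomes a simple lookup, and some instructions move the learner to a new state.

```python
import copy


class ToyLearner:
    """Hypothetical learner whose life-cycle is a small finite-state machine."""

    # Which instructions (method names) are legal in each fsa_state.
    _ACTIVE = {
        'init': {'set_lr', 'start_training'},
        'training': {'train_one_epoch', 'stop_training'},
        'done': set(),
    }

    def __init__(self):
        self.fsa_state = 'init'
        self.lr = 0.01
        self.epochs = 0

    def active_instructions(self):
        # The edges leaving this node depend only on the current state.
        return set(self._ACTIVE[self.fsa_state])

    def _require(self, name):
        if name not in self.active_instructions():
            raise RuntimeError('%s is not active in state %r' % (name, self.fsa_state))

    def set_lr(self, lr):
        self._require('set_lr')
        self.lr = lr

    def start_training(self):
        self._require('start_training')
        self.fsa_state = 'training'

    def train_one_epoch(self):
        self._require('train_one_epoch')
        self.epochs += 1  # stand-in for a real training pass

    def stop_training(self):
        self._require('stop_training')
        self.fsa_state = 'done'

    def copy(self):
        # Copying keeps the current node around before moving along an edge.
        return copy.deepcopy(self)
```

For example, `before = learner.copy(); learner.start_training()` leaves `before.fsa_state == 'init'` while `learner` has moved on to `'training'`.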


Instruction
~~~~~~~~~~~
An object that represents a potential edge in the graph discussed above. It is an
operation that a learner can perform.

    arg_types
        """a list of Type objects (see below) indicating what args are required by execute"""

    def execute(learner, args, kwargs):
        """ Perform some operation on the learner (follow an edge in the graph discussed above)
        and modify the learner in-place. Calling execute 'moves' the learner from one node in
        the graph along an edge. To have the old learner as well, it must be copied prior to
        calling execute().
        """

    def expense(learner, args, kwargs, resource_type='CPUtime'):
        """ Return an estimated cost of performing this instruction (calling execute), in time,
        space, number of computers, disk requirement, etc.
        """

Type
~~~~
An object that describes a parameter domain for a call to Instruction.execute.
It is not necessary that a Type specifies exactly which arguments are legal, but it should
`include` all legal arguments, and exclude as many illegal ones as possible.

    def includes(value):
        """return True if value is a legal argument"""


To make things a bit more practical, there are some Type subclasses like Int, Float, Str,
ImageDataset, SgdOptimizer, that include additional attributes (e.g. min, max, default) so
that automatic graph exploration algorithms can generate legal arguments with reasonable
efficiency.
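The following sketch shows how `min`, `max` and `default` let an exploration algorithm propose legal arguments directly instead of guessing and testing. The `Float`/`Int` classes here are hypothetical, and so are the `sample` and `propose` helpers. Sampling is uniform, which is exactly the limitation raised further down under "Parameter Distributions".

```python
import random


class Float:
    """Hypothetical Type subclass carrying min, max and default attributes."""

    def __init__(self, min, max, default):
        assert min <= default <= max
        self.min, self.max, self.default = min, max, default

    def includes(self, value):
        return isinstance(value, (int, float)) and self.min <= value <= self.max

    def sample(self, rng):
        # The bounds let an explorer generate legal values directly (uniformly here).
        return rng.uniform(self.min, self.max)


class Int(Float):
    def includes(self, value):
        return isinstance(value, int) and self.min <= value <= self.max

    def sample(self, rng):
        return rng.randint(self.min, self.max)  # inclusive bounds


def propose(arg_types, rng):
    """Draw one legal argument tuple for an Instruction's arg_types."""
    return tuple(t.sample(rng) for t in arg_types)
```

For instance, `propose([Float(min=-8, max=-1, default=-4), Int(min=10, max=500, default=100)], random.Random(0))` returns a (log learning rate, hidden-layer size) pair that both Types accept.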



The proxy pattern is a powerful way to combine learners, especially when proxy Learner
instances also introduce proxy Instruction classes.

For example, it is straightforward to implement a hyper-learner by implementing a Learner with
another learner (sub-learner) as a member attribute. The hyper-learner makes some
modifications to the instruction_set() return value of the sub-learner, typically to introduce
more powerful instructions and hide simpler ones.
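Here is a minimal sketch of such a hyper-learner. It assumes, purely for illustration, that `instruction_set()` returns a dict from instruction names to callables. `SubLearner` and `EarlyStopper` are hypothetical. The hyper-learner hides the per-epoch instruction and exposes a coarser `train_until` instruction built on top of it.

```python
class SubLearner:
    """Hypothetical sub-learner exposing one simple instruction."""

    def __init__(self):
        self.epochs = 0
        self.valid_error = 1.0

    def train_one_epoch(self):
        self.epochs += 1
        self.valid_error *= 0.5  # toy stand-in for real training

    def instruction_set(self):
        return {'train_one_epoch': self.train_one_epoch}


class EarlyStopper:
    """Hypothetical hyper-learner: holds a sub-learner as a member attribute,
    hides its per-epoch instruction and exposes a more powerful one."""

    def __init__(self, sub):
        self.sub = sub

    def train_until(self, target_error, max_epochs):
        while self.sub.valid_error > target_error and self.sub.epochs < max_epochs:
            self.sub.train_one_epoch()

    def instruction_set(self):
        instrs = dict(self.sub.instruction_set())
        instrs.pop('train_one_epoch')             # hide the simpler instruction
        instrs['train_until'] = self.train_until  # add a more powerful one
        return instrs
```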

It is less straightforward, but still consistent with the design, to implement a Learner that
encompasses job management. Such a learner would retain the semantics of the
instruction_set of the sub-learner, but would replace the Instruction objects themselves with
Instructions that arrange for remote procedure calls (e.g. jobman, multiprocessing, bqtools,
etc.). Such a learner would replace synchronous instructions (return on completion) with
asynchronous ones (return after scheduling), and the active instruction set would also change
asynchronously, but neither of these things is inconsistent with the Learner API.
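The following sketch shows what such a job-management proxy might look like. Python's `concurrent.futures.ThreadPoolExecutor` stands in for jobman, multiprocessing or bqtools. `Slow` and `JobManager` are hypothetical, as is the dict-shaped `instruction_set()`. Each wrapped instruction returns a `Future` right after scheduling. The active set is empty while a job is in flight, so it really does change asynchronously.

```python
from concurrent.futures import ThreadPoolExecutor


class Slow:
    """Hypothetical sub-learner with one synchronous instruction."""

    def __init__(self):
        self.epochs = 0

    def train_one_epoch(self):
        self.epochs += 1

    def instruction_set(self):
        return {'train_one_epoch': self.train_one_epoch}


class JobManager:
    """Hypothetical proxy learner: same instruction names as the sub-learner,
    but each instruction schedules work and returns immediately."""

    def __init__(self, sub, executor):
        self.sub = sub
        self.executor = executor
        self.pending = None  # Future of the job in flight, if any

    def instruction_set(self):
        # Nothing is available while a job is still running.
        if self.pending is not None and not self.pending.done():
            return {}
        return {name: self._async(fn) for name, fn in self.sub.instruction_set().items()}

    def _async(self, fn):
        def schedule(*args, **kwargs):
            self.pending = self.executor.submit(fn, *args, **kwargs)
            return self.pending  # return after scheduling, not on completion
        return schedule
```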


TODO - Experiment API?
~~~~~~~~~~~~~~~~~~~~~~

I feel like something is missing from the API - and that is an interface to the graph structure
discussed above. The nodes in this graph are natural places to store meta-information for
visualization, statistics-gathering, etc. But none of the APIs above corresponds to the graph
itself. In other words, there is no API through which to attach information to nodes. It is
not good to say that the Learner instance *is* the node because (a) learner instances change
during graph exploration and (b) learner instances are big, and we don't want to have to keep a
whole saved model just to attach meta-info, e.g. a validation score. Choosing this API spills
over into other committees, so we should get their feedback about how to resolve
it. Maybe we need an 'Experiment' API to stand for this graph?
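Here is one possible shape for such an 'Experiment' object. The class and all of its methods are hypothetical. Each node stores only small meta-info plus the edge (instruction name and args) that produced it, so no learner needs to be saved per node. The design assumes a learner can be rebuilt by replaying the path of instructions from the root. That assumption is not stated in the text.

```python
import itertools


class Experiment:
    """Hypothetical graph of explored nodes holding lightweight meta-info."""

    def __init__(self):
        self._ids = itertools.count()
        self.meta = {}    # node id -> dict of meta-info (e.g. validation score)
        self.parent = {}  # node id -> (parent id, instruction name, args) or None

    def add_root(self, **meta):
        node = next(self._ids)
        self.meta[node] = dict(meta)
        self.parent[node] = None
        return node

    def add_child(self, parent, instruction, args, **meta):
        node = next(self._ids)
        self.meta[node] = dict(meta)
        self.parent[node] = (parent, instruction, args)
        return node

    def path(self, node):
        """Instructions to replay from the root to reach `node`."""
        steps = []
        while self.parent[node] is not None:
            parent, instruction, args = self.parent[node]
            steps.append((instruction, args))
            node = parent
        return steps[::-1]
```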


TODO: Validation & Monitoring Costs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Even if we do have the Experiment API as a structure to hang validation and
monitoring results on, what should be the mechanism for extracting those results?
The Learner API is not right because extracting a monitoring cost doesn't change
the model, doesn't change the legal instructions/edges, etc. Maybe we should use
a similar mechanism to Instruction, called something like Measurement? Any node
/ learner can report the list of instructions (for moving) and the list of
measurements (and the cost of computing them too).
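The Measurement idea can be sketched as a read-only counterpart of Instruction. `Measurement`, `measure`, `measurements()` and the toy `Toy` learner are hypothetical names. A measurement reports a value and an estimated cost, but it neither changes the model nor changes the active instructions.

```python
class Measurement:
    """Hypothetical read-only counterpart of Instruction: computes a statistic
    of a learner without moving it in the graph."""

    def __init__(self, name, fn, cost):
        self.name, self.fn, self.cost = name, fn, cost

    def measure(self, learner):
        return self.fn(learner)

    def expense(self, learner, resource_type='CPUtime'):
        # Illustrative constant; a real estimate would inspect the learner.
        return self.cost


class Toy:
    """Stand-in learner with a few weights."""

    def __init__(self):
        self.weights = [0.5, -1.5, 2.0]

    def measurements(self):
        # Reported alongside the instructions (edges) of this node.
        return [Measurement('l1_norm', lambda l: sum(abs(w) for w in l.weights), cost=1e-6)]
```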


TODO - Parameter Distributions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

YB asks: it seems to me that what we really need from "Type" is not just
testing that a value is legal, but more practically a function that specifies the
prior distribution for the hyper-parameter, i.e., how to sample from it,
and possibly some representation of it that could be used to infer
a posterior (such as an unnormalized log-density or log-probability).
Having the min and max and default limits us to the uniform distribution,
which may not always be appropriate. For example, sometimes we'd like a
Gaussian (-infty to infty), Exponential (0 to infty) or Poisson (non-negative integers).
For that reason, I think that "Type" is not a very good name.
How about "Prior" or "Density" or something like that?

JB replies: I agree that being able to choose (and update) distributions over
these values is important. I don't think the Type structure is the right place
to handle it though. The challenge is to allow those distributions to change
for a variety of reasons - e.g. the sampling distribution on the capacity
variables is affected by the size of the dataset; it is also affected by
previous experience in general as well as experiments on that particular
dataset. I'm not sure that the 'Type' structure is right to deal with this.
Also, even with a strategy for handling these distributions, I believe a simple
mechanism for rejecting insane values might be useful.
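One way to keep the two concerns separate is simple rejection sampling: a changeable prior proposes values, and a fixed sanity check (such as a Type's `includes`) rejects insane ones. The helper `sample_sane`, the Gaussian prior on log10 of the learning rate, and its [-8, -1] range are all hypothetical choices made for illustration.

```python
import random


def sample_sane(sample, includes, rng, max_tries=1000):
    """Draw from a prior (`sample`) but reject values that fail a cheap
    sanity check (`includes`), e.g. a Type's includes() method."""
    for _ in range(max_tries):
        value = sample(rng)
        if includes(value):
            return value
    raise RuntimeError('no acceptable value after %d tries' % max_tries)


# Hypothetical prior on log10(learning rate); its parameters could be updated
# from previous experiments without touching the sanity check below.
def prior(rng):
    return rng.gauss(-3.0, 2.0)


def sane(log10_lr):
    return -8.0 <= log10_lr <= -1.0
```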

So how should we handle it? Hmmm...


Comments
~~~~~~~~

OD asks: (I hope it's ok to leave comments even though I'm not in the committee... I'm
interested to see how the learner interface is shaping up so I'll be keeping
an eye on this file)
I'm wondering what the benefit of such an API is compared to simply defining a
new method for each instruction. It seems to me that typically, the 'execute'
method would end up being something like

    if instruction == 'do_x':
        self.do_x(...)
    elif instruction == 'do_y':
        self.do_y(...)
    ...

so why not directly call do_x / do_y instead?


JB replies: I agree with you, and in the implementation of a Learner I suggest
using Python decorators to get the best of both worlds:

    import numpy

    # Learner, Instruction and Float are the proposed API described above.
    class NNet(Learner):

        ...

        @Instruction.new(arg_types=(Float(min=-8, max=-1, default=-4),))
        def set_log_lr(self, log_lr):
            self.lr.value = numpy.exp(log_lr)

        ...

The Learner base class can implement an instruction_set() that walks through the
methods of 'self' and picks out the ones that have corresponding instructions.
But anyone can call the method normally. The NNet class can also have methods
that are not instructions.
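The decorator approach JB describes can be made self-contained as follows. `instruction` is a hypothetical decorator used in place of the proposed `Instruction.new`. A plain `(min, max)` tuple stands in for `Float`, and `math.exp` replaces `numpy.exp` so the sketch has no dependencies. `instruction_set()` uses `inspect.getmembers` to walk the bound methods and keeps only the tagged ones. Tagged methods can still be called directly.

```python
import inspect
import math


def instruction(*arg_types):
    """Hypothetical decorator: tag a method as an instruction with arg_types."""
    def tag(method):
        method.arg_types = arg_types
        return method
    return tag


class Learner:
    def instruction_set(self):
        # Walk the methods of `self` and keep the tagged ones.
        return {
            name: method
            for name, method in inspect.getmembers(self, inspect.ismethod)
            if hasattr(method, 'arg_types')
        }


class NNet(Learner):
    def __init__(self):
        self.lr = 0.01

    @instruction((-8.0, -1.0))  # (min, max) stands in for Float(min=-8, max=-1)
    def set_log_lr(self, log_lr):
        self.lr = math.exp(log_lr)

    def n_params(self):
        # An ordinary method: callable, but not an instruction.
        return 0
```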

OD replies: Ok thanks. I'm still unsure what the end goal is, but I'll keep
watching while you guys work on it, and hopefully it'll become clearer for me ;)

RP asks: James, correct me if I'm wrong, but I think each instruction has an execute
command. The job of the learner is to traverse the graph and, for each edge
that it decides to cross, to call the execute of that edge. Maybe James has
something else in mind, but this was my understanding.

JB replies: close, but let me make a bit of a clarification. The job of a
Learner is simply to implement the API of a Learner - to list what edges are
available and to be able to cross them if asked. The code *using* the Learner
(client) decides which edges to cross. The client may also be a Learner, but
maybe not.
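The following sketch shows the split between learner and client. `Walker` is a hypothetical learner that only lists its available edges and crosses one when asked. For brevity, its `execute` takes an instruction name instead of going through `Instruction.execute(learner, args, kwargs)`. The hypothetical `random_client` makes all the decisions, here by picking a random available edge at each step.

```python
import random


class Walker:
    """Hypothetical learner: lists available edges and crosses one on request."""

    def __init__(self):
        self.position = 0

    def active_instructions(self):
        moves = ['step_up']
        if self.position > 0:
            moves.append('step_down')
        return moves

    def execute(self, name):
        self.position += 1 if name == 'step_up' else -1


def random_client(learner, n_steps, rng):
    """The client, not the learner, chooses which edge to cross."""
    path = []
    for _ in range(n_steps):
        choice = rng.choice(learner.active_instructions())
        learner.execute(choice)
        path.append(choice)
    return path
```

A smarter client (e.g. a hyper-learner) would replace the random choice with a search strategy, without any change to `Walker`.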
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
360
1046
f1732269bce8 comment on Olivier's comment
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1045
diff changeset
361
f1732269bce8 comment on Olivier's comment
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1045
diff changeset
362
1044
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
363 Just another view/spin on the same idea (Razvan)
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
364 ================================================
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
365
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
366
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
367 My idea is probably just a spin off from what James wrote. It is an extension
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
368 of what I send on the mailing list some time ago.

Big Picture
-----------

What do we care about?
~~~~~~~~~~~~~~~~~~~~~~

This is the list of the main points that I have in mind:

* Re-usability
* Extensibility
* Simplicity, or easily readable code (connected to re-usability)
* Modularity (connected to extensibility)
* Code that is fast to write (this largely follows from simplicity)
* Efficient code


Composition
~~~~~~~~~~~

To me this reads as code generated by composing pieces. Imagine this: you
start off with something primitive that I will call a "variable" (probably a
very unsuitable name). You then compose those initial "variables", or
transform them through several "functions". Each such "function" hides some
logic that you, as the user, don't care about. You can have low-level (micro)
"functions" and high-level (macro) "functions", where a high-level function
is just a certain compositional pattern of low-level "functions". There are
several classes of "functions" and "variables", and members of the same class
are interchangeable. This is how modularity is obtained: by switching between
functions of the same class.
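A minimal Python sketch of the composition idea follows. It assumes that "variables" are immutable named values and that "functions" are plain stateless callables. The names `Variable`, `compose`, `scale` and `shift` are hypothetical and only illustrate the pattern. A high-level transform is a fixed chain of low-level ones, and a different chain with the same interface is another member of the same class.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Variable:
    """Hypothetical immutable "variable": a named value passed between transforms."""
    name: str
    value: Any


# A "function" is any stateless callable mapping a Variable to a new Variable.
Transform = Callable[[Variable], Variable]


def scale(x: Variable, factor: float = 2.0) -> Variable:
    # Low-level transform: multiply every element by a constant.
    return Variable(f"scale({x.name})", [v * factor for v in x.value])


def shift(x: Variable, offset: float = 1.0) -> Variable:
    # Low-level transform: add a constant to every element.
    return Variable(f"shift({x.name})", [v + offset for v in x.value])


def compose(*steps: Transform) -> Transform:
    """Build a high-level transform as a fixed chain of low-level ones."""
    def high_level(x: Variable) -> Variable:
        for step in steps:
            x = step(x)
        return x
    return high_level


affine = compose(scale, shift)      # one high-level "macro" transform
reordered = compose(shift, scale)   # same interface, different member of the class

x = Variable("x", [1.0, 2.0])
print(affine(x))     # Variable(name='shift(scale(x))', value=[3.0, 5.0])
print(reordered(x))  # Variable(name='scale(shift(x))', value=[4.0, 6.0])
```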

Now, when you want to research something, you first select the step you want
to look into. If you are lucky, you can rewrite this step as a certain
decomposition into low-level transformations (there can be multiple such
decompositions). If not, you have to implement such a decomposition according
to your needs. Then pick the low-level transformations you want to change and
write new versions that implement your logic.

I think the code will be easy to read, because it just applies a fixed set of
transformations, one after the other. Whoever writes the code can decide how
explicit to be by switching between high-level and low-level functions.

I think code written this way is re-usable, because you can take this chain
of transformations and replace the one you care about, without looking into
the rest.
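The sketch below illustrates that re-use argument. A chain is just an ordered list of named steps, and one step is replaced while the other steps are shared unchanged. `run` and `replace_step` are hypothetical helpers and not an existing API.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: a chain is an ordered list of (name, transform).
Step = Tuple[str, Callable[[float], float]]


def run(chain: List[Step], x: float) -> float:
    # Apply each transform of the chain in order.
    for _, fn in chain:
        x = fn(x)
    return x


def replace_step(chain: List[Step], name: str,
                 new_fn: Callable[[float], float]) -> List[Step]:
    """Return a new chain in which only the step called `name` differs."""
    if name not in [n for n, _ in chain]:
        raise KeyError(name)
    return [(n, new_fn if n == name else fn) for n, fn in chain]


baseline = [
    ("preprocess", lambda x: x - 1.0),
    ("model", lambda x: 2.0 * x),
    ("postprocess", lambda x: round(x, 3)),
]
# Research on the "model" step only; the other steps are reused as-is.
variant = replace_step(baseline, "model", lambda x: x ** 2)

print(run(baseline, 4.0))  # 6.0
print(run(variant, 4.0))   # 9.0
```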

You get a fractal property of the code: zooming in, you always get just a set
of functions applied to a set of variables. In the beginning those
decompositions might not exist, and you would have to create new "low-level"
decompositions, maybe even new "variables" that carry data between them.

The point about variables here is that I don't want these "functions" to have
a state. All the information is passed along through the variables. This way
the graph is easy to understand, and debugging it is also easier than with
all these hidden states.
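Here is a small sketch of what "no state in the functions" could look like, assuming a toy one-parameter SGD update. `TrainState` and `sgd_step` are hypothetical. The step reads its inputs and returns a new state object. Every intermediate state therefore survives, and any step can be replayed from its recorded inputs, which is where the debugging benefit comes from.

```python
from typing import NamedTuple


class TrainState(NamedTuple):
    """Hypothetical variable holding everything the update needs."""
    weight: float
    step: int


def sgd_step(state: TrainState, grad: float, lr: float = 0.1) -> TrainState:
    # Stateless: reads its inputs, returns a brand-new state, mutates nothing.
    return TrainState(weight=state.weight - lr * grad, step=state.step + 1)


s0 = TrainState(weight=1.0, step=0)
s1 = sgd_step(s0, grad=2.0)
s2 = sgd_step(s1, grad=2.0)

# Replaying a step from its recorded inputs gives exactly the same output.
print(sgd_step(s0, grad=2.0) == s1)  # True
print(s0, s1, s2)
```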

Note that while doing so we might (and I strongly think we should) create a
(symbolic) DAG of operations (this is where it becomes what James was
saying). In such a DAG the "variables" will be the nodes and the functions
will be the edges. I think having a DAG is useful in many ways (these are all
things one might think about implementing in the far future; I'm not
proposing to implement them unless we want to use them, like the
reconstruction):

* there is the possibility of writing optimizations (Theano style)
* there is the possibility of adding global-view utility functions (like
  a reconstruction function for SdA, extremely low level here), or
  global-view diagnostic tools
* the possibility of creating a GUI (where you just create the graph by
  picking transforms and variables from a list), or of working interactively
  and then generating code that will reproduce the graph
* you can view the graph at different granularity levels to understand
  things (global diagnostics)
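The following sketch shows one way the DAG could be recorded as a side effect of applying functions. It assumes that variables are named entries kept by a recorder object. `Graph`, `variable`, `apply` and `describe` are hypothetical, and `describe` stands in for the kind of global-view diagnostic listed above. A function with several inputs produces one edge running from all of its input nodes to its output node.

```python
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, Tuple[str, ...], str]  # (function name, input nodes, output node)


class Graph:
    """Hypothetical recorder: variables are nodes, function applications are edges."""

    def __init__(self) -> None:
        self.values: Dict[str, object] = {}
        self.edges: List[Edge] = []

    def variable(self, name: str, value: object) -> str:
        # A source node with no incoming edge.
        self.values[name] = value
        return name

    def apply(self, fn: Callable, *inputs: str, out: str) -> str:
        # Compute the output node and remember the edge that produced it.
        self.values[out] = fn(*(self.values[name] for name in inputs))
        self.edges.append((fn.__name__, inputs, out))
        return out

    def describe(self) -> List[str]:
        # A crude "global view" diagnostic: one line per edge of the DAG.
        return [f"{', '.join(ins)} --{fn}--> {out}" for fn, ins, out in self.edges]


def add(a, b):
    return a + b


def square(a):
    return a * a


g = Graph()
x = g.variable("x", 3)
y = g.variable("y", 4)
s = g.apply(add, x, y, out="s")
g.apply(square, s, out="s2")

print(g.values["s2"])  # 49
print("\n".join(g.describe()))
```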

We should have a taxonomy of possible classes of functions and possible
classes of variables, but those classes should not be exclusive. We can work
at a high level for now, and decompose those high-level functions into
lower-level ones when we need to. We can introduce new classes of functions,
or intermediate variables between those low-level functions.
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
447
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
448
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
449 Similarities with James' idea
3b1fd599bafd my first draft of my own views which are close to be just a reformulation of what James proposes
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1043
diff changeset
450 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As I said before, I think this is just another view on what James proposed.
The learner in his case is the module that traverses the graph of these
operations, which makes sense here as well.

The 'execute' command in his API corresponds, in my case, to just applying a
function to some variables.

I think that in both cases the learner keeps track of the graph that is
formed.

His view is a bit more general. I see the graph as fully created by the user,
with the learner just going from the start to the end. In his case the
traversal is conditioned on some policies. I think these ideas can be mixed
or united. To get this functionality in my approach, I would want something
similar to the lazy linker for Theano.

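The core of the lazy-evaluation idea can be written as a plain-Python analogy. This is not Theano's actual linker machinery. A node is computed only when something asks for it, each node is computed at most once, and nodes that nobody requests never run. `LazyGraph`, `define` and `get` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple


class LazyGraph:
    """Hypothetical lazy evaluator: compute a node only on demand, at most once."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Tuple[Callable, Tuple[str, ...]]] = {}
        self.cache: Dict[str, object] = {}
        self.evaluated: List[str] = []  # order in which nodes were actually computed

    def define(self, name: str, fn: Callable, *deps: str) -> None:
        # Declare a node; nothing is computed yet.
        self.nodes[name] = (fn, deps)

    def get(self, name: str) -> object:
        if name not in self.cache:
            fn, deps = self.nodes[name]
            args = [self.get(d) for d in deps]  # only the inputs this node needs
            self.cache[name] = fn(*args)
            self.evaluated.append(name)
        return self.cache[name]


def expensive(data):
    raise RuntimeError("never requested, so this should never run")


g = LazyGraph()
g.define("data", lambda: [1, 2, 3])
g.define("total", sum, "data")
g.define("report", expensive, "data")

print(g.get("total"))  # 6
print(g.evaluated)     # ['data', 'total']
```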
1052
84f62533e7a8 v2planning learner - reply to comments
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1046
diff changeset
JB asks: There is definitely a strong theme of graphs in both suggestions,
and moreover of graphs that have heavy-duty nodes and light-weight edges. But
I don't necessarily think that we're proposing the same thing. One difference
is that the graph I talked about would be infinite in most cases of interest,
so it's not going to be representable by Theano's data structures (even with
lazy if). Another difference is that the graph I had in mind doesn't feel
fractal: it would be very common for a graph edge to be atomic. A proxy
pattern, such as in a hyper-learner, would create a notion of being able to
zoom in, but other than that, I'm not sure what you mean.

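One way to make the "infinite graph" point concrete is sketched below, under stated assumptions. The graph is never stored. The outgoing edges of a node are generated on demand by a successor function, and a policy picks which edge to follow. Only the finitely many nodes that are actually visited ever exist. The node type `(epoch, learning_rate)`, `successors`, `walk` and the policy are all hypothetical.

```python
import itertools
from typing import Callable, Dict, Iterator, Tuple

State = Tuple[int, float]  # hypothetical node: (epoch, learning_rate)


def successors(state: State) -> Dict[str, State]:
    # Outgoing edges are generated on demand; the graph itself is never stored.
    epoch, lr = state
    return {"continue": (epoch + 1, lr), "halve_lr": (epoch + 1, lr / 2)}


def walk(start: State, policy: Callable[[State], str]) -> Iterator[State]:
    """Unbounded traversal: the policy picks which outgoing edge to follow."""
    state = start
    while True:
        yield state
        state = successors(state)[policy(state)]


def halve_every_third_epoch(state: State) -> str:
    return "halve_lr" if state[0] % 3 == 2 else "continue"


# Only the nodes we actually consume are ever created.
path = list(itertools.islice(walk((0, 0.1), halve_every_third_epoch), 5))
print(path)  # [(0, 0.1), (1, 0.1), (2, 0.1), (3, 0.05), (4, 0.05)]
```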
1056
19033ef1636d some more details on my approach
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1055
diff changeset
RP replies: I've been thinking about my idea a bit and yes, it might be quite
different from what James has in mind, though there are plenty of common
elements. I might have exaggerated a bit with the zooming in: in some cases
you will end up with atomic edges, though my hope is that this will not be
the case for most of the edges.

I think I should go into more detail when answering this question, because I
feel I have not explained things clearly enough. Note that in many places I
replaced the word "function" with "transform".

Think of the learner as an object that traverses a DAG of steps created by the
user. On this DAG the learner can potentially do a lot of cool stuff, but we
won't care about that for now. The DAG can be infinite in principle, and what
the learner does is just follow the path described by the user (and here
"described" does not mean through heuristics, as in James' case, but by
giving the list of edges to follow). A potentially cool thing the learner
could do is to regard the path given by the user as a suggestion (or some form
of heuristic) and try to improve it. This would be much closer to what James
has in mind, and I definitely think it is a cool way to go about it.
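The next sketch shows a learner that follows an explicit, user-given list of edges. An optional hook treats the path as a suggestion and rewrites it before running. `PathLearner`, `drop_repeated_idempotent` and the toy edges are hypothetical. The rewrite shown only removes a repeated step that is assumed to be idempotent, so it cannot change the result.

```python
from typing import Callable, Dict, List, Optional

Transform = Callable[[float], float]
IDEMPOTENT = {"clip"}  # assumption: applying "clip" twice equals applying it once


class PathLearner:
    """Hypothetical learner that follows an explicit, user-given list of edges."""

    def __init__(self, edges: Dict[str, Transform],
                 rewrite: Optional[Callable[[List[str]], List[str]]] = None):
        self.edges = edges
        self.rewrite = rewrite  # optional hook: treat the path as a suggestion
        self.trace: List[str] = []

    def run(self, path: List[str], x: float) -> float:
        if self.rewrite is not None:
            path = self.rewrite(path)
        for name in path:
            x = self.edges[name](x)
            self.trace.append(name)
        return x


def drop_repeated_idempotent(path: List[str]) -> List[str]:
    # A safe "improvement": skip an idempotent step that repeats the previous one.
    out: List[str] = []
    for name in path:
        if out and name == out[-1] and name in IDEMPOTENT:
            continue
        out.append(name)
    return out


edges = {"double": lambda x: 2 * x, "clip": lambda x: min(x, 1.0)}
user_path = ["double", "clip", "clip"]

plain = PathLearner(edges)
smart = PathLearner(edges, rewrite=drop_repeated_idempotent)
print(plain.run(user_path, 0.75), plain.trace)  # 1.0 ['double', 'clip', 'clip']
print(smart.run(user_path, 0.75), smart.trace)  # 1.0 ['double', 'clip']
```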

Now, this path in the graph is given by the user by composing subgraphs or
adding nodes to the graph or, to put it more simply, by applying functions to
variables. Any such function will introduce an edge (or a subgraph) that
connects the vertices corresponding to the input variables to the vertices
corresponding to the output variables. The variables store the state of the
learner. These functions are stateless; I think giving them state would make
this approach really ugly (I might be wrong). The variables would contain the
information required by the functions, like the number of layers, how many
cores to run on, cluster configurations, and so on.
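As an example, configuration could travel as an ordinary immutable variable, as in this sketch. `Config`, `Network` and `build_network` are hypothetical. Changing the number of layers or cores produces a new variable, and the transform itself holds no state.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Config:
    """Hypothetical configuration variable passed between transforms."""
    n_layers: int
    layer_size: int
    n_cores: int = 1


@dataclass(frozen=True)
class Network:
    """Hypothetical output variable: just a description, not a trained model."""
    layer_sizes: Tuple[int, ...]
    n_cores: int


def build_network(cfg: Config, n_inputs: int) -> Network:
    # Stateless: everything it needs comes in through its input variables.
    sizes = (n_inputs,) + (cfg.layer_size,) * cfg.n_layers
    return Network(layer_sizes=sizes, n_cores=cfg.n_cores)


base = Config(n_layers=2, layer_size=100)
bigger = replace(base, n_layers=3, n_cores=8)  # a new variable; `base` is untouched

print(build_network(base, 784))    # Network(layer_sizes=(784, 100, 100), n_cores=1)
print(build_network(bigger, 784))  # Network(layer_sizes=(784, 100, 100, 100), n_cores=8)
```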
19033ef1636d some more details on my approach
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1055
diff changeset
508
19033ef1636d some more details on my approach
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1055
diff changeset
509 Now about the zooming part, that James asked. I might have exagerated a bit,
19033ef1636d some more details on my approach
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1055
diff changeset
510 is not that you can zoom in on any part infinitely. You will end up with
19033ef1636d some more details on my approach
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1055
diff changeset
511 things that are atomic. The idea is that any such "transformation" or edge
19033ef1636d some more details on my approach
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1055
diff changeset
has the potential to be split up into several "transformations". This offers
(in my view) a way of dealing with the time constraints of our project. We can
start by defining a coarse division into segments. For now we could have a
structure transform that turns a list of parameters into a deep network of
some type, then a learner transform that adds SGD + pre-training on top of
the network, then an early stopper on top of that, and finally a
run_on_cluster on top of everything. We would probably want something more
finely grained even from the start; this is just to prove my point. When any
of us starts experimenting with a certain sub-step of this process (like the
structure), we will split that transform into several smaller ones that make
sense for that case (like ones that create a layer, and so on), and then start
working on the low-level transform we care about (like the layer), introducing
new versions of it. I do not think we can find a universal split that will
cover all of our cases, so we should allow different such splits. The
researcher should look at which low-level transforms are available and use
them if they make sense; if not, he will have to create a different split.
Creating a different split might involve a lot of work and require taking care
of several issues, so it should be done with care.
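To make the layering concrete, here is a minimal Python sketch, assuming transforms are plain functions that only wrap their inputs in a symbolic node and compute nothing. Every name in it (`Node`, `transform`, `deep_net`, `layer`, `input_layer`, `sgd_pretrain`, `early_stopper`, `run_on_cluster`, `chain`) is hypothetical. It shows that replacing the coarse structure transform with a finer split leaves the outer transforms untouched.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical symbolic node: which transform produced it, and from what."""
    op: str
    inputs: Tuple[Any, ...] = ()
    params: Dict[str, Any] = field(default_factory=dict)


def transform(op: str):
    """Return a transform that wraps its inputs in a new Node (nothing is computed)."""
    def apply(*inputs: Any, **params: Any) -> Node:
        return Node(op, tuple(inputs), dict(params))
    return apply


# Coarse segments
deep_net = transform("deep_net")            # structure: parameter list -> network
sgd_pretrain = transform("sgd_pretrain")    # learner: adds SGD + pre-training
early_stopper = transform("early_stopper")
run_on_cluster = transform("run_on_cluster")

# Finer pieces that could replace deep_net once someone zooms into the structure
input_layer = transform("input")
layer = transform("layer")


def chain(node: Any) -> List[str]:
    """List transform names from the outermost one down to the first input."""
    ops = []
    while isinstance(node, Node):
        ops.append(node.op)
        node = node.inputs[0] if node.inputs else None
    return ops


coarse = run_on_cluster(
    early_stopper(sgd_pretrain(deep_net(layers=[524, 500, 500, 27]))), n_jobs=20)

fine = run_on_cluster(
    early_stopper(sgd_pretrain(
        layer(layer(layer(input_layer(size=524), n_out=500), n_out=500), n_out=27))),
    n_jobs=20)

print(chain(coarse))  # ['run_on_cluster', 'early_stopper', 'sgd_pretrain', 'deep_net']
```

Only the innermost part of the chain differs between `coarse` and `fine`; the early stopper and the cluster step are applied exactly as before.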

I'll give the example that got me thinking this way. Let's say we want to
build the SdA with auxiliary inputs that encourage separation of the features
in the hidden layer, which Yoshua was talking about (I made an attempt at it
some time ago for speech, but I never ended up finishing that project).

You start out with something like:

learner = Learner()
# This creates the learner that will traverse our graph. We might instead
# want it to be a function ``execute``; I just picked this option at random.
# I have no preference on this detail for now; this is mostly work in progress.

data = someSpeechData(path='some path')
# This is a transform that generates, from the string representing the path,
# a dataset variable (which contains all the information you need to access
# the data). This will probably be the object the datasets committee will
# provide. Note that you might need to provide more information than the
# path, but you can easily see how to do that. All these transforms start
# from simple variables like the path, batch size and so on, and return a
# complex, heavy-duty variable (node).

model = earlyStopping(pretrain(SdA(layers=[524, 500, 500, 27], noise=[0.1, 0.1]),
                               data, epochs=10),
                      data)
# This is a composition of three transforms. The SdA transform starts from
# the info about the layers and the corruption/noise for each layer and
# constructs an SdA. It is a high-level transform, so it takes care of
# defining all the details, like pre-training, the cost and so on. It may
# require some more parameters; you can assume that anything else has a
# default value that the SdA will use. earlyStopping is yet another
# transform that takes a model (that we know how to train) and some data,
# and does early stopping on it. For brevity I did not provide all the
# required information, like the patience. The SdA only knows how to do a
# single step of training. The same holds for pretrain: it loops over the
# layers of the SdA and trains each one.

steps = cluster(model, getPropertiesAndRanges(model), n_jobs=20,
                cluster_info=getClusterInfo())
# This launches the desired jobs. getPropertiesAndRanges gets from a model
# all the knobs that need to be turned, together with their ranges, and
# samples uniformly from them for each job. getClusterInfo returns a variable
# containing information about the cluster (I added this for simplicity; it
# should probably be replaced with something like username, password,
# cluster path or whatever).

learner.execute(steps)
# As an option, each of these output variables could contain the entire
# graph up to that point. We could also do this in a different way; this is
# ad hoc at the moment.
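The comments above say the SdA only knows how to do one step of training and that earlyStopping adds the stopping logic on top of it. A minimal sketch of that division of labour follows, assuming a hypothetical model interface with `train_step()` and `valid_error()`, and a patience counted in epochs without a new best validation error. `ToyModel` and `early_stopping` are illustrative names, not part of the proposal.

```python
class ToyModel:
    """Hypothetical stand-in: validation error follows a fixed schedule."""

    def __init__(self, errors):
        self.errors = list(errors)
        self.epoch = 0

    def train_step(self):
        # A real model would do one epoch of SGD here.
        self.epoch += 1

    def valid_error(self):
        # Repeat the last value if training runs past the schedule.
        return self.errors[min(self.epoch, len(self.errors)) - 1]


def early_stopping(model, patience=3, max_epochs=100):
    """Train until `patience` epochs pass without a new best validation error.

    Returns (best_epoch, best_error).
    """
    best_err, best_epoch, waited = float("inf"), 0, 0
    for epoch in range(1, max_epochs + 1):
        model.train_step()
        err = model.valid_error()
        if err < best_err:
            best_err, best_epoch, waited = err, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, best_err


schedule = [0.5, 0.4, 0.35, 0.36, 0.37, 0.34, 0.38, 0.39, 0.4, 0.41]
print(early_stopping(ToyModel(schedule), patience=3))  # (6, 0.34)
```

Because `early_stopping` only calls `train_step` and `valid_error`, it does not care whether the model underneath is the coarse SdA or a hand-split version of it.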

Now, this is a coarse, vanilla SdA, which is not what we wanted: we have no
way of incorporating our auxiliary information into it. So what we have to do
is split/change the SdA transform. We would rewrite it as:

arch = SdA(layers=[524, 500, 500, 27], noise=[0.1, 0.1])
model = earlyStopping(pretrain(arch, data, epochs=10), data)
...

And then rewrite things like:

# The SdA expanded: two stacked DAA layers (524 -> 500 -> 500), a logistic
# regression layer on top, its cross-entropy cost, and SGD to train it.
arch = SGD(cross_entropy(logreg(DAAlayer([DAAlayer([524, 500], 0.1), 500], 0.1))))

We would rewrite the DAAlayer as:

# First layer: still a single, coarse DAA transform.
layer0 = DAAlayer([524, 500], 0.1)
# Second layer, split up: affine map to 500 units, tanh, noisy
# reconstruction, and the reconstruction cost.
layer1 = cross_entropy(reconstruct(tanh(dotW_b(layer0, 500)), noise=0.1))
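For intuition about what the pieces of `layer1` compute, here is a minimal NumPy sketch of one denoising auto-encoder layer built from the same steps. Assumptions, none of which the text fixes: `noise` is a masking-corruption probability applied to the layer's input, the decoder uses tied weights and a sigmoid, and inputs lie in [0, 1]. These are common DAA choices, not a specification of the proposed transforms; `corrupt`, `encode` and `daa_cost` are hypothetical helpers.

```python
import numpy as np


def dotW_b(x, W, b):
    """Affine map: from x.shape[1] units to len(b) units."""
    return x @ W + b


def corrupt(x, noise, rng):
    """Masking noise (assumed): zero each input unit with probability `noise`."""
    return x * (rng.random(x.shape) >= noise)


def reconstruct(h, W, b_prime):
    """Decode with tied weights (W transposed) and a sigmoid output."""
    return 1.0 / (1.0 + np.exp(-(h @ W.T + b_prime)))


def cross_entropy(x, z, eps=1e-12):
    """Binary cross-entropy summed over units, averaged over the batch."""
    z = np.clip(z, eps, 1.0 - eps)
    return -np.mean(np.sum(x * np.log(z) + (1 - x) * np.log(1 - z), axis=1))


def encode(x, W, b):
    """Clean (uncorrupted) hidden code, i.e. what the next layer would see."""
    return np.tanh(dotW_b(x, W, b))


def daa_cost(x, W, b, b_prime, noise, rng):
    """corrupt -> dotW_b -> tanh -> reconstruct -> cross_entropy."""
    h = np.tanh(dotW_b(corrupt(x, noise, rng), W, b))
    return cross_entropy(x, reconstruct(h, W, b_prime))


rng = np.random.default_rng(0)
n_in, n_hidden = 524, 500
W = rng.uniform(-0.05, 0.05, size=(n_in, n_hidden))
b, b_prime = np.zeros(n_hidden), np.zeros(n_in)
x = rng.random((8, n_in))  # a batch of 8 inputs in [0, 1)
cost = daa_cost(x, W, b, b_prime, noise=0.1, rng=rng)
h = encode(x, W, b)
print(h.shape, cost > 0)  # (8, 500) True
```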

At this level of detail, we can start inserting our new pieces, as follows:

input = empty_layer(600)
# empty_layer is a wrapper: if I were to write dotW_b(200, 500), meaning "go
# from a layer of 200 units to one of 500 units by multiplying with a matrix
# and adding a bias", what I would actually mean is
# dotW_b(empty_layer(200), 500). An implementation of empty_layer could be
# just theano.tensor.vector() to which we add the size tag (we will need it
# later).

# input[0:524] feeds all three groups of hidden units; input[524:560] feeds
# only the noise group and input[560:600] only the speakerID group.
hidden0_mfcc = dotW_b(input[0:524], 100)
hidden0_noise = dotW_b(input[0:560], 50)
hidden0_speakerID = dotW_b(join(input[0:524], input[560:600]), 50)
hidden0 = tanh(join(hidden0_mfcc, hidden0_noise, hidden0_speakerID))
layer0 = cross_entropy(reconstruct(hidden0, noise=0.1))
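The size tag mentioned in the `empty_layer` comment is what lets `dotW_b` work out the shape of its weight matrix, and lets slices and `join` compute their own sizes. Below is a minimal pure-Python sketch of that bookkeeping, assuming a hypothetical `Sym` node class in place of a Theano variable (nothing here is Theano API; `count_params` is also hypothetical). It rebuilds `hidden0` from the example and checks its size and parameter count.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Sym:
    """Hypothetical symbolic layer: an op name, a size tag and its inputs."""
    op: str
    size: int
    inputs: Tuple["Sym", ...] = ()
    n_params: int = 0

    def __getitem__(self, s: slice) -> "Sym":
        # Slicing keeps only the selected units, so the size tag shrinks
        # (step assumed to be 1).
        start, stop, _ = s.indices(self.size)
        return Sym("slice", stop - start, (self,))


def empty_layer(size: int) -> Sym:
    return Sym("input", size)


def join(*layers: Sym) -> Sym:
    # Concatenation: sizes add up.
    return Sym("join", sum(layer.size for layer in layers), tuple(layers))


def dotW_b(layer: Sym, n_out: int) -> Sym:
    # The size tag fixes W as (layer.size, n_out) and b as (n_out,).
    return Sym("dotW_b", n_out, (layer,), layer.size * n_out + n_out)


def tanh(layer: Sym) -> Sym:
    return Sym("tanh", layer.size, (layer,))


def count_params(node: Sym) -> int:
    """Weights and biases below `node` (a shared node is counted once per use)."""
    return node.n_params + sum(count_params(i) for i in node.inputs)


inp = empty_layer(600)
hidden0 = tanh(join(
    dotW_b(inp[0:524], 100),
    dotW_b(inp[0:560], 50),
    dotW_b(join(inp[0:524], inp[560:600]), 50),
))
print(hidden0.size, count_params(hidden0))  # 200 108800
```

The user never states the 560 or 564 input sizes of the noise and speakerID groups; they fall out of the size tags.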

And so on for the remaining layers. Hopefully this makes clear what I mean by
splitting a transform, or zooming in. While doing all this, we did not change
anything about the early stopping or about launching jobs on the cluster. In
the same manner, someone who wants to look into how jobs are sent to the
cluster could just expand that part. Note that if we had wanted to do
something else, we might have split the DAA differently.

The key to this approach is to identify low-level units that can be shared
by 90% of our architectures, and the splits that make the most sense from a
functional point of view, i.e. that cover the main points where people will
want to change things. This will ensure that almost all the time we have the
low-level bits we want to write our code into, and that most of the time we
will only work on one of those bits. There will definitely be cases where
whatever we have is not sufficient or convenient. In those cases the user
will have to invest some effort to create a different decomposition of the
problem into the elements he needs.

I've been thinking about this a bit, and it definitely works for deep
networks and Theano (the approach was inspired by Theano). From what James
said, I think other things could also be incorporated, at least as atomic
transforms if not in any other way.

TODO: someone has to give some thought to these low-level transforms, to find
a suitable set of them (and of variables) so that most of the time we end up
re-using things rather than creating new ones.

NOTES: some implementation details are missing about what these state
variables should contain. I did not want to clutter this with the tricks that
could be used to get this transparent interface, though I have a few of them
in mind.
There are a lot of hardcoded values in this example. Usually each transform
that takes an input should "know" which of its inputs are tunable and mark
them as such. The order of the inputs in this example matters as well. This
can easily be solved at the expense of a few more lines of code that I did
not want to write.
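One way the marking could work, sketched under assumptions: a transform wraps each tunable argument in a hypothetical `Tunable(default, low, high)` marker and stores its arguments by name, which also removes the dependence on argument order. A hypothetical `get_properties_and_ranges` (standing in for `getPropertiesAndRanges`) walks the graph and collects the ranges, and `sample_jobs` draws one uniform sample per knob for each job, as described for the `cluster` step. None of these names, nor the `noise0`, `lr` and `patience` arguments, come from an existing library.

```python
import random
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Tunable:
    """Marks a transform argument as a knob: default value plus (low, high) range."""
    default: float
    low: float
    high: float


@dataclass
class Node:
    """Hypothetical graph node; arguments are kept by name, so order is irrelevant."""
    name: str
    inputs: Tuple["Node", ...] = ()
    args: Dict[str, Any] = field(default_factory=dict)


def get_properties_and_ranges(node: Node) -> Dict[str, Tuple[float, float]]:
    """Walk the graph and return {'node.arg': (low, high)} for every Tunable."""
    ranges: Dict[str, Tuple[float, float]] = {}
    for key, value in node.args.items():
        if isinstance(value, Tunable):
            ranges[f"{node.name}.{key}"] = (value.low, value.high)
    for child in node.inputs:
        ranges.update(get_properties_and_ranges(child))
    return ranges


def sample_jobs(ranges, n_jobs, seed=0) -> List[Dict[str, float]]:
    """One configuration per job, each knob sampled uniformly from its range."""
    rng = random.Random(seed)
    return [{k: rng.uniform(lo, hi) for k, (lo, hi) in sorted(ranges.items())}
            for _ in range(n_jobs)]


sda = Node("SdA", args={"layers": [524, 500, 500, 27],   # fixed, not tuned
                        "noise0": Tunable(0.1, 0.0, 0.5)})
pre = Node("pretrain", (sda,), {"epochs": 10, "lr": Tunable(0.01, 1e-4, 0.1)})
model = Node("earlyStopping", (pre,), {"patience": 5})

ranges = get_properties_and_ranges(model)
jobs = sample_jobs(ranges, n_jobs=20)
print(sorted(ranges))  # ['SdA.noise0', 'pretrain.lr']
```

Hardcoded values such as `layers` stay plain arguments and are never sampled; the `default` of each `Tunable` is what a single, non-sweep run would use.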