annotate doc/v2_planning/layer_RP.txt @ 1387:5a76d56be0bf

Merged
author Olivier Delalleau <delallea@iro>
date Tue, 14 Dec 2010 14:22:16 -0500
parents 32fc5f442dde
children
rev   line source
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
===============
Layer committee
===============

Members : RP, XG, AB, DWF

Proposal (RP)
=============

You build your neural network by constructing a graph of connections
between "layers", starting from the data. As you construct the graph,
the corresponding Theano formulas are put together to form your model.

The idea is that you describe exactly what you would draw on the board
if you were asked to draw the architecture. This is of course optional
(for well-defined cases you will get macros that return this graph
automatically). Things that are not neural networks, and for which you
would have no structure to draw, are just a box: for example an SVM or
PCA. This covers the case where you want to connect their output to
your network.

The hard details are not settled yet, but all members of the committee
agreed that this sounds like a good idea.


Example Code (RP):
------------------

# Assume you have the dataset as train_x, train_y, valid_x, valid_y, test_x, test_y

# first hidden layer and its RBM (CD-k with k=5)
h1   = sigmoid(dotW_b(train_x, n = 300))
rbm1 = CDk( h1, train_x, k=5, sampler = binomial, cost = pseudolikelihood)

# second hidden layer, stacked on h1
h2   = sigmoid(dotW_b(h1, n = 300))
rbm2 = CDk( h2, h1, k=5, sampler = binomial, cost = pseudolikelihood)

# output layer and training error
out  = sigmoid( dotW_b(h2, n = 10))

train_err = cross_entropy( out, train_y)

# gradients of the training error and the SGD learner
grads   = grad( train_err, train_err.parameters() )
learner = SGD( train_err, train_err.parameters(), grads)

# the same error graph, fed with the validation and test sets
valid_err = train_err.replace({ train_x : valid_x, train_y : valid_y})
test_err  = train_err.replace({ train_x : test_x , train_y : test_y})


Global observations :
---------------------

1) Your graph can have multiple terminal nodes; in this case rbm1,
   rbm2, learner, valid_err and test_err are all end nodes of the graph.

2) Any node is an "iterator": when you call out.next() you get the next
   prediction; when you call train_err.next() you get the next error
   (on the batch given by data.next()).

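The iterator behaviour described in 2) can be sketched in plain Python. This is only an illustration under stated assumptions: the classes DataNode and Node, and the toy "prediction" and "error" functions, are hypothetical names invented here, not part of the proposal or of Theano.

```python
class DataNode:
    """Hypothetical stand-in for the data object: yields one batch per call."""

    def __init__(self, batches):
        self._batches = iter(batches)

    def next(self):
        return next(self._batches)


class Node:
    """Hypothetical node: applies fn to the next value of its input node."""

    def __init__(self, fn, source):
        self.fn = fn
        self.source = source

    def next(self):
        # Pulling a value from this node pulls one value from its input.
        return self.fn(self.source.next())


data = DataNode([[1.0, 2.0], [3.0, 4.0]])
out = Node(lambda batch: [2 * v for v in batch], data)   # toy "prediction"
err = Node(lambda pred: sum(pred) / len(pred), out)      # toy "error"

first_err = err.next()    # consumes the first batch
second_err = err.next()   # consumes the second batch
```

In this sketch every call to next() on any node advances the shared data object, so calling out.next() and then err.next() would consume two batches; a real design would have to decide whether that is the intended behaviour.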
3) Replace can substitute any subgraph (or several subgraphs) with
   another subgraph (or subgraphs), as long as both have the same number
   of input units and output units (there is a 1-to-1 mapping between
   them). I see replacing several subgraphs as looping over the list of
   subgraphs to replace and calling replace on each, nothing fancier.
   Since in my view all nodes expose the same interface (except
   parameter nodes and hyper-parameter nodes), this constraint is not
   hard to respect, so it is up to the user to do a replace that makes
   sense.

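A minimal sketch of what replace could look like, assuming a hypothetical Node class whose graph is a tree of Python callables (nothing here is Theano or an agreed interface). It shows the case used in the example code: swapping the training inputs for validation inputs while leaving the original graph untouched.

```python
class Node:
    """Hypothetical graph node: a leaf (no inputs) or an op applied to inputs."""

    def __init__(self, name, op=None, inputs=()):
        self.name = name
        self.op = op
        self.inputs = tuple(inputs)

    def replace(self, mapping):
        """Return a copy of the graph in which nodes found in mapping are swapped."""
        if self in mapping:
            return mapping[self]
        if not self.inputs:
            return self
        new_inputs = [node.replace(mapping) for node in self.inputs]
        return Node(self.name, self.op, new_inputs)

    def evaluate(self, values):
        """Evaluate the graph, reading leaf values from a dict keyed by name."""
        if not self.inputs:
            return values[self.name]
        return self.op(*[node.evaluate(values) for node in self.inputs])


train_x, train_y = Node("train_x"), Node("train_y")
valid_x, valid_y = Node("valid_x"), Node("valid_y")

out = Node("out", lambda x: 2 * x, [train_x])
train_err = Node("err", lambda o, y: abs(o - y), [out, train_y])
valid_err = train_err.replace({train_x: valid_x, train_y: valid_y})

values = {"train_x": 1.0, "train_y": 3.0, "valid_x": 2.0, "valid_y": 3.5}
train_value = train_err.evaluate(values)
valid_value = valid_err.evaluate(values)
```

Returning a new graph rather than mutating the old one is a choice made for this sketch; it keeps train_err usable after valid_err has been built.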
4) You can have MACROS or SUBROUTINES that already give you the graph
   for known components (in my view CDk is such a macro, but simpler
   examples would be vanilla versions of MLP, DAA, DBN, LOGREG). After
   Guillaume pointed out a real shortcoming of the approach, I have
   slightly modified what you get from a macro; see below.

5) Any node has access to the entire graph (though arguably you don't
   use that graph much). Running a node is in general done by compiling
   the Theano expression up to that node (if you don't already have
   this function) and using the data object that you were given
   initially. This Theano function is compiled only if you need it.
   You use the graph only to :
     * update the Theano expression in case some part of the subgraph
       has changed (a hyper-parameter or a replace call)
     * collect the list of parameters of the model
     * collect the list of hyper-parameters (my personal view: this
       would mostly be useful for a hyper learner, not for day-to-day
       stuff, but I think it is easy to provide and we should)
     * collect constraints on parameters (I believe they can be
       represented in the graph as dependency links to other graphs
       that compute the constraints)

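Collecting parameters and hyper-parameters from the graph, as listed in 5), amounts to a graph traversal. The sketch below assumes a hypothetical Node class that stores its own parameters and hyper-parameters; the names parameters(), hyper_parameters() and the "node.key" naming scheme are illustrative guesses, not a settled interface.

```python
class Node:
    """Hypothetical node that records its inputs, parameters and hyper-parameters."""

    def __init__(self, name, inputs=(), params=(), hyper=None):
        self.name = name
        self.inputs = list(inputs)
        self.params = list(params)
        self.hyper = dict(hyper or {})

    def _walk(self, seen=None):
        """Yield every node reachable from this one, inputs first, each once."""
        seen = set() if seen is None else seen
        if id(self) in seen:
            return
        seen.add(id(self))
        for node in self.inputs:
            yield from node._walk(seen)
        yield self

    def parameters(self):
        return [p for node in self._walk() for p in node.params]

    def hyper_parameters(self):
        return {
            f"{node.name}.{key}": value
            for node in self._walk()
            for key, value in node.hyper.items()
        }


train_x, train_y = Node("train_x"), Node("train_y")
h1 = Node("h1", [train_x], params=["W1", "b1"], hyper={"n": 300})
h2 = Node("h2", [h1], params=["W2", "b2"], hyper={"n": 300})
out = Node("out", [h2], params=["W3", "b3"], hyper={"n": 10})
train_err = Node("train_err", [out, train_y])

all_params = train_err.parameters()
all_hypers = train_err.hyper_parameters()
```

The seen set matters because a node such as h1 can feed several terminal nodes; without it, shared parameters would be listed more than once.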
6) Registering parameters and hyper-parameters with the graph is the
   job of the transform, and therefore of the user who implemented that
   transform; the same goes for initializing the parameters (so if we
   have different ways to initialize the weight matrix, that might be a
   hyper-parameter with a default value, or different transforms; to
   keep the number of such transforms down, you can define a transform
   on the fly for simple Theano expressions).


Detailed Proposal (RP)
======================

I will go through a list of scenarios and possible issues :

Delayed or future values
------------------------

This can be dropped if people think it is not useful.

Sometimes you might want past or future values of some nodes. For
example you might be interested in :

y(t) = x(t) - x(t-1)

You can get that by having a "delayed" version of a node. A delayed
version of a node x is obtained by calling x.t(k), which gives you a
node that has the value x(t+k). k can be positive or negative.
In my view this can be done as follows :
  - a node is a class that points to :
      * a data object that feeds data
      * a Theano expression up to that point
      * the entire graph that describes the model (not the Theano
        graph !!!)
The only thing you need to do is to change the data object to reflect
the delay (we might need to be able to pad it with 0?). You also need
to create a copy of the Theano expression (those are "new nodes"), in
the sense that the starting Theano tensors are different since they
point to different data.


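The delayed node can be illustrated on a finite sequence. This is a plain-Python sketch under assumptions: SeriesNode is a hypothetical class, the data object is reduced to a list, and padding with 0 outside the valid range is one possible answer to the open question above, not a decision.

```python
class SeriesNode:
    """Hypothetical node backed by a finite sequence x(0), ..., x(T-1)."""

    def __init__(self, values):
        self.values = list(values)

    def t(self, k, pad=0.0):
        """Return a node whose value at time t is x(t+k); pad outside the range."""
        n = len(self.values)
        shifted = [
            self.values[i + k] if 0 <= i + k < n else pad
            for i in range(n)
        ]
        return SeriesNode(shifted)

    def __sub__(self, other):
        return SeriesNode([a - b for a, b in zip(self.values, other.values)])


x = SeriesNode([1.0, 4.0, 9.0, 16.0])
y = x - x.t(-1)        # y(t) = x(t) - x(t-1), with x(-1) padded to 0
future = x.t(1)        # x(t+1), with the last step padded to 0
```

Note that x.t(k) builds a new node over shifted data and leaves x unchanged, which mirrors the "new nodes pointing to different data" idea in the text.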
Non-Theano transformation (or function or whatever)
---------------------------------------------------

Maybe you want to do something in the middle of your graph that is not
supported by Theano. Let's say you have a function f which you cannot
write in Theano. You want to do something like

W1*f( W2*data + b)

I think we can support that by doing the following :
each node has :
   * a data object that feeds data
   * a Theano expression up to that point
   * the entire graph that describes the model

Let x1 = W2*data + b
Up to here everything is fine (we have a Theano expression)
   dot(W2, tensor) + b,
where tensor is provided by the data object (plus a dict of givens
and whatever else you need to compile the function).

When you apply f, what you do is create a node that is exactly like
the data object, in the sense that it provides a new tensor and a new
dict of givens.

So x2 = W1*f( W2*data+b)
will actually point to the expression
   dot(W1, tensor)
and to the data node f(W2*data+b).

What this means is that you basically compile two Theano functions t1
and t2 and evaluate t2(f(t1(data))). So every time you have a
non-Theano operation you break the Theano expression and start a new
one.

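The t2(f(t1(data))) composition can be sketched as follows. Here t1 and t2 are ordinary Python functions standing in for the two compiled Theano functions, and f is an arbitrary example (a sort) chosen only because it is easy to write in Python; the weights and the helper dot are made up for illustration.

```python
def dot(matrix, vector):
    """Plain-Python matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


W2 = [[1.0, 0.0], [0.0, 2.0]]
b = [0.5, -0.5]
W1 = [[1.0, -1.0]]


def t1(data):
    # Stand-in for the first compiled function: dot(W2, tensor) + b
    return [h + bi for h, bi in zip(dot(W2, data), b)]


def f(values):
    # Arbitrary non-Theano step: here, sort in decreasing order.
    return sorted(values, reverse=True)


def t2(tensor):
    # Stand-in for the second compiled function: dot(W1, tensor)
    return dot(W1, tensor)


def run(data):
    # The expression is broken at f: t1 runs, f runs in Python, then t2 runs.
    return t2(f(t1(data)))


x1 = t1([1.0, 2.0])
x2 = run([1.0, 2.0])
```

From the point of view of t2, the output of f plays the same role as a data object: it simply supplies the input tensor.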
What you lose :
 - there is no optimization or anything between t1, t2 and f (we don't
   support that)
 - if you are running things on GPU, after t1 the data will be copied
   to the CPU and then probably back to the GPU, so using the GPU may
   no longer make sense


Recurrent Things
----------------

I think that you can write a recurrent operation by first defining a
graph (the recurrent relation) :

y_tm1 = recurrent_layer(init = zeros(50))
x_t   = slice(x, t=0)
y     = loop( dotW_b(y_tm1,50) + x_t, steps = 20)

This basically gives all the information you need to add a scan op to
the Theano expression of the result node y; it is just a different way
of writing things, which I think is more intuitive.

You create your primitives, which are either a recurrent_layer that
should have an initial value, or a slice of some other node (a time
slice, that is). A time slice is a special kind of node, which we
should try to force people not to use outside of a loop. If you do use
it outside a loop, you get some default behaviour; for example, it
behaves exactly like a delayed node. You call loop, giving it an
expression that starts from those primitives, and ta da, you have your
recurrent expression in the graph.

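What loop would compute can be written out as an explicit Python loop. This is a sketch of the semantics only, under assumptions: loop, step and dot_w_b below are hypothetical plain-Python helpers with made-up signatures, the sizes are shrunk to 2 units and 3 steps, and no scan op is involved.

```python
def dot_w_b(W, b, v):
    """Affine map: dot(W, v) + b, in plain Python."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def loop(step, init, inputs):
    """Apply step(y_tm1, x_t) to each time slice and return every y_t."""
    y_tm1, outputs = init, []
    for x_t in inputs:
        y_tm1 = step(y_tm1, x_t)
        outputs.append(y_tm1)
    return outputs


W = [[0.5, 0.0], [0.0, 0.5]]
b = [0.0, 1.0]
x = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]   # one time slice per step


def step(y_tm1, x_t):
    # The recurrent relation: y_t = dot(W, y_tm1) + b + x_t
    return [h + xi for h, xi in zip(dot_w_b(W, b, y_tm1), x_t)]


y = loop(step, init=[0.0, 0.0], inputs=x)
```

The initial value plays the role of recurrent_layer(init=...) and each element of x plays the role of a time slice.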
Similarly you can have foldl or map or anything else.

You would use this instead of writing scan, especially if the formulas
are more complicated and you want to automatically collect parameters,
hyper-parameters and so on. You could also just use the scan op through
a general apply command if you like that more.

1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
201 Optimizer
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
202 ---------
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
203
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
204 Personally I would respect the findings of the optimization committee,
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
205 and have SGD require a node that produces some error (which can
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
206 be omitted), the parameter nodes, and nodes that compute gradients for
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
207 those parameters. For this I would also have a grad function, which would
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
208 actually only call T.grad.
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
209
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
210 What if you have a non-theano thing in the middle? I don't have any smart
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
211 solution besides ignoring any parameter that is below the first
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
212 non-theano node and throwing a warning.
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
213
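A rough sketch of that fallback, assuming a hypothetical Node class with an is_theano flag and inputs that point toward the data. The walk starts at the error node, stops at the first non-theano node on each path, and warns about it. None of these names come from an existing library.

```python
import warnings


class Node:
    """Hypothetical graph node; `inputs` point toward the data."""

    def __init__(self, name, inputs=(), params=(), is_theano=True):
        self.name = name
        self.inputs = list(inputs)
        self.params = list(params)
        self.is_theano = is_theano


def collect_trainable_params(error_node):
    """Walk from the error node toward the data, collecting parameters.

    The walk stops at the first non-theano node on each path, since no
    gradient can be propagated through it, and emits a warning.
    """
    params, seen, stack = [], set(), [error_node]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        if not node.is_theano:
            warnings.warn("ignoring parameters at or below non-theano node %r"
                          % node.name)
            continue
        for p in node.params:
            if p not in params:
                params.append(p)
        stack.extend(node.inputs)
    return params


data = Node("data")
layer1 = Node("layer1", inputs=[data], params=["W1"])
numpy_op = Node("numpy_op", inputs=[layer1], is_theano=False)
layer2 = Node("layer2", inputs=[numpy_op], params=["W2"])
err = Node("err", inputs=[layer2])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    trainable = collect_trainable_params(err)
print(trainable)
```

Only W2 is handed to the optimizer here; W1 sits below the non-theano node and is skipped with a warning.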
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
214 Learner
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
215 -------
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
216
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
217 In my case I would not have both predict() and eval() methods on the learner,
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
218 but just an eval(). If you want the predictions you should use the
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
219 corresponding node (before applying the error measure). This was
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
220 for example **out** in my first example. Note that eval() in this case is
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
221 the same as next() (you might just have next() for simplicity). The
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
222 only semantically important difference is that a call to next() now has
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
223 side-effects in the sense that the parameters are updated.
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
224
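As an illustration of next() with side effects, here is a small sketch of a learner for y = w * x trained with plain SGD. The Learner class, its out() method, and the squared-error measure are all assumptions chosen to keep the example short.

```python
class Learner:
    """Hypothetical learner node for y = w * x trained with plain SGD."""

    def __init__(self, batches, w=0.0, lr=0.1):
        self.batches = iter(batches)
        self.w = w
        self.lr = lr

    def out(self, x):
        """Prediction node: the value before the error measure is applied."""
        return self.w * x

    def __iter__(self):
        return self

    def __next__(self):
        """Return the error on the next batch, then update the parameter."""
        x, y = next(self.batches)
        residual = self.out(x) - y
        error = residual ** 2
        self.w -= self.lr * 2.0 * residual * x  # side effect: SGD update
        return error


learner = Learner([(1.0, 2.0), (1.0, 2.0)])
first = next(learner)
second = next(learner)
print(first, second, learner.w)
```

Predictions come from out(), while every call to next() both reports the error and moves w.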
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
225 Of course we could require learners to be special nodes that also have
1231
5ef96142492b some typos
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1229
diff changeset
226 a predict output. In that case I'm not sure what the iterating behaviour
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
227 of the node should produce.
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
228
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
229 Granularity
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
230 -----------
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
231
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
232 Guillaume nicely pointed out that this library might be overkill.
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
233 That is, you have a dotW_b transform, and then you will need
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
234 a dotW_b_sparse transform and so on. Plus, each way of initializing a param
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
235 would result in many more transforms.
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
236
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
237 I don't have a perfect answer yet, but my argument goes like this:
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
238
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
239 You would have transforms for the most popular options (dotW_b, for example).
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
240 If you need something else you can always decorate a function that takes
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
241 theano arguments and produces theano arguments. The formulas produced by
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
242 the formula committee might be a rich source of such functions to decorate.
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
243 More than decorating, you can have a general apply transform that does
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
244 something like :
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
245
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
246     apply(lambda x, y, z: x*y + z, inputs=x,
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
247           hyperparams=[(name, 2)],
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
248           params=[(name, theano.shared(..))])
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
249 The order of the arguments in lambda is nodes, params, hyper-params or so.
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
250 This would apply the theano expression but it will also register the
1231
5ef96142492b some typos
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1229
diff changeset
251 parameters. It is like creating a transform on the fly.
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
252 You should, or at least could, provide names for parameters; you might need them
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
253 later.
1231
5ef96142492b some typos
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1229
diff changeset
254
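One possible reading of such a general apply transform, sketched with plain Python values standing in for theano variables. The name apply_transform, the Node class, and the dictionary registries are hypothetical; the argument order (nodes, params, hyper-params) follows the description above.

```python
class Node:
    """Hypothetical node: a value plus registries of params and hyper-params."""

    def __init__(self, value, params=None, hyperparams=None):
        self.value = value
        self.params = dict(params or {})
        self.hyperparams = dict(hyperparams or {})


def apply_transform(fn, inputs, params=(), hyperparams=()):
    """Call fn(nodes..., params..., hyperparams...) and register everything."""
    nodes = inputs if isinstance(inputs, (list, tuple)) else [inputs]
    args = [n.value for n in nodes]
    args += [value for _, value in params]
    args += [value for _, value in hyperparams]
    out = Node(fn(*args))
    # Inherit whatever the input nodes already registered, then add ours.
    for n in nodes:
        out.params.update(n.params)
        out.hyperparams.update(n.hyperparams)
    out.params.update(params)
    out.hyperparams.update(hyperparams)
    return out


x = Node(3.0)
out = apply_transform(lambda x, y, z: x * y + z, inputs=x,
                      params=[("W", 4.0)], hyperparams=[("offset", 2)])
print(out.value, out.params, out.hyperparams)
```

Because the names travel with the output node, later code can look parameters up by name instead of holding on to the original variables.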
5ef96142492b some typos
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1229
diff changeset
255 I think you can make it so that the result of the apply is
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
256 picklable, but not the general apply transform. What I mean is that
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
257 the output node does not store the lambda expression but some theano
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
258 graph (?) and it knows which are the inputs (and when you can replace
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
259 them so that you link this little graph to the rest of the
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
260 theano expression). It is just an ugly hack, given that you cannot save
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
261 lambda expressions, but I'm open to other alternatives.
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
262
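The constraint behind this hack can be checked with the standard pickle module. In the sketch below a plain dictionary stands in for the stored graph and its input names; that representation is an assumption, not the proposed node class.

```python
import pickle


def can_pickle(obj):
    """Report whether pickle accepts obj, without raising."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# The transform itself holds a lambda, which pickle cannot store by name.
transform = lambda x, y, z: x * y + z

# Stand-in for an output node: a description of the graph and its inputs,
# with no reference to the lambda that produced it.
output_node = {"graph": "x * y + z", "inputs": ["x"]}

lambda_ok = can_pickle(transform)
node_ok = can_pickle(output_node)
print(lambda_ok, node_ok)
```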
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
263
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
264 What this way of doing things would hopefully buy you is that you do not
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
265 need to worry about most of your model (it would be just a few macros) that
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
266 will get you to the point you want to change and then you do surgery on
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
267 that point. Compared with hacking a class, this feels cleaner, because
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
268 what comes before the point you want to change is sort of separated from
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
269 what you change. Plus you could do this in your script, and you don't need
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
270 to create your local branch of the library where you hack the class, or
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
271 duplicate the class file under a different name.
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
272 Once what you are doing becomes stable it can be converted into either a
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
273 different macro or a parameter to the initial macro.
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
274
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
275 ** New part **
1231
5ef96142492b some typos
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1229
diff changeset
276
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
277 If this is not convincing enough, there is another point that I want to
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
278 make. While creating the graph you can optionally create a model object.
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
279 I would encourage most people to do that! I had this idea a long time ago,
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
280 but back then I used a singleton class as the world, which could potentially create
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
281 a lot of issues. This is a nicer version of that.
1231
5ef96142492b some typos
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1229
diff changeset
282
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
283 This model class is optional but it can be extremely useful. What you do in
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
284 this model class is to store the graph, together with different annotations
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
285 on that graph. What I would do is identify different subgraphs in the model
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
286 and register them under different names. For example if err is the node that
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
287 points to the graph that represents a DBN, that graph will be registered to
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
288 a model in which I have annotated which subgraphs represent the different
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
289 rbms, which represents the logistic regression and so on. The model will also
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
290 have a list of all the input nodes and all the output nodes of the graph.
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
291 We could potentially use this model class to control some global default
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
292 parameter initialization or hyper-parameters. This all might sound like
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
293 magic, but it is actually easy to implement.
1231
5ef96142492b some typos
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1229
diff changeset
294
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
295 If you have such a model, which is just some annotations on the graph, this
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
296 approach makes it easy to change components of the graph based on their names.
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
297 For example I can replace rbm1 with a daa, because based on these annotations
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
298 I know which part is rbm1.
1231
5ef96142492b some typos
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1229
diff changeset
299
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
300 Why do I feel you need such a thing? It is just because you get the DBN by
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
301 calling a macro, and you don't have variables that point to different nodes
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
302 of your network so that you can define where a subgraph starts or not. But
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
303 if a graph returns such a model, you can introspect what annotations you have.
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
304 There should also be standard conventions, but you could also in the
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
305 interactive shell look at :
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
306
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
307 model.annotations(depth = 2)
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
308
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
309 This would print something like :
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
310
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
311 'DBN'
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
312     'rbm1'
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
313         'hidden_layer1'
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
314         'CDk_layer1'
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
315     'rbm2'
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
316         'hidden_layer2'
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
317         'CDk_layer2'
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
318     'logreg'
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
319         'cross_entropy'
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
320
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
321 And then you can say
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
322
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
323 daa1 = daa(..)
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
324 daa2 = daa(..)
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
325 new_model = model.replace('rbm1', daa1, new_name = 'daa1')
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
326 new_model = new_model.replace('rbm2', daa2, new_name = 'daa2')
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
327
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
328 and you get an SDAA.
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
329 What is the hierarchical structure? Well, in my view, if some subgraph
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
330 (annotated as S1) is part of another subgraph (annotated as S2) then
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
331 S1 is a child of S2 in this hierarchy of annotations. If they share
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
332 just a few nodes, but have nodes that are not shared, then they are on
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
333 the same level. We might want a flat space for the annotations, but I think
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
334 this simple convention can get us a lot.
1231
5ef96142492b some typos
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1229
diff changeset
335
5ef96142492b some typos
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1229
diff changeset
336
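The annotation hierarchy and the replace call can be sketched as a small tree of named subgraphs. Everything here is hypothetical: the Model constructor, the payload strings standing in for real subgraphs, and annotations() returning lines instead of printing them.

```python
class Model:
    """Hypothetical annotation tree: a name, a payload, and child subgraphs."""

    def __init__(self, name, children=(), payload=None):
        self.name = name
        self.children = list(children)
        self.payload = payload

    def annotations(self, depth=None, _level=0):
        """Return the annotation names, indented by their level, down to depth."""
        lines = ["    " * _level + repr(self.name)]
        if depth is None or _level < depth:
            for child in self.children:
                lines.extend(child.annotations(depth, _level + 1))
        return lines

    def replace(self, name, payload, new_name):
        """Return a new model with the subgraph called `name` swapped out."""
        if self.name == name:
            return Model(new_name, payload=payload)
        children = [c.replace(name, payload, new_name) for c in self.children]
        return Model(self.name, children, self.payload)


model = Model("DBN", [
    Model("rbm1", [Model("hidden_layer1"), Model("CDk_layer1")]),
    Model("rbm2", [Model("hidden_layer2"), Model("CDk_layer2")]),
    Model("logreg", [Model("cross_entropy")]),
])
new_model = model.replace("rbm1", "daa1 graph", new_name="daa1")
new_model = new_model.replace("rbm2", "daa2 graph", new_name="daa2")
print("\n".join(new_model.annotations(depth=2)))
```

In this sketch replace() builds a new tree and leaves the original model untouched, which matches the new_model = model.replace(..) usage above.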
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
337 So macros should in general return such models. It is up to you if you want to
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
338 ground the graph that you create in your script into a model or not. You do
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
339 so by manually adding nodes to the model. The annotations are also manually
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
340 done. So this might be a bit annoying for a developer of a macro, but I
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
341 don't think it is cognitively complicated, and it would help a lot when using
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
342 the macros.
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
343
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
344 You can see how this annotation system quickly becomes interesting. You can
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
345 also annotate parameters (and it is not too overwhelming to do so when
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
346 you create the graph as well) and you can use this to sort of collect all
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
347 parameters that you annotated in some way and then do something to them.
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
348
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
349 The way I see it is just that a transform could have an optional annotations
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
350 argument and it will add that string to all parameters and hyper-parameters.
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
351 How much sense this makes is debatable, but I strongly believe that it is not
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
352 complicated to implement (I actually have something like this already
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
353 implemented, just that I use that singleton class, and I sort of made the
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
354 framework work mostly for DAA by making a few poor choices).
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
355
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
356
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
357 Params and hyperparams
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
358 ----------------------
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
359
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
360 I think it is obvious from what I wrote above that there is a node wrapper
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
361 around the theano expression. I haven't written down all the details of that
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
362 class. I think there should be such a wrapper around parameters and
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
363 hyper-parameters as well. By default those wrappers might not provide
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
364 any information. But you can potentially add interesting information for
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
365 "graph" aware transforms. For example you can add annotations for a find
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
366 or replace function that will collect all parameters or hyper-parameters
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
367 so you can do some common thing to all of them (when it makes sense).
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
368
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
369 You could have a freeze property for parameters. If you change that property
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
370 the theano function (where needed) for all nodes that follow this one is
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
371 recomputed. This property would be used by the parameter-collecting function
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
372 used to compute the gradient. If parameters are frozen they are ignored,
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
373 if not they are updated.
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
374
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
375 For hyper-parameters you would also have a different wrapper that would
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
376 contain, possibly, the distribution of that hyper-parameter for a
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
377 hyper-learner.
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
378
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
379 I would also have the learning rate or noise_amounts as some strange
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
380 hyper-parameter. I would say by default, if any hyper-parameter changes its
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
381 value, then the theano expressions need to be recompiled. If you are dealing
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
382 with these strange types of hyper-parameters you don't need to do that.
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
383 This can be done automatically for you, and I guess it will all boil down to:
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
384 is your hyper-parameter a theano shared variable or theano tensor? If so, we
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
385 are dealing with the second type. So this kind of stuff can be detected
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
386 automatically.
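
The automatic detection could look roughly like the following sketch. SharedValue is a stand-in for a theano shared variable (so that the example runs without theano), and HyperParam / needs_recompile are hypothetical names, not an existing API:

```python
class SharedValue:
    """Stand-in for a theano shared variable: a mutable box read at call time."""
    def __init__(self, value):
        self.value = value


class HyperParam:
    """Hypothetical hyper-parameter wrapper."""
    def __init__(self, name, value):
        self.name = name
        self.value = value

    @property
    def needs_recompile(self):
        # A plain Python value is baked into the expression, so changing it
        # means rebuilding; a shared value can change without rebuilding.
        return not isinstance(self.value, SharedValue)


lr = HyperParam("learning_rate", SharedValue(0.01))
n_hidden = HyperParam("n_hidden", 500)
```

Here lr is of the "strange" second type, while changing n_hidden would force a recompilation.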
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
387
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
388 How does this work?
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
389 -------------------
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
390
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
391 You always have a pointer to the entire graph. Whenever a hyper-param
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
392 changes (or a param freezes), all affected regions of the graph get recompiled.
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
393 This is done by traversing the graph from the bottom node and re-constructing the
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
394 theano expression. Where needed, this theano expression gets compiled.
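
As a rough illustration of finding the affected region, the sketch below walks a toy graph and lists every node that depends on the one that changed. It is plain Python with a hypothetical Node class, and it assumes the walk starts from the output node and follows input links, which is one possible way to do the traversal:

```python
class Node:
    """Hypothetical graph node; `inputs` are the nodes it is computed from."""
    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)


def affected(top, changed):
    """Names of nodes under `top` that depend on `changed` (itself included)."""
    cache = {}

    def depends(node):
        if node not in cache:
            # Visit every input (list, not generator) so the cache is complete.
            hits = [depends(i) for i in node.inputs]
            cache[node] = node is changed or any(hits)
        return cache[node]

    depends(top)
    return sorted(n.name for n, hit in cache.items() if hit)


data = Node("data")
h1 = Node("h1", [data])
h2 = Node("h2", [h1])
skip = Node("skip", [data])
out = Node("out", [h2, skip])
to_rebuild = affected(out, h1)  # nodes whose expression must be rebuilt
```

Only the nodes in to_rebuild would have their expressions re-constructed; data and skip are left alone.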
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
395
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
396 This function that updates / re-constructs the graph is slightly more complex
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
397 if you have non-theano functions in the middle of the graph .. but not too
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
398 much in my view.
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
399
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
400 replace & find
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
401 --------------
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
402
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
403 Replace replaces a part of the graph. The way it works in my view is that
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
404 if I write :
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
405
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
406 x = x1+x2+x3
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
407 y = x.replace({x2:x5})
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
408
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
409 You would first copy the graph that is represented by x (the params or
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
410 hyper-params are not copied) and then replace the subgraphs. I.e., x will
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
411 still point to x1+x2+x3, y will point to x1+x5+x3. Replace is not done
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
412 in place!
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
413
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
414 I think of these Node classes as something lightweight, like theano variables,
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
415 and creating a copy is not harmful. Also params & shared variables are shared
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
416 between these graphs. If you want new params / shared variables we can offer
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
417 a copy / deepcopy command.
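
The copy-then-substitute behaviour can be sketched in plain Python as follows. Node and replace are hypothetical stand-ins for the proposed classes; the point is that the node structure is copied while the param objects are shared between the old and new graph:

```python
class Node:
    """Hypothetical lightweight node; `param` is shared, never copied."""
    def __init__(self, name, inputs=(), param=None):
        self.name = name
        self.inputs = list(inputs)
        self.param = param


def replace(node, mapping):
    """Return a copy of the graph under `node` with `mapping` applied."""
    if node in mapping:
        return mapping[node]
    new_inputs = [replace(i, mapping) for i in node.inputs]
    return Node(node.name, new_inputs, node.param)  # same param object


x1 = Node("x1", param=[0.0])
x2 = Node("x2")
x3 = Node("x3")
x5 = Node("x5")
x = Node("sum", [x1, x2, x3])
y = replace(x, {x2: x5})  # x is left untouched
```

After this, x still points to x1+x2+x3 and y to x1+x5+x3, while both graphs refer to the very same param list of x1.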
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
418
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
419 Replace (given that it starts from a model) can take string(s) that indicate
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
420 specific annotations.
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
421
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
422 Find does the same (without the copying).
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
423
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
424
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
425
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
426 If you have two things named the same in the graph you would return the first
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
427 one in a breadth-first search starting from the top node. The idea is that if you
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
428 have all the weight matrices annotated as 'W' and you look for 'W' starting
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
429 from node hiddens2, you want the W of the second layer, and not of the first.
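
A small sketch of that lookup rule, with a hypothetical Node class that carries a set of annotation strings. The search is a standard breadth-first traversal over input links, so the nearest match to the starting node wins:

```python
from collections import deque


class Node:
    """Hypothetical node with a set of annotation strings in `tags`."""
    def __init__(self, name, inputs=(), tags=()):
        self.name = name
        self.inputs = list(inputs)
        self.tags = set(tags)


def find(top, tag):
    """Return the first node annotated `tag` in breadth-first order, or None."""
    queue = deque([top])
    seen = {id(top)}
    while queue:
        node = queue.popleft()
        if tag in node.tags:
            return node
        for i in node.inputs:
            if id(i) not in seen:
                seen.add(id(i))
                queue.append(i)
    return None


data = Node("data")
W1 = Node("W1", tags=["W"])
hiddens1 = Node("hiddens1", [data, W1])
W2 = Node("W2", tags=["W"])
hiddens2 = Node("hiddens2", [hiddens1, W2])
```

Starting from hiddens2 the search meets W2 one level down, before it ever reaches W1 two levels down.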
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
430
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
431 I would support:
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
432 model.replace( look_at , search_for , replace_with, annotate_as)
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
433 replace(model , look_at , search_for , replace_with, annotate_as)
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
434 node.replace(model , look_at, replace_with, annotate_as)
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
435
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
436 look_at: if it is a node, it refers to the subgraph that has as a final
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
437 node that node. I.e. all up to that point. If it is a string, you would look
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
438 at the subgraph annotated by that string.
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
439
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
440 Of course we can optionally choose not to allow things to be annotated with
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
441 the same name, though I sort of liked it. It makes a lot of things easy. For
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
442 a DBN I would have the annotations :
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
443
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
444 DBN
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
445 rbm1
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
446 hidden
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
447 CDk
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
448 rbm2
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
449 hidden
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
450 CDk
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
451 logreg
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
452
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
453 If I want to replace the first CDk with PCD, I would do:
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
454
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
455 pcd1 = PCD (..)
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
456 model.replace(look_at='rbm1', search_for='CDk', replace_with=pcd1,
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
457 annotate_as='PCD1')
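
To make the call above concrete, here is a toy version over a tree of annotations. Everything in it (Node, find, replace, and the string used in place of the elided PCD object) is hypothetical; it only shows how look_at narrows the search before search_for picks the node to swap:

```python
from collections import deque


class Node:
    """Hypothetical annotated node; `children` are the parts it is built from."""
    def __init__(self, tag, children=(), payload=None):
        self.tag = tag
        self.children = list(children)
        self.payload = payload


def find(top, tag):
    """Breadth-first search for the first node annotated `tag`."""
    queue = deque([top])
    while queue:
        node = queue.popleft()
        if node.tag == tag:
            return node
        queue.extend(node.children)
    return None


def replace(model, look_at, search_for, replace_with, annotate_as):
    """Return a copy of `model` with one node swapped; `model` is untouched."""
    scope = find(model, look_at)       # restrict the search to this subgraph
    target = find(scope, search_for)   # first match inside the scope

    def rebuild(node):
        if node is target:
            return Node(annotate_as, payload=replace_with)
        return Node(node.tag, [rebuild(c) for c in node.children], node.payload)

    return rebuild(model)


dbn = Node("DBN", [
    Node("rbm1", [Node("hidden"), Node("CDk")]),
    Node("rbm2", [Node("hidden"), Node("CDk")]),
    Node("logreg"),
])
new_dbn = replace(dbn, look_at="rbm1", search_for="CDk",
                  replace_with="pcd1 placeholder", annotate_as="PCD1")
```

Only the CDk under rbm1 is swapped; the one under rbm2 keeps its annotation, and dbn itself is unchanged.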
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
458
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
459
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
460 Bottom line is :
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
461
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
462 I think having a graph and having a way to search in that graph and replace
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
463 parts is a very flexible and powerful way of doing things.
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
464
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
465
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
466 reconstruct
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
467 -----------
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
468
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
469 This is something nice for DAA. It is definitely not useful for the rest.
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
470 I think, though, that it is a shame having that transformation graph and not
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
471 being able to use it to do this. It will make life so much easier when you
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
472 do deep auto-encoders. I wouldn't put it in the core library, but I would
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
473 have it in the DAA module. For reconstruct to work you need to have inverse
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
474 transforms for the ones you use.
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
475
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
476 The way I see it you can either have something like
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
477
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
478 # generate your invertible transforms on the fly
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
479 fn = create_transform(lambda : , params, hyper-params )
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
480 inv = create_transform(lambda : , params, hyper-params )
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
481 my_transform = couple_transforms( forward = fn, inv = inv)
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
482
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
483 and generate special transforms on the fly that have some pseudo-inverses
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
484 when you construct the graph. Maybe you can also have specific pre-defined
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
485 transforms for the most used cases, with specific names. Even more, I don't
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
486 see the harm in something as simple as dotW_b having an inverse defined (as
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
487 using tied weights) in all cases, but you would only use it for the DAA.
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
488 It is just to reduce the number of names of transforms you have; it is like a
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
489 feature that doesn't hurt or help 95% of the time but helps 5% of the time.
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
490
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
491
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
492 But this is up for debate. The only reason I bring it up is to say that the
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
493 class that represents a transform should have an inverse method that by
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
494 default throws an exception.
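
A sketch of what such a base class could look like, in plain Python with nested lists instead of theano tensors. Transform and DotWb are hypothetical names; the inverse of DotWb is the tied-weights decoder (the transpose of W plus a separate bias), which is a pseudo-inverse in the DAA sense and not a mathematical inverse:

```python
class Transform:
    """Hypothetical base class: no inverse unless a subclass defines one."""
    def forward(self, x):
        raise NotImplementedError

    def inverse(self, h):
        raise NotImplementedError(
            "%s does not define an inverse" % type(self).__name__)


class DotWb(Transform):
    """x -> x.W + b, with a tied-weights pseudo-inverse h -> h.W^T + b_inv."""
    def __init__(self, W, b, b_inv):
        self.W = W          # n_in rows, n_out columns
        self.b = b          # length n_out
        self.b_inv = b_inv  # length n_in

    def forward(self, x):
        cols = zip(*self.W)
        return [sum(xi * w for xi, w in zip(x, col)) + bj
                for col, bj in zip(cols, self.b)]

    def inverse(self, h):
        return [sum(hj * w for hj, w in zip(h, row)) + bi
                for row, bi in zip(self.W, self.b_inv)]


layer = DotWb(W=[[1, 2], [3, 4], [5, 6]], b=[0, 0], b_inv=[0, 0, 0])
hidden = layer.forward([1, 0, 1])
recon = layer.inverse([1, 1])
```

A reconstruct function could then walk the graph backwards and call inverse on each transform, failing early on any transform that kept the default.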
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
495
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
496
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
497 transforms
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
498 ----------
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
499
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
500 In my view there will be quite a few of such standard transforms.
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
501 This can be annoying, but I think that if we group them by
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
502 architectures (MLP, DAA, RBM), samplers, optimizers, it will be less of a mess.
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
503 This would be crucial for their documentation as well. These categories should
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
504 also come with macros. There will be though some basic transforms that
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
505 are available at the core (like replace, find, things related to annotating
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
506 and creating a model, collecting parameters and hyper-parameters).
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
507
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
508 I also think that we can start small by having just very few such transforms
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
509 and add them as the library grows. We don't need many of these; most are
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
510 nice to have ..
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
511
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
512
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
513 Constraints
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
514 -----------
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
515
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
516 You can always add constraints. I think the easiest way to make this explicit is to
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
517 get a handle on the parameter or node on which you want to add a constraint and
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
518 do something like
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
519
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
520 add_constraint(on_what, what)
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
521
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
522 on_what can be a node, a parameter node, a list of nodes, a list of parameter
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
523 nodes, an annotation string, given that you provided a model, and what is a
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
524 graph. In terms of the graph that you are creating what this does is to
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
525 create a dependency link from your main graph to that constraint graph.
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
526 This means that the grad function that
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
527 computes the gradients with respect to parameters will also (if there are
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
528 such dependency links) add the gradient of the output of that dependency
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
529 graph with respect to those parameters. There are some constraints on
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
530 what a dependency graph can be, in the sense that it should start from only
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
531 one input (the parameters / node) and it should end in only one node that
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
532 is a scalar.
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
533
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
534 From an implementation point of view, this can be done by just collecting a
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
535 list of constraint costs that will be added to the cost before calling
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
536 T.grad. But I like to think about it in terms of graphs linked through
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
537 dependency links.
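
The implementation view can be sketched without theano as below. add_constraint follows the signature given above, while total_cost and numeric_grad are hypothetical helpers; a central finite difference stands in for T.grad so that the example is self-contained:

```python
constraints = []  # list of (parameter name, scalar-valued function)


def add_constraint(on_what, what):
    """Register a scalar penalty `what` on the parameter named `on_what`."""
    constraints.append((on_what, what))


def total_cost(params, cost):
    """Main cost plus every registered constraint cost."""
    return cost(params) + sum(fn(params[name]) for name, fn in constraints)


def numeric_grad(f, params, name, eps=1e-6):
    """Central finite difference, standing in for T.grad in this sketch."""
    hi = dict(params)
    hi[name] += eps
    lo = dict(params)
    lo[name] -= eps
    return (f(hi) - f(lo)) / (2 * eps)


def cost(p):
    return (p["w"] - 3.0) ** 2


add_constraint("w", lambda w: 0.5 * w ** 2)  # an L2 penalty on w
g = numeric_grad(lambda p: total_cost(p, cost), {"w": 1.0}, "w")
```

At w = 1 the main cost contributes 2(w - 3) = -4 and the penalty contributes w = 1, so g comes out close to -3: the constraint gradient is simply added to the gradient of the main cost.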
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
538
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
539
1229
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
540
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
541
1237
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
542 Some general comments
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
543 ---------------------
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
544
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
545 I think that what you get in the end is a very flexible framework, where
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
546 adding new things is just a matter of putting together a few transforms and
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
547 annotating the entire thing. Worst case scenario you would need to invent a
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
548 transform, which I do believe could be quite painless.
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
549
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
550 The harder part to implement is the backbone. It is not difficult in my
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
551 view, mostly slightly tedious. I had something like this implemented in a
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
552 matter of a week, though it was a bit less restrictive. I do believe though
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
553 that we should not oversimplify the backbone of the library just to make it
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
554 easy to implement, but we should rather carefully consider what you get in
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
555 the end.
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
556
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
557
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
558 Connection to the architecture committee
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
559 -----------------------------------------
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
560
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
561 I think that if you get such iterator objects that can produce either
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
562 the error or do an update step, it is easy to wrap them in a plug-in,
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
563 or use them with the imperative language James proposed.
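
As a rough illustration of the paragraph above, here is a minimal sketch of such an iterator object and of a plug-in that wraps it. Everything in it is an assumption made for illustration: the names ``ToyModelIterator``, ``TrainerPlugin``, ``error`` and ``update`` are hypothetical, and the toy model (fitting y = w * x in plain Python) stands in for what would really be compiled theano functions.

```python
class ToyModelIterator:
    """Hypothetical iterator object: it can report the error or do one update.

    It fits y = w * x by gradient descent on the mean squared error.
    A real object would call compiled theano functions instead.
    """

    def __init__(self, data, lr=0.1):
        self.data = list(data)  # list of (x, y) pairs
        self.lr = lr            # learning rate
        self.w = 0.0            # the single model parameter

    def error(self):
        # Mean squared error over the whole data set.
        return sum((self.w * x - y) ** 2 for x, y in self.data) / len(self.data)

    def update(self):
        # One gradient step on the mean squared error.
        grad = sum(2 * (self.w * x - y) * x for x, y in self.data) / len(self.data)
        self.w -= self.lr * grad


class TrainerPlugin:
    """Hypothetical plug-in: it relies only on error() and update()."""

    def __init__(self, iterator, tolerance=1e-6, max_steps=1000):
        self.iterator = iterator
        self.tolerance = tolerance
        self.max_steps = max_steps

    def run(self):
        # Keep updating until the error is small enough or we run out of steps.
        steps = 0
        while steps < self.max_steps and self.iterator.error() > self.tolerance:
            self.iterator.update()
            steps += 1
        return steps


if __name__ == "__main__":
    it = ToyModelIterator([(1.0, 2.0), (2.0, 4.0)])
    n = TrainerPlugin(it).run()
    print("steps:", n, "w:", it.w, "error:", it.error())
```

The point of the sketch is only that the wrapper needs nothing beyond the two operations mentioned above, so the same object could equally be driven by an imperative control script.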
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
564
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
565 I actually have ideas (using non-theano nodes) on how to break the algo at
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
566 points such that you can have different parts run on remote machines ..
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
567 though we might not want to support that (using the plug-in system ..
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
568 though it might work with other systems that support the same idea).
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
569
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
570 I think it fits more naturally with the imperative language that James
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
571 proposed, because that would create a graph as well. His graph is
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
572 in general simpler (it always has only one termination node), and
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
573 the nodes have a different interpretation (?) so I would use a different
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
574 node class for those. But when writing the code, with some syntactic sugar
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
575 the difference can be blurred (do we want this?). I think that one
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
576 can come up with ways of making the approaches look alike and slightly
32fc5f442dde LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1231
diff changeset
577 homogeneous.