Mercurial > pylearn
annotate doc/v2_planning/layer_RP.txt @ 1253:826d78f0135f
Prototype for "hooks" simpler than full control-flow rewrite.
author: Pascal Lamblin <lamblinp@iro.umontreal.ca>
date: Fri, 24 Sep 2010 01:46:12 -0400
parents: 32fc5f442dde

===============
Layer committee
===============

Members : RP, XG, AB, DWF

Proposal (RP)
=============

You construct your neural network by building a graph of connections
between "layers", starting from the data. While you construct the graph,
the Theano formulas of the different layers are put together to build
your model.

The idea is that you describe exactly what you would draw on the board
if you were asked to draw the architecture. Doing this by hand is
optional: macros will return this graph automatically for well-defined
cases. Things that are not neural networks, and therefore have no
structure to draw (for example an SVM or PCA), are just a box. You add
such a box when you want to connect its output to your network.

Hard details are not set yet, but all members of the committee agreed
that this sounds like a good idea.


Example Code (RP):
------------------

    # Assume you have the dataset as train_x, train_y, valid_x, valid_y, test_x, test_y

    # first hidden layer (300 units), pre-trained as an RBM with CD-k, k=5
    h1 = sigmoid(dotW_b(train_x, n=300))
    rbm1 = CDk(h1, train_x, k=5, sampler=binomial, cost=pseudolikelihood)

    # second hidden layer, pre-trained as an RBM on top of h1
    h2 = sigmoid(dotW_b(h1, n=300))
    rbm2 = CDk(h2, h1, k=5, sampler=binomial, cost=pseudolikelihood)

    # output layer with 10 units
    out = sigmoid(dotW_b(h2, n=10))

    # supervised cost, its gradients and the fine-tuning learner
    train_err = cross_entropy(out, train_y)
    grads = grad(train_err, train_err.parameters())
    learner = SGD(train_err, train_err.parameters(), grads)

    # the same model, evaluated on the validation and test sets
    valid_err = train_err.replace({train_x: valid_x, train_y: valid_y})
    test_err = train_err.replace({train_x: test_x, train_y: test_y})


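The `replace` calls at the end of the example could behave roughly as follows. This is a minimal pure-Python sketch under our own assumptions: `Node`, `Data`, `evaluate` and the toy operations are hypothetical stand-ins (no Theano), and `replace` simply rebuilds the part of the graph that depends on the substituted nodes while leaving the original graph untouched.

```python
# Hypothetical sketch: Node, Data, evaluate() and replace() are illustrative
# names, not pylearn or Theano API; plain Python lists stand in for tensors.
class Node:
    """A node of the model graph: an operation applied to its input nodes."""

    def __init__(self, op, *inputs, name=None):
        self.op = op          # callable computing this node's value
        self.inputs = inputs  # parent nodes
        self.name = name

    def evaluate(self):
        return self.op(*(i.evaluate() for i in self.inputs))

    def replace(self, mapping, _memo=None):
        """Return a copy of the graph ending at this node in which every key
        of `mapping` is swapped for its value; the original is unchanged."""
        memo = {} if _memo is None else _memo
        if self in mapping:
            return mapping[self]
        if self in memo:          # shared subgraphs are rebuilt only once
            return memo[self]
        if not self.inputs:       # untouched leaf: reuse it as is
            return self
        new_inputs = [i.replace(mapping, memo) for i in self.inputs]
        memo[self] = Node(self.op, *new_inputs, name=self.name)
        return memo[self]


class Data(Node):
    """A leaf node holding a fixed value (stands in for a data object)."""

    def __init__(self, value, name=None):
        super().__init__(None, name=name)
        self.value = value

    def evaluate(self):
        return self.value


train_x, train_y = Data([1.0, 2.0], "train_x"), Data([1.0, 0.0], "train_y")
valid_x, valid_y = Data([3.0, 4.0], "valid_x"), Data([0.0, 1.0], "valid_y")

# toy "model" out = 2 * x and a squared-error "cost" against the targets
out = Node(lambda x: [2.0 * v for v in x], train_x, name="out")
train_err = Node(lambda o, y: sum((a - b) ** 2 for a, b in zip(o, y)),
                 out, train_y, name="train_err")

valid_err = train_err.replace({train_x: valid_x, train_y: valid_y})
```

Here `train_err.evaluate()` gives 17.0 on the training data and `valid_err.evaluate()` gives 85.0 on the validation data, and both graphs coexist.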
Global observations :
---------------------

1) Your graph can have multiple terminal nodes; in this case rbm1,
   rbm2, learner, valid_err and test_err are all end nodes of the graph.

2) Any node is an "iterator": calling out.next() gives you the next
   prediction, and calling train_err.next() gives you the next error
   (on the batch given by data.next()).

3) Replace can replace any subgraph (or subgraphs) with other subgraphs,
   as long as they have the same number of input and output units (there
   is a one-to-one mapping between them). I see replacing several
   subgraphs as simply looping over the list of subgraphs to replace and
   calling replace on each, nothing fancier. Since, in my view, all nodes
   (except parameter and hyper-parameter nodes) produce the same
   interface, this constraint is not hard to respect, so it is up to the
   user to do a replace that makes sense.

4) You can have MACROS or SUBROUTINES that already give you the graph for
   known components (in my view CDk is such a macro, but simpler
   examples will be vanilla versions of MLP, DAA, DBN, LOGREG). After
   Guillaume pointed out a real shortcoming of the approach, I have
   slightly modified what you get from a macro; see below.

5) Any node has the entire graph (though arguably you don't use that
   graph much). Running such a node will in general be done by compiling
   the Theano expression up to that node (if you don't already have this
   function) and using the data object that you get initially. This Theano
   function is compiled only if you need it. You use the graph only to:

   * update the Theano expression in case some part of the subgraph has
     changed (a hyper-parameter or a replace call)
   * collect the list of parameters of the model
   * collect the list of hyper-parameters (my personal view: this would
     mostly be useful for a hyper-learner and not for day-to-day work,
     but I think it is easy to provide and we should)
   * collect constraints on parameters (I believe they can be represented
     in the graph as dependency links to other graphs that compute the
     constraints)

6) Registering parameters and hyper-parameters to the graph is the job of
   the transform, and therefore of the user who implemented that
   transform; the same goes for initializing the parameters. (If we have
   different ways to initialize the weight matrix, that might be a
   hyper-parameter with a default value, or different transforms; to
   keep the number of such transforms small, you can define a transform
   on the fly for simple Theano expressions.)



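Observations 5 and 6 can be pictured with a small pure-Python sketch. It assumes hypothetical names (`Transform`, `Input`, `DotWB`, `collect`) rather than any existing API: each transform registers and initializes its own parameters and hyper-parameters, and the graph is only walked to gather them, visiting shared nodes once.

```python
# Hypothetical sketch: Transform, Input, DotWB and collect() are illustrative
# names, not an existing pylearn or Theano API.
import random


class Transform:
    """A graph node; each transform registers its own (hyper-)parameters."""

    def __init__(self, *inputs):
        self.inputs = inputs
        self.params = {}       # name -> value, filled in by the transform
        self.hyperparams = {}  # name -> value, filled in by the transform


class Input(Transform):
    """Leaf node standing in for a data object."""


class DotWB(Transform):
    """Affine transform that registers and initializes its own W and b."""

    def __init__(self, x, n_in, n_out, init_scale=0.01):
        super().__init__(x)
        # the initialization scale is exposed as a hyper-parameter
        self.hyperparams.update(n_out=n_out, init_scale=init_scale)
        self.params["W"] = [[random.gauss(0.0, init_scale) for _ in range(n_out)]
                            for _ in range(n_in)]
        self.params["b"] = [0.0] * n_out


def collect(node, attr):
    """Walk from `node` towards the inputs and return (node, name) pairs for
    every entry of `attr` ("params" or "hyperparams"), visiting nodes once."""
    seen, found, stack = set(), [], [node]
    while stack:
        current = stack.pop()
        if id(current) in seen:
            continue
        seen.add(id(current))
        found.extend((current, name) for name in getattr(current, attr))
        stack.extend(current.inputs)
    return found


x = Input()
h1 = DotWB(x, n_in=4, n_out=3)
h2 = DotWB(h1, n_in=3, n_out=2)
params = collect(h2, "params")
hyperparams = collect(h2, "hyperparams")
```

Keeping registration inside the transform means the graph walk stays generic: it never needs to know what kind of layer it is visiting.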
Detailed Proposal (RP)
======================

I will go through a list of scenarios and possible issues:

Delayed or future values
------------------------

This section can be dropped if people think it is not useful.

Sometimes you might want future (or past) values of some nodes. For
example, you might be interested in:

    y(t) = x(t) - x(t-1)

You can get that by having a "delayed" version of a node. A delayed version
of a node x is obtained by calling x.t(k), which gives you a node that has
the value x(t+k); k can be positive or negative.
In my view this can be done as follows:

- a node is a class that points to:

  * a data object that feeds data
  * a Theano expression up to that point
  * the entire graph that describes the model (not the Theano graph!)

The only thing you need to do is to change the data object to reflect the
delay (we might need to be able to pad it with 0?). You also need to create
a copy of the Theano expression (those are "new nodes"), in the sense that
the starting Theano tensors are different since they point to different data.



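As a minimal sketch of the data-object side of this idea (pure Python, no Theano), assume a hypothetical `DataStream` whose `t(k)` returns a shifted view of the same data and pads out-of-range steps with 0, one of the options raised above. Since x.t(k) has the value x(t+k), the example y(t) = x(t) - x(t-1) becomes x minus x.t(-1):

```python
# Hypothetical sketch: DataStream and its t() method are illustrative only.
class DataStream:
    """A data object that yields one value per time step."""

    def __init__(self, values, delay=0, pad=0.0):
        self.values = list(values)
        self.delay = delay  # k in x.t(k): the value at step t is x(t + k)
        self.pad = pad      # used when t + k falls outside the sequence

    def t(self, k):
        """Return a new data object shifted by k steps (same underlying data)."""
        return DataStream(self.values, self.delay + k, self.pad)

    def __iter__(self):
        for step in range(len(self.values)):
            src = step + self.delay
            yield self.values[src] if 0 <= src < len(self.values) else self.pad


x = DataStream([1.0, 3.0, 6.0, 10.0])
# y(t) = x(t) - x(t-1), i.e. x minus its delayed copy x.t(-1), zero-padded
y = [a - b for a, b in zip(x, x.t(-1))]
```

Here `y` is `[1.0, 2.0, 3.0, 4.0]` and `list(x.t(1))` is `[3.0, 6.0, 10.0, 0.0]`; the original stream is not modified.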
Non-Theano transformation (or function or whatever)
----------------------------------------------------

Maybe you want to do something in the middle of your graph that is not
supported by Theano. Let's say you have a function f which you cannot
write in Theano. You want to do something like

    W1*f(W2*data + b)

I think we can support that by doing the following. Each node has:

  * a data object that feeds data
  * a Theano expression up to that point
  * the entire graph that describes the model

Let x1 = W2*data + b. Up to here everything is fine: we have the Theano
expression

    dot(W2, tensor) + b

where tensor is provided by the data object (plus a dict of givens and
whatever else you need to compile the function).

When you apply f, what you do is create a node that is exactly like the
data object, in the sense that it provides a new tensor and a new dict of
givens.

So x2 = W1*f(W2*data + b) will actually point to the expression

    dot(W1, tensor)

and to the data node f(W2*data + b).

What this means is that you basically compile two Theano functions, t1 and
t2, and evaluate t2(f(t1(data))). So every time you have a non-Theano
operation, you break the Theano expression and start a new one.

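The evaluation order t2(f(t1(data))) can be illustrated with a small pure-Python sketch. It assumes plain Python functions in place of the two compiled Theano functions, uses `math.floor` only as a toy stand-in for an operation Theano cannot express, and picks made-up values for W1, W2 and b; only the structure (Theano part, break, non-Theano part, new Theano part) reflects the text.

```python
# Hypothetical sketch: plain Python functions stand in for the two compiled
# Theano functions t1 and t2; W1, W2, b and the choice of f are made up.
import math


def matvec(W, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


W1 = [[1.0, -1.0]]
W2 = [[1.0, 0.0], [0.0, 2.0]]
b = [0.5, -1.0]


def t1(data):
    """Theano-expressible part before f: dot(W2, tensor) + b."""
    return [a + c for a, c in zip(matvec(W2, data), b)]


def f(tensor):
    """Toy stand-in for an operation that cannot be written in Theano."""
    return [math.floor(a) for a in tensor]


def t2(tensor):
    """Theano-expressible part after f: dot(W1, tensor)."""
    return matvec(W1, tensor)


def run_pipeline(stages, data):
    """Apply the stages in order; each boundary is a break in the Theano graph."""
    for stage in stages:
        data = stage(data)
    return data


data = [1.0, 1.5]
x2 = run_pipeline([t1, f, t2], data)  # same as t2(f(t1(data)))
```

With these values t1 gives `[1.5, 2.0]`, f gives `[1, 2]`, and x2 is `[-1.0]`. Each boundary between stages is also where data would leave and re-enter the device, which is the cost discussed next.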
What you lose:

- there is no optimization or anything else between t1, t2 and f (we
  don't support that)
- if you are running things on the GPU, after t1 the data will be copied
  to the CPU and then probably back to the GPU, so using the GPU may no
  longer make sense



Recurrent Things
----------------

I think that you can write a recurrent operation by first defining a
graph (the recurrent relation):

    # y_tm1 stands for y(t-1); slice(x, t=0) is x at the current time step
    y_tm1 = recurrent_layer(init=zeros(50))
    x_t = slice(x, t=0)
    y = loop(dotW_b(y_tm1, 50) + x_t, steps=20)

This would give all the information you need to add a scan op to the
Theano expression of the result node y; it is just a different way of
writing things, which I think is more intuitive.

You create your primitives, which are either a recurrent_layer, which
should have an initial value, or a slice of some other node (a time
slice, that is). A time slice is a special kind of node, which we should
try to force people not to use outside of a loop. If you do use it,
though, it has some default behaviour; for example, it behaves exactly
like a delayed node. You call loop, giving it an expression that starts
from those primitives, and ta-da, you have your recurrent expression in
the graph.

Similarly, you can have foldl or map or anything else.

You would use this instead of writing scan, especially if the formulas are
more complicated and you want to automatically collect parameters,
hyper-parameters and so on. You could also just use the scan op with a
general apply command if you prefer.

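To make the semantics of `loop` concrete, here is a pure-Python sketch of what the loop above would compute (the job a scan op does in Theano). Everything in it is an assumption for illustration: `loop`, `step`, the 0.5-times-identity weight matrix, the zero bias and the constant input are made up, and the size is 3 instead of 50 to keep it small.

```python
# Hypothetical sketch of what the proposed loop primitive would compute; it is
# not Theano's scan. Size 3 instead of 50, and W, b, xs are made-up values.
def loop(step, init, xs, steps):
    """Return [y_1, ..., y_steps] with y_t = step(y_tm1, x_t), y_0 = init."""
    ys, y_tm1 = [], init
    for t in range(steps):
        y_t = step(y_tm1, xs[t])  # xs[t] plays the role of the time slice x_t
        ys.append(y_t)
        y_tm1 = y_t
    return ys


n = 3
W = [[0.5 if i == j else 0.0 for j in range(n)] for i in range(n)]  # 0.5 * I
b = [0.0] * n


def step(y_tm1, x_t):
    """dotW_b(y_tm1) + x_t written out by hand."""
    return [sum(W[i][j] * y_tm1[j] for j in range(n)) + b[i] + x_t[i]
            for i in range(n)]


xs = [[1.0] * n for _ in range(20)]
ys = loop(step, init=[0.0] * n, xs=xs, steps=20)
```

The sequence starts 1.0, 1.5, 1.75 in every unit and approaches 2.0. A foldl in this style would return only `ys[-1]`, and a map would simply drop the dependence on `y_tm1`.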
Optimizer
---------

Personally, I would respect the findings of the optimization committee
and have SGD require a node that produces some error (which can be
omitted), the parameter nodes, and the nodes that compute gradients for
those parameters. For this I would also have a grad function, which would
actually only call T.grad.

What if you have a non-theano node in the middle? I don't have any smart
solution besides ignoring every parameter that is below the first non-theano
node and throwing a warning.

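A minimal sketch of this "ignore and warn" policy, assuming a hypothetical `Node` class whose `inputs` point down towards the data and whose `is_theano` flag marks a foreign (for example numpy) computation. Parameters are represented by their names only; in the real proposal they would be theano shared variables.

```python
import warnings
from dataclasses import dataclass, field


# Hypothetical sketch: a graph node whose `inputs` point towards the data.
@dataclass(eq=False)
class Node:
    name: str
    is_theano: bool = True  # False for e.g. a numpy/python step in the middle
    params: list = field(default_factory=list)
    inputs: list = field(default_factory=list)


def _params_below(node, seen):
    """Every parameter at or below `node`."""
    if id(node) in seen:
        return []
    seen.add(id(node))
    found = list(node.params)
    for child in node.inputs:
        found.extend(_params_below(child, seen))
    return found


def trainable_params(error_node):
    """Params T.grad could reach: stop at the first non-theano node on each path."""
    found, blocked, seen = [], [], set()
    stack = [error_node]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        if not node.is_theano:
            blocked.append(node)  # gradients cannot flow through this node
            continue
        for p in node.params:
            if p not in found:
                found.append(p)
        stack.extend(node.inputs)
    ignored, below_seen = [], set()
    for node in blocked:
        for p in _params_below(node, below_seen):
            if p not in found and p not in ignored:
                ignored.append(p)
    if ignored:
        warnings.warn("ignoring parameters below a non-theano node: " + ", ".join(ignored))
    return found


data = Node("data")
layer1 = Node("layer1", params=["W1", "b1"], inputs=[data])
numpy_step = Node("numpy_step", is_theano=False, inputs=[layer1])
layer2 = Node("layer2", params=["W2", "b2"], inputs=[numpy_step])
err = Node("err", inputs=[layer2])

print(trainable_params(err))  # ['W2', 'b2'], with a warning about W1, b1
```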
Learner
-------

In my case the learner would not have both a predict() and an eval() method,
but just an eval(). If you want the predictions, you should use the
corresponding node (before the error measure is applied); in my first example
this was **out**. Note that eval() in this case is the same as next() (you
might just have next() for simplicity). The only semantically important
difference is that a call to next() now has side effects, in the sense that
the parameters are updated.

Of course, we could require learners to be special nodes that also have a
predict output. In that case I'm not sure what iterating over such a node
should produce.

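A hypothetical `Learner` (not an existing class) showing the only semantic difference between the two calls: `eval()` just reads the parameters, while `next()` returns the same error value and then updates them. The error and gradient are plain Python callables standing in for compiled theano functions. Making the learner a Python iterator, so that `next(learner)` performs one training step, is just one possible choice for the iteration behaviour, not something the text settles.

```python
# Hypothetical sketch: error/grad are plain callables standing in for theano functions.
class Learner:
    def __init__(self, error_fn, grad_fn, params, lr=0.1):
        self.error_fn = error_fn
        self.grad_fn = grad_fn
        self.params = dict(params)
        self.lr = lr

    def eval(self):
        """Error of the current parameters; no side effects."""
        return self.error_fn(self.params)

    def next(self):
        """Same value as eval(), but the parameters are updated afterwards."""
        err = self.error_fn(self.params)
        for name, g in self.grad_fn(self.params).items():
            self.params[name] -= self.lr * g
        return err

    # Optional: make `next(learner)` and `for err in learner` perform training steps.
    def __iter__(self):
        return self

    def __next__(self):
        return self.next()


learner = Learner(error_fn=lambda p: (p["w"] - 3.0) ** 2,
                  grad_fn=lambda p: {"w": 2.0 * (p["w"] - 3.0)},
                  params={"w": 0.0}, lr=0.25)
print(learner.eval(), learner.eval())  # 9.0 9.0 (no side effects)
print(learner.next())                  # 9.0, and w moves from 0.0 to 1.5
print(learner.eval())                  # 2.25
```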
Granularity
-----------

Guillaume nicely pointed out that this library might be overkill, in the
sense that you have a dotW_b transform, and then you will need a
dotW_b_sparse transform, and so on. On top of that, each different way of
initializing each parameter would result in many more transforms.

I don't have a perfect answer yet, but my argument goes like this:

You would have transforms for the most popular options (dotW_b, for
example). If you need something else, you can always decorate a function
that takes theano variables and produces theano variables. The formulas
produced by the formula committee might be a rich source of such functions
to decorate. Beyond decorating, you can have a general apply transform that
does something like:

    apply(lambda x, y, z: x * y + z, inputs=x,
          hyperparams=[(name, 2)],
          params=[(name, theano.shared(..))])

The order of the arguments of the lambda is nodes, then params, then
hyper-params (or something like that). This would apply the theano
expression, but it would also register the parameters. It is like creating
a transform on the fly. You should (or at least could) provide names for
the parameters; you might need them later.

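To make the `apply` idea concrete, here is a plain-Python sketch under stated assumptions: `Node`, `transform` and `apply` are hypothetical names, node values are plain numbers instead of theano expressions, and params are `(name, value)` pairs instead of `theano.shared` variables. Arguments reach the function in the order described above (nodes, then params, then hyper-params), and the resulting node registers the named params so they can be collected later. Decorating a function with `transform` gives a reusable transform; `apply` just creates one on the fly.

```python
# Hypothetical sketch: numbers stand in for theano expressions and shared variables.
class Node:
    def __init__(self, value, params=(), hyperparams=(), inputs=()):
        self.value = value
        self.params = dict(params)            # registered by name
        self.hyperparams = dict(hyperparams)
        self.inputs = list(inputs)

    def all_params(self):
        """Params registered on this node and on every node below it."""
        found = {}
        for child in self.inputs:
            found.update(child.all_params())
        found.update(self.params)
        return found


def transform(fn):
    """Turn fn(nodes..., params..., hyper-params...) into a transform."""
    def build(inputs, params=(), hyperparams=()):
        if not isinstance(inputs, (list, tuple)):
            inputs = [inputs]
        params, hyperparams = list(params), list(hyperparams)
        args = [node.value for node in inputs]        # nodes first,
        args += [value for _, value in params]        # then params,
        args += [value for _, value in hyperparams]   # then hyper-params
        return Node(fn(*args), params, hyperparams, inputs)
    return build


def apply(fn, inputs, params=(), hyperparams=()):
    """General apply: a transform created on the fly."""
    return transform(fn)(inputs, params=params, hyperparams=hyperparams)


@transform
def dotW_b(x, W, b):
    # Toy scalar version of the "most popular option".
    return x * W + b


x = Node(4.0)
out = apply(lambda x, y, z: x * y + z, inputs=x,
            hyperparams=[("offset", 2.0)],
            params=[("scale", 0.5)])
h = dotW_b(out, params=[("W", 2.0), ("b", 1.0)])
print(out.value, h.value)   # 4.0 9.0
print(h.all_params())       # {'scale': 0.5, 'W': 2.0, 'b': 1.0}
```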
I think you can make the result of the apply picklable, even though the
general apply transform is not. What I mean is that the output node does
not store the lambda expression but some theano graph (?), and it knows
which nodes are its inputs (so that you can replace them and link this
little graph to the rest of the theano expression). It is just an ugly
hack, given that you cannot save lambda expressions, but I'm open to other
alternatives.

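One way to get a picklable result from an unpicklable lambda is to run the lambda once on symbolic placeholders and keep only the graph it builds, much as theano builds a graph when a Python function is called on symbolic variables. The sketch below is a pure-Python stand-in: the `Sym` class and the nested-tuple encoding are assumptions, not theano's representation. The lambda itself cannot be pickled, but the tuple graph can, and its `var` leaves are the inputs that `substitute` replaces to link the little graph into a larger one.

```python
import pickle


# Hypothetical sketch: `Sym` traces a Python function into a graph of plain
# nested tuples; this is a stand-in, not theano's graph representation.
class Sym:
    def __init__(self, expr):
        self.expr = expr

    def __add__(self, other):
        return Sym(("add", self.expr, _as_expr(other)))

    def __mul__(self, other):
        return Sym(("mul", self.expr, _as_expr(other)))


def _as_expr(value):
    return value.expr if isinstance(value, Sym) else ("const", value)


def evaluate(expr, env):
    """Evaluate a traced graph given values for its input variables."""
    op = expr[0]
    if op == "var":
        return env[expr[1]]
    if op == "const":
        return expr[1]
    left, right = evaluate(expr[1], env), evaluate(expr[2], env)
    return left + right if op == "add" else left * right


def substitute(expr, mapping):
    """Replace named inputs by other graphs, linking this graph into a larger one."""
    if expr[0] == "var":
        return mapping.get(expr[1], expr)
    if expr[0] == "const":
        return expr
    return (expr[0], substitute(expr[1], mapping), substitute(expr[2], mapping))


fn = lambda x, y, z: x * y + z
# Run the lambda once on symbols and keep only the graph it produces.
graph = fn(Sym(("var", "x")), Sym(("var", "W")), Sym(("var", "b"))).expr

try:
    pickle.dumps(fn)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError, TypeError):
    lambda_pickled = False

restored = pickle.loads(pickle.dumps(graph))  # plain tuples pickle fine
print(lambda_pickled)                                      # False
print(evaluate(restored, {"x": 4.0, "W": 0.5, "b": 2.0}))  # 4.0

linked = substitute(restored, {"x": ("mul", ("var", "data"), ("const", 2.0))})
print(evaluate(linked, {"data": 2.0, "W": 0.5, "b": 2.0}))  # 4.0
```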
What this way of doing things would hopefully buy you is that you do not
need to worry about most of your model: a few macros get you to the point
you want to change, and then you do surgery at that point. Compared with
hacking a class, this feels cleaner, because what leads up to the point you
want to change is sort of separated from what you change. Plus, you could
do this in your script, without needing to create a local branch of the
library where you hack the class, or to duplicate the class file under a
different name. Once what you are doing becomes stable, it can be converted
into either a different macro or a parameter of the initial macro.

**New part**

If this is not convincing enough, there is another point that I want to
make. While creating the graph, you can optionally create a model object,
and I would encourage most people to do so! I had this idea a long time
ago, but back then I used a singleton class as the world, which could
potentially create a lot of issues. This is a nicer version of that.

This model class is optional, but it can be extremely useful. What you do
in this model class is store the graph together with various annotations on
that graph. What I would do is identify different subgraphs in the model
and register them under different names. For example, if err is the node
that points to the graph representing a DBN, that graph will be registered
with a model in which I have annotated which subgraphs represent the
different RBMs, which one represents the logistic regression, and so on.
The model will also keep a list of all the input nodes and all the output
nodes of the graph. We could potentially use this model class to control
some global defaults for parameter initialization or hyper-parameters. This
might all sound like magic, but it is actually easy to implement.

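A minimal sketch of such a model object, assuming nodes are identified by name (strings standing in for theano variables) and that `Model`, `annotate`, `annotations_of` and `default` are hypothetical names. It stores the input and output lists, a set of named subgraph annotations, and a dictionary of model-wide defaults.

```python
# Hypothetical sketch: nodes are identified by name (stand-ins for theano variables).
class Model:
    def __init__(self, inputs, outputs, defaults=None):
        self.inputs = list(inputs)            # all input nodes of the graph
        self.outputs = list(outputs)          # all output nodes of the graph
        self.defaults = dict(defaults or {})  # global init / hyper-parameter defaults
        self.annotations = {}                 # annotation name -> set of nodes

    def annotate(self, name, nodes):
        self.annotations[name] = frozenset(nodes)

    def annotations_of(self, node):
        """Names of every annotated subgraph containing `node`."""
        return [name for name, nodes in self.annotations.items() if node in nodes]

    def default(self, key, value=None):
        """An explicit value wins; otherwise fall back to the model-wide default."""
        return self.defaults[key] if value is None else value


model = Model(inputs=["x", "y"], outputs=["err"], defaults={"init": "uniform", "lr": 0.1})
model.annotate("rbm1", ["W1", "b1", "h1"])
model.annotate("rbm2", ["W2", "b2", "h2"])
model.annotate("logreg", ["W3", "b3", "p_y"])
model.annotate("DBN", ["W1", "b1", "h1", "W2", "b2", "h2", "W3", "b3", "p_y", "err"])
print(model.annotations_of("h1"))                     # ['rbm1', 'DBN']
print(model.default("lr"), model.default("lr", 0.5))  # 0.1 0.5
```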
If you have such a model, which is just a set of annotations on the graph,
this approach makes it easy to change components of the graph based on
their names. For example, I can replace rbm1 with a daa, because these
annotations tell me which part is rbm1.

Why do I feel you need such a thing? Simply because you get the DBN by
calling a macro, so you don't have variables pointing to the different
nodes of your network that would let you define where a subgraph starts.
But if the macro returns such a model, you can introspect which annotations
it has. There should also be standard naming conventions, but you could
also look at the following in the interactive shell:

    model.annotations(depth=2)

This would print something like:

    'DBN'
    'rbm1'
    'hidden_layer1'
    'CDk_layer1'
    'rbm2'
    'hidden_layer2'
    'CDk_layer2'
    'logreg'
    'cross_entropy'

And then you can say:

    daa1 = daa(..)
    daa2 = daa(..)
    new_model = model.replace('rbm1', daa1, new_name='daa1')
    new_model = new_model.replace('rbm2', daa2, new_name='daa2')

And you get an SDAA.

What is the hierarchical structure? Well, in my view, if some subgraph
(annotated as S1) is part of another subgraph (annotated as S2), then S1 is
a child of S2 in this hierarchy of annotations. If they share just a few
nodes, but each also has nodes that are not shared, then they are on the
same level. We might want a flat space for the annotations, but I think
this simple convention can get us a lot.

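The containment rule above is enough to recover the hierarchy from flat annotations. In this hypothetical sketch each annotation is a set of node names. The parent of an annotation is taken to be the smallest annotation that strictly contains it (an assumption, since the text does not say which superset to pick when several exist), and annotations that only partially overlap end up on the same level. `annotations_listing` imitates the `model.annotations(depth=2)` printout, with indentation showing the levels.

```python
# Hypothetical sketch: annotations are sets of node names, listed in creation order.
def parents(annotations):
    """Parent = smallest annotation that strictly contains this one (None for roots)."""
    result = {}
    for name, nodes in annotations.items():
        supersets = [other for other, other_nodes in annotations.items()
                     if other != name and nodes < other_nodes]
        result[name] = (min(supersets, key=lambda o: len(annotations[o]))
                        if supersets else None)
    return result


def annotations_listing(annotations, depth=None):
    """Indented listing of the hierarchy, cut below `depth` levels."""
    parent_of = parents(annotations)
    lines = []

    def walk(name, level):
        if depth is not None and level > depth:
            return
        lines.append("    " * level + repr(name))
        for child in annotations:
            if parent_of[child] == name:
                walk(child, level + 1)

    for name in annotations:
        if parent_of[name] is None:
            walk(name, 0)
    return lines


h1, cd1 = {"W1", "b1", "h1"}, {"W1", "b1", "c1", "cd1"}  # overlap: same level
h2, cd2 = {"W2", "b2", "h2"}, {"W2", "b2", "c2", "cd2"}
logreg, xent = {"W3", "b3", "p_y"}, {"p_y", "err"}
annotations = {
    "DBN": h1 | cd1 | h2 | cd2 | logreg | xent,
    "rbm1": h1 | cd1, "hidden_layer1": h1, "CDk_layer1": cd1,
    "rbm2": h2 | cd2, "hidden_layer2": h2, "CDk_layer2": cd2,
    "logreg": logreg, "cross_entropy": xent,
}
print("\n".join(annotations_listing(annotations, depth=2)))
```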
So macros should, in general, return such models. It is up to you whether
you want to ground the graph that you create in your script into a model or
not. You do so by manually adding nodes to the model; the annotations are
also added manually. So this might be a bit annoying for the developer of a
macro, but I don't think it is cognitively complicated, and it would help a
lot when using the macros.

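A sketch of a macro that returns an annotated model, together with a non-destructive `replace`. `dbn`, `daa`, `Model` and `replace` are hypothetical stand-ins, and components are plain strings rather than subgraphs, so the sketch only shows the bookkeeping: the annotation is swapped and renamed, and the original model is left untouched.

```python
# Hypothetical sketch: components are strings standing in for annotated subgraphs.
class Model:
    def __init__(self, components, order):
        self.components = dict(components)  # annotation name -> component
        self.order = list(order)            # how the components are chained

    def replace(self, name, component, new_name=None):
        """Return a new model with `name` swapped out; self is not modified."""
        if name not in self.components:
            raise KeyError(name)
        new_name = new_name or name
        components = {(new_name if key == name else key): value
                      for key, value in self.components.items()}
        components[new_name] = component
        order = [new_name if key == name else key for key in self.order]
        return Model(components, order)


def dbn(n_layers=2):
    """Hypothetical macro: builds the pieces and annotates them before returning."""
    components = {f"rbm{i}": f"RBM layer {i}" for i in range(1, n_layers + 1)}
    components["logreg"] = "logistic regression"
    return Model(components, order=list(components))


def daa(i):
    return f"DAA layer {i}"


model = dbn()
new_model = model.replace("rbm1", daa(1), new_name="daa1")
new_model = new_model.replace("rbm2", daa(2), new_name="daa2")
print(new_model.order)  # ['daa1', 'daa2', 'logreg']
print(model.order)      # ['rbm1', 'rbm2', 'logreg']
```

Returning a new model instead of mutating keeps the original DBN available, for example to compare both variants in the same script.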
You can see how this annotation system easily becomes interesting. You can
also annotate parameters (and it is not too overwhelming to do so while you
create the graph), and you can use this to collect all the parameters that
you annotated in some way and then do something to them.

The way I see it, a transform could simply take an optional annotations
argument and add that string to all of its parameters and hyper-parameters.
How much sense this makes is debatable, but I strongly believe it is not
complicated to implement (I actually have something like this implemented
already, except that I use that singleton class, and I sort of made the
framework work mostly for DAA by making a few poor choices).

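A sketch of the optional `annotations` argument, assuming hypothetical `Param`/`HyperParam` wrappers and a toy `dotW_b` transform that only creates its parameters (zero-initialised lists, not theano shared variables). Every parameter and hyper-parameter created by the transform receives the annotation string, and `collect` then retrieves everything carrying a given annotation so that something can be done to all of it.

```python
# Hypothetical sketch: wrappers and a toy transform that only creates its parameters.
class Wrapped:
    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.annotations = set()


class Param(Wrapped):
    pass


class HyperParam(Wrapped):
    pass


def dotW_b(n_in, n_out, annotations=()):
    """Toy transform: every param and hyper-param it creates gets the annotations."""
    if isinstance(annotations, str):
        annotations = [annotations]  # a single string, as in the text
    created = [Param("W", [[0.0] * n_out for _ in range(n_in)]),
               Param("b", [0.0] * n_out),
               HyperParam("l2_coef", 1e-4)]
    for item in created:
        item.annotations.update(annotations)
    return created


def collect(items, annotation, kind=Wrapped):
    """Everything of the given kind that carries `annotation`."""
    return [item for item in items
            if annotation in item.annotations and isinstance(item, kind)]


everything = dotW_b(3, 2, annotations="encoder") + dotW_b(2, 3, annotations="decoder")
for hp in collect(everything, "encoder", kind=HyperParam):
    hp.value *= 10  # e.g. stronger weight decay on the encoder only
print([p.name for p in collect(everything, "encoder", kind=Param)])  # ['W', 'b']
```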
Params and hyperparams
----------------------

I think it is obvious from what I wrote above that there is a node wrapper
around the theano expression. I haven't written down all the details of
that class. I think there should be such a wrapper around parameters and
hyper-parameters as well. By default those wrappers might not provide any
information, but you can potentially add interesting information for
"graph"-aware transforms. For example, you can add annotations for a find
or replace function that will collect all parameters or hyper-parameters
for you, so that you can do some common thing to all of them (when it
makes sense).

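A sketch of wrappers that carry no information by default but can hold extra fields for graph-aware tools. `Wrapper`, `Node` and `find` are hypothetical names; `find` walks the graph from an output node, collects every wrapper matching a predicate, and the same change (here, setting every corruption level) is then applied to all of them.

```python
# Hypothetical sketch: wrappers with empty info by default, and a graph-wide find.
class Wrapper:
    def __init__(self, name, value, **info):
        self.name = name
        self.value = value
        self.info = info  # empty unless a graph-aware tool needs something


class Node:
    def __init__(self, name, wrappers=(), inputs=()):
        self.name = name
        self.wrappers = list(wrappers)
        self.inputs = list(inputs)


def find(output, predicate):
    """Collect every wrapper below `output` that satisfies `predicate`."""
    found, stack, seen = [], [output], set()
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        found.extend(w for w in node.wrappers if predicate(w))
        stack.extend(node.inputs)
    return found


x = Node("x")
daa1 = Node("daa1", [Wrapper("W1", 0.01), Wrapper("noise1", 0.1, role="corruption")], [x])
daa2 = Node("daa2", [Wrapper("W2", 0.01), Wrapper("noise2", 0.1, role="corruption")], [daa1])

corruption = find(daa2, lambda w: w.info.get("role") == "corruption")
for w in corruption:
    w.value = 0.3  # the same change applied to every match
print(sorted(w.name for w in corruption))  # ['noise1', 'noise2']
```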
You could have a freeze property for parameters. If you change that
property, the theano functions (where needed) for all nodes that follow
this one are recomputed. This property would be used by the function that
collects the parameters used to compute the gradient: frozen parameters are
ignored, the others are updated.

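A hypothetical sketch of the freeze property. Toggling `frozen` notifies listeners, which stand in for "recompile the nodes that follow this one", and the function that collects parameters for the gradient skips frozen ones, so the update leaves them unchanged. The names and the callback mechanism are assumptions.

```python
# Hypothetical sketch: `listeners` stand in for "recompile the nodes that follow".
class Param:
    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.listeners = []
        self._frozen = False

    @property
    def frozen(self):
        return self._frozen

    @frozen.setter
    def frozen(self, flag):
        if flag != self._frozen:  # only a real change triggers recompilation
            self._frozen = flag
            for notify in self.listeners:
                notify(self)


def params_for_gradient(params):
    """The collecting function used to build the gradient: frozen params are ignored."""
    return [p for p in params if not p.frozen]


def sgd_update(params, grads, lr):
    for p in params_for_gradient(params):
        p.value -= lr * grads[p.name]


W, b = Param("W", 2.0), Param("b", 1.0)
recompiled = []
W.listeners.append(lambda p: recompiled.append(p.name))
W.frozen = True
W.frozen = True  # no change, so no second recompilation
sgd_update([W, b], {"W": 1.0, "b": 1.0}, lr=0.5)
print(recompiled, W.value, b.value)  # ['W'] 2.0 0.5
```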
For hyper-parameters you would also have a different wrapper that could
contain, for instance, the distribution of that hyper-parameter for use by
a hyper-learner.

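A sketch of a hyper-parameter wrapper that optionally carries a distribution, and of how a hyper-learner could use it. The names are hypothetical, random search is used only as one simple example of a hyper-learner, and the ranges below are illustrative rather than recommendations.

```python
import random


# Hypothetical sketch: a hyper-parameter wrapper with an optional distribution.
class HyperParam:
    def __init__(self, name, value, distribution=None):
        self.name = name
        self.value = value
        self.distribution = distribution  # callable(rng) -> candidate value, or None

    def sample(self, rng):
        """Draw a candidate value; without a distribution, keep the current one."""
        if self.distribution is None:
            return self.value
        return self.distribution(rng)


def random_search_trial(hyperparams, rng):
    """One trial of a random-search hyper-learner."""
    return {h.name: h.sample(rng) for h in hyperparams}


hyperparams = [
    HyperParam("learning_rate", 0.1, lambda rng: 10 ** rng.uniform(-4, -1)),
    HyperParam("n_hidden", 500, lambda rng: rng.choice([250, 500, 1000])),
    HyperParam("noise_amount", 0.2),  # no distribution attached
]
trial = random_search_trial(hyperparams, random.Random(0))
print(trial)
```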
379 I would also have the learning rate or noise_amounts as some strange |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
380 hyper-paramter. I would say by default, if any hyper-paramter changes its |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
381 value, then the theano expressions need to be recompiled. If you are dealing |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
382 with this strange types of hyper-parameters you don't need to do that. |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
383 This can be automatically for you and I guess it will all boil down to, |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
384 is you hyper-paramter a theano shared variable or theano tensor ? If so we |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
385 are dealing with the second type. So this kind of stuff can be detected |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
386 automatically. |
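A minimal, Theano-free sketch of the detection rule above. `SharedHyperParam` is a hypothetical stand-in for a Theano shared variable: the compiled function reads its value at call time, so changing it needs no recompilation, whereas a plain Python constant is frozen in at compile time. `compile_update` and `needs_recompile` are illustrative names only, not an existing API.

```python
class SharedHyperParam:
    """Hypothetical stand-in for a Theano shared variable: a mutable box
    whose current value is read every time the compiled function runs."""

    def __init__(self, value):
        self.value = value

    def get_value(self):
        return self.value

    def set_value(self, value):
        self.value = value


def needs_recompile(hparam):
    # Plain constants are frozen into the compiled function, so changing
    # them means recompiling; shared values are read at call time.
    return not isinstance(hparam, SharedHyperParam)


def compile_update(lr):
    """'Compile' a gradient-descent update w <- w - lr * grad."""
    if isinstance(lr, SharedHyperParam):
        # The learning rate is looked up on every call.
        return lambda w, grad: w - lr.get_value() * grad
    # The learning rate is frozen into the closure at compile time.
    frozen = float(lr)
    return lambda w, grad: w - frozen * grad


lr = SharedHyperParam(0.25)
update = compile_update(lr)
print(update(1.0, 2.0))   # 0.5
lr.set_value(0.5)         # no recompilation needed
print(update(1.0, 2.0))   # 0.0
print(needs_recompile(lr), needs_recompile(0.25))  # False True
```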

How does this work?
-------------------

You always have a pointer to the entire graph. Whenever a hyper-parameter
changes (or a parameter freezes), all regions of the graph that are affected
get recompiled. This is done by traversing the graph from the bottom node and
re-constructing the Theano expression; where needed, this Theano expression
gets compiled.
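A sketch of how the affected region could be located, assuming a hypothetical `Node` class that only records its inputs. Given the node that changed, the graph is walked bottom-up (inputs before users) and every node that transitively depends on it is marked; only those would need to be re-constructed and recompiled. All names here are illustrative.

```python
class Node:
    """Hypothetical graph node: a name and the nodes it reads from."""

    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)


def topological_order(outputs):
    """All nodes reachable from `outputs`, each listed after its inputs."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for inp in node.inputs:
            visit(inp)
        order.append(node)

    for out in outputs:
        visit(out)
    return order


def affected_nodes(outputs, changed):
    """Names of the nodes that must be rebuilt when `changed` changes."""
    dirty = set()
    order = topological_order(outputs)
    # Bottom-up: every input is classified before the nodes that use it.
    for node in order:
        if node is changed or any(id(i) in dirty for i in node.inputs):
            dirty.add(id(node))
    return [n.name for n in order if id(n) in dirty]


x, W1 = Node("x"), Node("W1")
h1 = Node("h1", [x, W1])
noise = Node("noise")            # a hyper-parameter used only by layer 2
h2 = Node("h2", [h1, noise])
cost = Node("cost", [h2])
print(affected_nodes([cost], noise))  # ['noise', 'h2', 'cost']
```

Layer 1 (`h1`) does not depend on `noise`, so in this sketch only the part of the graph from `h2` upwards would be rebuilt.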

The function that updates / re-constructs the graph is slightly more complex
if you have non-Theano functions in the middle of the graph, but in my view
not by much.

replace & find
--------------

Replace replaces a part of the graph. The way it works in my view is that
if I write:

    x = x1+x2+x3
    y = x.replace({x2:x5})

you would first copy the graph that is represented by x (the params and
hyper-params are not copied) and then replace the subgraphs. I.e., x will
still point to x1+x2+x3, while y will point to x1+x5+x3. Replace is not done
in place!

I think of these Node classes as something lightweight, like Theano
variables, so creating copies is not harmful. Also, params and shared
variables are shared between these graphs. If you want new params / shared
variables, we can offer a copy / deepcopy command.
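A minimal sketch of these copy-on-replace semantics, assuming a hypothetical `Node` class (an op name plus inputs) in which nodes without inputs stand for params or inputs. `replace` copies only the interior nodes, reuses every leaf that is not being replaced, and never mutates the original graph, so `x` and `y` end up sharing `x1` and `x3`.

```python
class Node:
    """Hypothetical lightweight node: an op name plus input nodes.
    A node without inputs is a leaf (a parameter or an input)."""

    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = tuple(inputs)

    def replace(self, mapping, _memo=None):
        """Return a new graph with the nodes in `mapping` substituted.
        Not in place: `self` and its subgraph are left untouched."""
        memo = {} if _memo is None else _memo
        if self in mapping:
            return mapping[self]
        if not self.inputs:
            return self  # leaves (params, inputs) are shared, not copied
        if self not in memo:
            # Copy this interior node once, even if several users reach it.
            memo[self] = Node(self.op,
                              [i.replace(mapping, memo) for i in self.inputs])
        return memo[self]

    def __repr__(self):
        if not self.inputs:
            return self.op
        return "(" + f" {self.op} ".join(map(repr, self.inputs)) + ")"


x1, x2, x3, x5 = (Node(name) for name in ("x1", "x2", "x3", "x5"))
x = Node("+", [x1, x2, x3])
y = x.replace({x2: x5})
print(x)                  # (x1 + x2 + x3)
print(y)                  # (x1 + x5 + x3)
print(y.inputs[0] is x1)  # True: the leaf x1 is shared, not copied
```

A deepcopy-style variant would differ only in the leaf branch, returning fresh copies of the leaves instead of `self`.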

Replace (given that it starts from a model) can take string(s) that indicate
specific annotations.

Find does the same (without the copying).

If you have two things with the same name in the graph, you would return the
first one found in a breadth-first search starting from the top node. The idea
is that if you have all the weight matrices annotated as 'W' and you look for
'W' starting from node hiddens2, you want the W of the second layer, and not
the one of the first.
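A sketch of that breadth-first lookup, assuming each hypothetical `Node` carries an annotation string and the list of nodes it was built from. Starting at `hiddens2` and walking towards the inputs, the second layer's `W` is reached one level before the first layer's, so it is the one returned.

```python
from collections import deque


class Node:
    """Hypothetical annotated node: annotation string plus input nodes."""

    def __init__(self, annotation, inputs=()):
        self.annotation = annotation
        self.inputs = list(inputs)


def find(start, annotation):
    """Return the first node annotated `annotation` in a breadth-first
    search from `start` towards the inputs, or None if there is none."""
    queue, seen = deque([start]), {id(start)}
    while queue:
        node = queue.popleft()
        if node.annotation == annotation:
            return node
        for inp in node.inputs:
            if id(inp) not in seen:
                seen.add(id(inp))
                queue.append(inp)
    return None


# Two stacked layers whose weight matrices are both annotated 'W'.
x = Node("x")
W_1, b_1 = Node("W"), Node("b")
hiddens1 = Node("hiddens1", [x, W_1, b_1])
W_2, b_2 = Node("W"), Node("b")
hiddens2 = Node("hiddens2", [hiddens1, W_2, b_2])

print(find(hiddens2, "W") is W_2)  # True: the second layer's W
```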

I would support:

    model.replace(look_at, search_for, replace_with, annotate_as)
    replace(model, look_at, search_for, replace_with, annotate_as)
    node.replace(model, look_at, replace_with, annotate_as)

look_at: if it is a node, it refers to the subgraph that has that node as its
final node, i.e. everything up to that point. If it is a string, you would
look at the subgraph annotated by that string.

Of course, we could optionally choose not to allow things to be annotated
with the same name, though I sort of like it: it makes a lot of things easy.
For a DBN I would have the annotations:

    DBN
        rbm1
            hidden
            CDk
        rbm2
            hidden
            CDk
        logreg

If I want to replace the first CDk with PCD, I would do:

    pcd1 = PCD(..)
    model.replace(look_at='rbm1', search_for='CDk', replace_with=pcd1,
                  annotate_as='PCD1')
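A sketch of `replace(model, look_at, search_for, replace_with, annotate_as)` applied to the DBN annotation tree above, under several assumptions: the tree is modelled with a hypothetical `Annotated` class (annotation plus children), only string values of `look_at` are handled, and `PCD` is replaced by a placeholder leaf. `look_at` narrows the search to the subtree annotated `rbm1`; as with the node-level replace, only the path down to the replaced node is copied, and untouched subtrees are shared.

```python
from collections import deque


class Annotated:
    """Hypothetical annotated sub-model: a name and child sub-models."""

    def __init__(self, annotation, children=()):
        self.annotation = annotation
        self.children = list(children)


def _bfs(root, annotation):
    """First node with `annotation`, breadth-first from `root`."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.annotation == annotation:
            return node
        queue.extend(node.children)
    return None


def _rebuild(node, target, new):
    """Copy the path down to `target`, swapping in `new`; share the rest."""
    if node is target:
        return new
    children = [_rebuild(c, target, new) for c in node.children]
    if all(a is b for a, b in zip(children, node.children)):
        return node  # untouched subtree: shared, not copied
    return Annotated(node.annotation, children)


def replace(model, look_at, search_for, replace_with, annotate_as):
    scope = _bfs(model, look_at)
    if scope is None:
        raise KeyError(look_at)
    target = _bfs(scope, search_for)
    if target is None:
        raise KeyError(search_for)
    replace_with.annotation = annotate_as  # annotate the new piece
    return _rebuild(model, target, replace_with)


model = Annotated("DBN", [
    Annotated("rbm1", [Annotated("hidden"), Annotated("CDk")]),
    Annotated("rbm2", [Annotated("hidden"), Annotated("CDk")]),
    Annotated("logreg"),
])
pcd1 = Annotated("PCD")  # placeholder for PCD(..)
new_model = replace(model, look_at="rbm1", search_for="CDk",
                    replace_with=pcd1, annotate_as="PCD1")
print([c.annotation for c in new_model.children[0].children])  # ['hidden', 'PCD1']
print([c.annotation for c in model.children[0].children])      # ['hidden', 'CDk']
```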

Bottom line:

I think having a graph, and having a way to search in that graph and replace
parts of it, is a very flexible and powerful way of doing things.


reconstruct
-----------

This is something nice for DAA. It is definitely not useful for the rest.
I think, though, that it would be a shame to have that transformation graph
and not be able to use it to do this. It will make life much easier when you
do deep auto-encoders. I wouldn't put it in the core library, but I would
have it in the DAA module. For reconstruct to work, you need inverse
transforms for the ones you use.

The way I see it, you can either have something like

    # generate your invertible transforms on the fly
    fn = create_transform(lambda x: ..., params, hyper_params)
    inv = create_transform(lambda h: ..., params, hyper_params)
    my_transform = couple_transforms(forward=fn, inv=inv)

and generate special transforms on the fly that have some pseudo-inverses
when you construct the graph. Maybe you can also have specific pre-defined
transforms for the most common cases, with specific names. Moreover, I don't
see the harm in giving something as simple as dotW_b an inverse (as when
using tied weights) in all cases, even though you would only use it for the
DAA. This is just to reduce the number of transform names you have; it is a
feature that neither hurts nor helps 95% of the time, but helps the other 5%.

But this is up for debate. The only reason I bring it up is to say that the
class that represents a transform should have an inverse method that, by
default, throws an exception.
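A sketch of such a transform class, in plain Python with lists standing in for Theano tensors. `Transform`, `DotWb` and `TiedInverse` are hypothetical names: the base `inverse` raises by default, and `DotWb` (computing W x + b) overrides it with the tied-weights decoder W^T h + c that a DAA would use. This is a pseudo-inverse in the auto-encoder sense, not an exact mathematical inverse.

```python
class Transform:
    """Hypothetical base class for transforms in the model graph."""

    def __call__(self, x):
        raise NotImplementedError

    def inverse(self):
        # Default: no inverse. Only transforms that know how to build one
        # (e.g. for DAA reconstruction) override this method.
        raise NotImplementedError(
            f"{type(self).__name__} does not define an inverse")


def _matvec(W, x):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class DotWb(Transform):
    """Affine transform h = W x + b, with W stored as a list of rows."""

    def __init__(self, W, b, c=None):
        self.W, self.b = W, b
        # Decoder bias for the tied-weights inverse (zeros by default).
        self.c = c if c is not None else [0.0] * len(W[0])

    def __call__(self, x):
        return [v + bi for v, bi in zip(_matvec(self.W, x), self.b)]

    def inverse(self):
        return TiedInverse(self)


class TiedInverse(Transform):
    """Decoder x_hat = W^T h + c. It reads W from the forward transform at
    call time, so encoder and decoder always share the same weights."""

    def __init__(self, forward):
        self.forward = forward

    def __call__(self, h):
        Wt = [list(col) for col in zip(*self.forward.W)]
        return [v + ci for v, ci in zip(_matvec(Wt, h), self.forward.c)]

    def inverse(self):
        return self.forward


enc = DotWb([[1.0, 2.0], [0.0, 1.0]], b=[0.5, -0.5])
h = enc([1.0, 1.0])
print(h)                 # [3.5, 0.5]
print(enc.inverse()(h))  # [3.5, 7.5]
```

Because `TiedInverse` keeps a reference to the encoder rather than a copy of W, any update to the encoder's weights is seen by the decoder too, which is what tying the weights amounts to here.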


transforms
----------

In my view there will be quite a few such standard transforms. This can be
annoying, but I think that if we group them by architecture (MLP, DAA, RBM),
sampler, and optimizer, it will be less of a mess. This would be crucial for
their documentation as well. These categories should also come with macros.
There will, though, be some basic transforms that are available in the core
(like replace, find, and things related to annotating and creating a model,
collecting parameters and hyper-parameters).

I also think that we can start small, with just a very few such transforms,
and add more as the library grows. We don't need many of these; most are
nice to have.


Constraints
-----------

You can always add constraints. I think the easiest way to make this explicit
is to get a handle on the parameter or node to which you want to add a
constraint and do something like

    add_constraint(on_what, what)

The argument on_what can be a node, a parameter node, a list of nodes, a list
of parameter nodes, or an annotation string (given that you provided a model),
and the argument what is a graph. In terms of the graph that you are creating,
what this does is create a dependency link from your main graph to that
constraint graph. This means that the function that computes the gradients
with respect to the parameters will also (if there are such dependency links)
add the gradient of the output of that dependency graph with respect to those
parameters. There are some constraints on what a dependency graph can be, in
the sense that it should start from only one input (the parameter / node) and
it should end in only one node that is a scalar.

From an implementation point of view, this can be done by just collecting a
list of constraint costs that will be added to the cost before calling
T.grad. But I like to think about it in terms of graphs linked through
dependency links.
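A toy, Theano-free sketch of that implementation route. `Model` and its `add_constraint` method are hypothetical (the method mirrors the signature above, but only accepts a parameter name as on_what): each constraint is a scalar-valued function of one parameter, and the gradient is taken of the main cost plus all collected constraint costs, which is what summing them before T.grad amounts to. A central finite difference stands in for symbolic differentiation.

```python
class Model:
    """Hypothetical container for a cost and its constraint costs."""

    def __init__(self, cost):
        self.cost = cost            # f(params) -> scalar
        self.constraints = []       # list of (param_name, g(value) -> scalar)

    def add_constraint(self, on_what, what):
        # on_what: a parameter name; what: scalar-valued function of it.
        self.constraints.append((on_what, what))

    def total_cost(self, params):
        extra = sum(what(params[name]) for name, what in self.constraints)
        return self.cost(params) + extra

    def grad(self, params, eps=1e-6):
        """Central finite-difference gradient of the total cost."""
        g = {}
        for name in params:
            up, down = dict(params), dict(params)
            up[name] += eps
            down[name] -= eps
            g[name] = (self.total_cost(up) - self.total_cost(down)) / (2 * eps)
        return g


model = Model(lambda p: (p["w"] - 3.0) ** 2)
print(model.grad({"w": 1.0}))                      # approximately {'w': -4.0}
model.add_constraint("w", lambda w: 0.1 * w ** 2)  # L2 penalty on w
print(model.grad({"w": 1.0}))                      # approximately {'w': -3.8}
```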


Some general comments
---------------------

I think that what you get in the end is a very flexible framework, where
adding new things is just a matter of putting together a few transforms and
annotating the entire thing. In the worst case you would need to invent a
new transform, which I do believe could be quite painless.

The harder part to implement is the backbone. It is not difficult in my
view, mostly slightly tedious. I had something like this implemented in a
matter of a week, though it was a bit less restrictive. I do believe, though,
that we should not oversimplify the backbone of the library just to make it
easy to implement; rather, we should carefully consider what we get in the
end.
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
556 |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
557 |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
558 Connection to the architecture committee |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
559 ----------------------------------------- |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
560 |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
561 I think that if you get such iterator objects that can produce either |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
562 the error, or do an update step it is easy to wrap them in a plug-in, |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
563 or use it with the imperative language James proposed. |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
564 |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
565 I actually have ideas ( using non theano nodes) how to break the algo at |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
566 points such that you can have different parts run on remote machines .. |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
567 though we might not want to support that ( using the plug-in system .. |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
568 though it might work with other systems that support the same idea) |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
569 |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
570 I think it goes more natural with the imperative language that James |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
571 proposed, because that would create a graph as well. His graph is |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
572 in general simpler ( it always has only one termination node) where |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
573 the nodes have a different interpretation (?) so I would use a different |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
574 node class on those. But from writing the code, using some syntactic sugar |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
575 the difference can be blurred ( do we want this ?). I think that one |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
576 can come up with ways of making the approaches look alike and sligtly |
32fc5f442dde
LAYER: sligthly long but somewhat clearer rendering of what I have in mind
Razvan Pascanu <r.pascanu@gmail.com>
parents:
1231
diff
changeset
|
577 homogeneous. |
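Below is a minimal sketch of the iterator wrapping discussed in this section. It is written in plain Python and does not use Theano. The same kind of iterator object is driven once by a plug-in and once by an imperative loop. All names are hypothetical and are not part of the proposal or of any existing pylearn API: `StepIterator`, `IteratorPlugin`, `train_until` and the `"train_step"` event. A toy quadratic cost stands in for a compiled Theano update function. `train_until` is an ordinary Python loop, not the syntax of James's language.

```python
class StepIterator:
    """Hypothetical iterator: either reports the error or does one update.

    Toy model: gradient descent on (w - target)**2, standing in for a
    compiled Theano update function.
    """

    def __init__(self, w=0.0, target=3.0, lr=0.25):
        self.w = w
        self.target = target
        self.lr = lr

    def error(self):
        # "Produce the error" mode: evaluate the cost without updating.
        return (self.w - self.target) ** 2

    def __iter__(self):
        return self

    def __next__(self):
        # "Do an update step" mode: one gradient step, then return the error.
        grad = 2.0 * (self.w - self.target)
        self.w -= self.lr * grad
        return self.error()


class IteratorPlugin:
    """Hypothetical plug-in wrapper: steps the iterator on a given event."""

    def __init__(self, it, event="train_step"):
        self.it = it
        self.event = event
        self.history = []

    def __call__(self, event):
        # Ignore every event except the one this plug-in is registered for.
        if event == self.event:
            self.history.append(next(self.it))


def train_until(it, tol=1e-6, max_steps=100):
    """Imperative-style driver: step until the error falls below tol."""
    steps = 0
    while it.error() > tol and steps < max_steps:
        next(it)
        steps += 1
    return steps
```

Calling a plug-in built on `StepIterator()` three times with `"train_step"` leaves `history == [2.25, 0.5625, 0.140625]`. Calling `train_until(StepIterator())` returns 12. Both drivers only rely on the iterator's two modes, error and update. That is the property the text suggests makes such objects easy to reuse across the plug-in and imperative approaches.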