Mercurial > pylearn
annotate doc/v2_planning/layer_RP.txt @ 1231:5ef96142492b ("some typos")
author: Razvan Pascanu <r.pascanu@gmail.com>
date:   Wed, 22 Sep 2010 20:17:35 -0400
===============
Layer committee
===============

Members : RP, XG, AB, DWF

Proposal (RP)
=============

You construct your neural network by constructing a graph of connections
between layers, starting from the data. While you construct the graph,
different theano formulas are put together to construct your model.

Hard details are not set yet, but all members of the committee agreed
that this sounds like a good idea.


Example Code (RP):
------------------

# Assume you have the dataset as train_x, train_y, valid_x, valid_y, test_x, test_y

h1 = sigmoid(dotW_b(train_x, n = 300))
rbm1 = CDk( h1, train_x, k=5, sampler = binomial, cost = pseudolikelihood)

h2 = sigmoid(dotW_b(h1, n = 300))
rbm2 = CDk( h2, h1, k=5, sampler = binomial, cost = pseudolikelihood)

out = sigmoid( dotW_b(h2, n = 10))

train_err = cross_entropy( out, train_y)

grads = grad( train_err, train_err.parameters() )
learner = SGD( train_err, grads)

valid_err = train_err.replace({ train_x : valid_x, train_y : valid_y})
test_err = train_err.replace({ train_x : test_x , train_y : test_y})


Global observations :
---------------------

1) Your graph can have multiple terminal nodes; in this case rbm1,
rbm2, learner, valid_err and test_err are all end nodes of the graph;

2) Any node is an "iterator"; when you call out.next() you get the
next prediction, and when you call err.next() you get the next error
( on the batch given by data.next() ).
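A minimal sketch of observation 2 in plain Python (no Theano; all names here, Node, sigmoid_node, apply, are hypothetical): each node keeps the shared data iterator plus the expression accumulated along the graph, so calling next() on any node evaluates that node's expression on the next batch.

```python
import math

class Node:
    def __init__(self, data_iter, fn=lambda x: x):
        self.data_iter = data_iter   # the data object that feeds batches
        self.fn = fn                 # the expression accumulated so far

    def apply(self, g):
        # build a new node that computes g on top of this node's expression
        return Node(self.data_iter, lambda x, f=self.fn: g(f(x)))

    def __next__(self):
        # evaluate the accumulated expression on the next batch of data
        return self.fn(next(self.data_iter))

def sigmoid_node(node):
    return node.apply(lambda v: 1.0 / (1.0 + math.exp(-v)))

data = Node(iter([0.0, 1.0, 2.0]))
out = sigmoid_node(data)
first = next(out)        # prediction on the first batch
assert first == 0.5      # sigmoid(0.0)
```

Note that all nodes share one data iterator, which mirrors the text: err.next() is the error on the batch produced by data.next().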

3) Replace can replace any subgraph.

4) You can have MACROS or SUBROUTINES that already give you the graph for
known components ( in my view the CDk is such a macro, but simpler
examples would be vanilla versions of MLP, DAA, DBN, LOGREG).

5) Any node has the entire graph ( though arguably you don't use that
graph too much). Running such a node will in general be done by compiling
the Theano expression up to that node ( if you don't already have this
function), and using the data object that you get initially. This theano
function is compiled only if you need it. You use the graph only to :
    * update the Theano expression in case some part of the subgraph has
      changed (hyper-parameter or a replace call)
    * collect the list of parameters of the model
    * collect the list of hyper-parameters ( my personal view - this
      would mostly be useful for a hyper learner .. and not for day to
      day stuff, but I think it is something easy to provide and we should )
    * collect constraints on parameters ( I believe they can be represented
      in the graph as dependency links to other graphs that compute the
      constraints..)

6) Registering parameters and hyper-parameters to the graph is the job of
the transform, and therefore of the user who implemented that
transform; the same goes for initializing the parameters ( so if we have
different ways to initialize the weight matrix, that might be a
hyper-parameter with a default value).


Detailed Proposal (RP)
======================

I would go through a list of scenarios and possible issues :

Delayed or future values
------------------------

Sometimes you might want future values of some nodes. For example you might
be interested in :

y(t) = x(t) - x(t-1)

You can get that by having a "delayed" version of a node. A delayed version
of a node x is obtained by calling x.t(k), which will give you a node that has
the value x(t+k). k can be positive or negative.
In my view this can be done as follows :
    - a node is a class that points to :
        * a data object that feeds data
        * a theano expression up to that point
        * the entire graph that describes the model ( not the Theano graph !!!)
The only thing you need to do is to change the data object to reflect the
delay ( we might need to be able to pad it with 0?). You also need to create
a copy of the theano expression ( those are "new nodes" ) in the sense that
the starting theano tensors are different, since they point to different data.
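A sketch of the delayed node, with plain Python lists standing in for the data object (SeqNode and its methods are hypothetical names): x.t(k) is a view of the same sequence shifted by k steps and zero-padded at the boundary, so y(t) = x(t) - x(t-1) is just x - x.t(-1).

```python
class SeqNode:
    def __init__(self, seq):
        self.seq = list(seq)

    def t(self, k):
        # shifted view: value at position i is x(i + k), zero-padded
        n = len(self.seq)
        shifted = [self.seq[i + k] if 0 <= i + k < n else 0
                   for i in range(n)]
        return SeqNode(shifted)

    def __sub__(self, other):
        return SeqNode(a - b for a, b in zip(self.seq, other.seq))

x = SeqNode([1, 4, 9, 16])
y = x - x.t(-1)          # y(t) = x(t) - x(t-1), with x(-1) padded as 0
assert y.seq == [1, 3, 5, 7]
```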



Non-theano transformation ( or function or whatever)
----------------------------------------------------

Maybe you want to do something in the middle of your graph that is not
supported by Theano. Let's say you have a function f which you cannot write
in Theano. You want to do something like

W1*f( W2*data + b)

I think we can support that by doing the following :
each node has :
    * a data object that feeds data
    * a theano expression up to that point
    * the entire graph that describes the model

Let x1 = W2*data + b
up to here everything is fine ( we have a theano expression )
    dot(W2, tensor) + b,
where tensor is provided by the data object ( plus a dict of givens
and whatever else you need to compile the function)

When you apply f, what you do is create a node that is exactly like the
data object, in the sense that it provides a new tensor and a new dict of
givens

so x2 = W1*f( W2*data+b)
will actually point to the expression
    dot(W1, tensor)
and to the data node f(W2*data+b)

What this means is that you basically compile two theano functions t1 and t2
and evaluate t2(f(t1(data))). So every time you have a non-theano operation you
break the theano expression and start a new one.
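The breaking scheme can be shown with plain Python callables standing in for the two compiled Theano functions (the concrete W1, W2, b values and the choice of f here are made up for illustration): t1 covers the expression up to the break, f runs outside, and t2 starts a fresh expression.

```python
import math

W2, b = 3.0, 1.0
W1 = 2.0

t1 = lambda data: W2 * data + b     # "compiled" expression before the break
f = lambda v: math.floor(v)         # stand-in for a non-theano function
t2 = lambda v: W1 * v               # fresh "compiled" expression after f

def run(data):
    # every non-theano op breaks the chain: t1, then f, then t2
    return t2(f(t1(data)))

assert run(1.5) == 10.0             # 2 * floor(3 * 1.5 + 1)
```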

What you lose :
    - there is no optimization or anything between t1, t2 and f ( we don't
      support that)
    - if you are running things on GPU, after t1 the data will be copied to
      the CPU and then probably back to the GPU - so it doesn't make sense
      anymore


Recurrent Things
----------------

I think that you can write a recurrent operation by first defining a
graph ( the recurrent relation ):

y_tm1 = recurrent_layer(init = zeros(50))
x_t = slice(x, t=0)
y = loop( dotW_b(y_tm1,50) + x_t, steps = 20)

This would basically give all the information you need to add a scan op
to the theano expression of the result op; it is just a different way
of writing things .. which I think is more intuitive.

You create your primitives, which are either a recurrent_layer that should
have an initial value, or a slice of some other node ( a time slice, that is).
Then you call loop, giving an expression that starts from those primitives.

Similarly you can have foldl or map or anything else.

You would use this instead of writing scan, especially if the formula is
more complicated and you want to automatically collect parameters,
hyper-parameters and so on.
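The loop primitive above can be sketched in plain Python (recurrent_layer, loop and the step function are all hypothetical; a real version would emit a scan op instead of unrolling): loop threads the state through the step function, which is exactly the information scan needs.

```python
def recurrent_layer(init):
    # holds the initial value of the recurrent state
    return init

def loop(step, init, xs):
    # unroll: y_t = step(y_tm1, x_t) for each time slice x_t
    y = init
    for x_t in xs:
        y = step(y, x_t)
    return y

y_tm1 = recurrent_layer(init=0.0)
# the step stands in for dotW_b(y_tm1, 50) + x_t; here just 0.5*y + x
y = loop(lambda y, x: 0.5 * y + x, y_tm1, [1.0, 2.0, 3.0])
assert y == 4.25
```

foldl is the same shape with an explicit accumulator, and map is loop with a state that is ignored.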

Optimizer
---------

Personally I would respect the findings of the optimization committee,
and have SGD require a Node that produces some error ( which can
be omitted) and the gradients. For this I would also have the grad
function, which would actually only call T.grad.

What if you have a non-theano thing in the middle? I don't have any smart
solution besides ignoring any parameter that is below the first
non-theano node and throwing a warning.

Learner
-------

In my case I would not have a predict() and an eval() method on the learner,
but just an eval(). If you want the predictions you should use the
corresponding node ( before applying the error measure ). This was
for example **out** in my first example.

Of course we could require learners to be special nodes that also have
a predict output. In that case I'm not sure what the iterating behaviour
of the node should produce.

Granularity
-----------

Guillaume nicely pointed out that this library might be overkill.
In the sense that you have a dotW_b transform, and then you will need
a dotW_b_sparse transform and so on. Plus, every way of initializing each
param would result in many more transforms.

I don't have a perfect answer yet, but my argument goes like this :

you would have transforms for the most popular options ( dotW_b for example).
If you need something else you can always decorate a function that takes
theano arguments and produces theano arguments. More than decorating, you
can have a general apply transform that does something like :

apply( lambda x,y,z: x*y+z, inputs = x,
       hyperparams = [(name, 2)],
       params = [(name, theano.shared(..))])

The order of the arguments in the lambda is nodes, params, hyper-params or so.
This would apply the theano expression, but it will also register the
parameters. It is like creating a transform on the fly.

I think you can arrange it such that the result of the apply is
picklable, but not the apply operation itself. Meaning that in the graph,
the op doesn't actually store the lambda expression but a mini theano graph.

Also names might be optional, so you can write hyperparams = [2,]
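A plain-Python sketch of this on-the-fly transform (Applied and its methods are hypothetical names; no Theano): it wraps a lambda over (inputs, params, hyper-params) and registers the parameters so the graph can later collect them.

```python
class Applied:
    def __init__(self, fn, inputs, params=(), hyperparams=()):
        self.fn = fn
        self.inputs = inputs
        self._params = list(params)            # registered for collection
        self._hyperparams = list(hyperparams)

    def parameters(self):
        # what grad(err, err.parameters()) would walk over
        return self._params

    def value(self):
        # argument order: inputs, then params, then hyper-params
        return self.fn(self.inputs,
                       *[v for _, v in self._params],
                       *[v for _, v in self._hyperparams])

node = Applied(lambda x, y, z: x * y + z,
               inputs=4,
               params=[("W", 2)],
               hyperparams=[("offset", 3)])
assert node.value() == 11                  # 4 * 2 + 3
assert node.parameters() == [("W", 2)]
```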


What this way of doing things would hopefully buy you is that you do not
need to worry about most of your model ( it would be just a few macros or
subroutines).
You would do something like :

rbm1, hidden1 = rbm_layer(data, 20)
rbm2, hidden2 = rbm_layer(hidden1, 20)

and then the part you care about :

hidden3 = apply( lambda x,W: T.dot(x,W), inputs = hidden2, params =
                 theano.shared(scipy.sparse_CSR(..)))

and after that you potentially still do what you did before :

err = cross_entropy(hidden3, target)
grads = grad(err, err.parameters())
...
515033d4d3bf
a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff
changeset
|
246 |
515033d4d3bf
a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff
changeset
|
I do agree that some of the "transforms" that I have been writing here
and there are pretty low level, and maybe we don't need them. We might need
only somewhat higher-level transforms. My hope is that for now people think
about the approach and not about all the inner details (like what transforms
we need and so on) and see if they are comfortable with it or not.

Do we want to think in these terms? I think it is a bit better to have
a normal Python class, hack it to change something, and then either add
a parameter to __init__ or create a new version. That seems a bit more
natural.

Anyhow, Guillaume, I'm working on a better answer :)


Params and hyperparams
----------------------

I think it is obvious from what I wrote above that there is a node wrapper
around the theano expression. I haven't written down all the details of that
class. I think there should be such a wrapper around parameters and
hyper-parameters as well. By default those wrappers might not provide any
information. Later on they can provide, for hyper-params for example, a
distribution. If, when inserting your hyper-param in the graph (i.e. when
you call a given transform), you provide the distribution, then maybe a
hyper-learner could use it to sample from it.

For parameters you might define properties like freeze. It can be true or
false. Whenever it is set to true, the param is not adapted by the optimizer.
Changing this value, like changing most hyper-params, implies recompilation
of the graph.

I would have a special class of hyper-params which don't require
recompilation of the graph. Learning rate is an example. This info is also
given by the wrapper and by how the parameter is used.

It is up to the user and the "transform" implementer to wrap params and
hyper-params accordingly, but I don't think this is too complicated.
The apply function above has a default behaviour; maybe you would have a
fourth type of argument, a hyper-param that doesn't require recompilation.
We could find a nice name for it.

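
As a concrete illustration of these wrapper ideas, here is a minimal Python
sketch (all names are hypothetical, nothing here is existing library code, and
there is no theano involved): a Param carries a `frozen` flag the optimizer
checks, and a HyperParam carries a `requires_recompile` flag plus an optional
distribution that a hyper-learner could sample from:

```python
# Hypothetical sketch of the param / hyper-param wrapper idea.

class Param:
    """Wraps a parameter value; when `frozen` is True the optimizer
    skips it (flipping this flag would trigger graph recompilation)."""
    def __init__(self, value, frozen=False):
        self.value = value
        self.frozen = frozen

class HyperParam:
    """Wraps a hyper-parameter; requires_recompile=False marks the
    special class (e.g. learning rate) that can change without
    rebuilding the graph."""
    def __init__(self, value, requires_recompile=True, distribution=None):
        self.value = value
        self.requires_recompile = requires_recompile
        self.distribution = distribution  # optional, for a hyper-learner

def adaptable(params):
    """Return only the parameters the optimizer should update."""
    return [p for p in params if not p.frozen]

W  = Param(0.5)
b  = Param(0.0, frozen=True)          # left out of optimization
lr = HyperParam(0.1, requires_recompile=False)
```
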

How does this work?
-------------------

You always have a pointer to the entire graph. Whenever a hyper-param
changes (or a param freezes), all regions of the graph affected get
recompiled. This is done by traversing the graph from the bottom node and
constructing the theano expression.

This function that updates / re-constructs the graph is slightly more
complex if you have non-theano functions in the graph.

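
To make the recompilation idea concrete, here is a toy sketch (hypothetical
names, plain Python functions standing in for compiled theano expressions):
hyper-param values are baked in while the graph is traversed bottom-up, which
is exactly why changing one requires reconstructing the compiled function:

```python
# Toy model of bottom-up graph reconstruction (not real theano).

class Node:
    """build_op is called at (re)compile time, so the hyper-param
    values current at that moment get baked into the compiled op."""
    def __init__(self, build_op, inputs=()):
        self.build_op = build_op
        self.inputs = list(inputs)

def recompile(node, hyper):
    """Traverse from the bottom node up, constructing the expression."""
    kids = [recompile(c, hyper) for c in node.inputs]
    op = node.build_op(hyper)      # hyper-params are frozen in here
    if not kids:
        return op                  # leaf: op maps the raw input through
    return lambda x: op([k(x) for k in kids])

# default-arg trick bakes the hyper-param value in at compile time
data   = Node(lambda h: (lambda x: x))                         # identity leaf
scaled = Node(lambda h: (lambda vs, s=h["scale"]: vs[0] * s), [data])

hyper = {"scale": 2.0}
f = recompile(scaled, hyper)       # compiled with scale = 2.0
hyper["scale"] = 10.0              # hyper-param change ...
g = recompile(scaled, hyper)       # ... requires reconstructing the graph
```

Note that `f` keeps computing with the old value; only the re-traversal
picks up the change, mirroring the recompilation described above.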

replace
-------

Replace replaces a part of the graph. The way it works, in my view, is that
if I write:

x = x1 + x2 + x3
y = x.replace({x2: x5})

you would first copy the graph that is represented by x (the params or
hyper-params are not copied) and then replace the subgraphs. I.e., x will
still point to x1+x2+x3, while y will point to x1+x5+x3. Replace is not
done in place.

I think of these Node classes as something light-weight, like theano
variables.

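
The replace semantics above can be sketched with a toy Node class (again
hypothetical, not the real wrapper): the structure is copied, the leaves
(the param-carrying nodes) are shared rather than copied, and the original
graph is left untouched:

```python
# Toy, non-destructive replace on a tiny expression graph.

class Node:
    def __init__(self, op=None, inputs=(), value=None):
        self.op = op               # e.g. 'add'; None for a leaf
        self.inputs = list(inputs)
        self.value = value         # leaf payload (param-like, shared)

    def replace(self, mapping):
        """Return a copy of this graph with subgraphs swapped.
        Leaves are shared, never copied; self is not modified."""
        if self in mapping:
            return mapping[self]
        if not self.inputs:
            return self            # leaf: share, don't copy
        return Node(self.op, [c.replace(mapping) for c in self.inputs])

    def eval(self):
        if not self.inputs:
            return self.value
        return sum(c.eval() for c in self.inputs)  # only 'add' here

x1, x2, x3, x5 = (Node(value=v) for v in (1, 2, 3, 50))
x = Node('add', [x1, x2, x3])
y = x.replace({x2: x5})
# x.eval() is still 1+2+3; y.eval() is 1+50+3; y shares x1 and x3 with x
```
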

reconstruct
-----------

This is something nice for DAA. It is definitely not useful for the rest.
I think, though, that it is a shame to have that transformation graph and
not be able to use it for this. It would make life so much easier when you
do deep auto-encoders. I wouldn't put it in the core library, but I would
have it in the DAA module. The way I see it, you can either have something
like:

# generate your invertible transforms on the fly
fn  = create_transform(lambda : , params, hyper_params)
inv = create_transform(lambda : , params, hyper_params)
my_transform = couple_transforms(forward=fn, inv=inv)

# or have some already widely used such transforms in the daa submodule.

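
One possible reading of couple_transforms (the implementation below is
entirely hypothetical; only the names follow the snippet above) is a small
object that carries both directions, so a DAA-style module can run the
encoding and then reconstruct its input:

```python
# Hypothetical sketch of coupling a transform with its inverse.

class Transform:
    def __init__(self, fn, params=None, hyper_params=None):
        self.fn = fn
        self.params = params or {}
        self.hyper_params = hyper_params or {}
    def __call__(self, x):
        return self.fn(x, self.params)

def create_transform(fn, params=None, hyper_params=None):
    return Transform(fn, params, hyper_params)

def couple_transforms(forward, inv):
    """Bundle a transform with its inverse so reconstruction is
    available alongside the forward pass."""
    class Coupled:
        def __call__(self, x):
            return forward(x)
        def reconstruct(self, h):
            return inv(h)
    return Coupled()

params = {"w": 3.0}                 # shared by both directions
fn  = create_transform(lambda x, p: x * p["w"], params)
inv = create_transform(lambda h, p: h / p["w"], params)
my_transform = couple_transforms(forward=fn, inv=inv)
h = my_transform(2.0)               # encode; reconstruct(h) recovers 2.0
```
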

transforms
----------

In my view there will be quite a few such standard transforms. They can be
grouped by architecture, basic, sampler, optimizer and so on.

We do not need to provide all of them, just the ones we need. Research on
an architecture would actually lead to creating new such transforms in the
library.

There will definitely be a list of basic such transforms in the beginning,
like:

replace,
search,
get_param(name),
get_params(..)

You can, and should, have something like a switch (that, based on a
hyper-parameter, replaces a part of a graph with another or not). This is
done by re-compiling the graph.

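
Such a switch can be sketched as branch selection at (re)compile time rather
than at run time (toy code, hypothetical names):

```python
# Hypothetical hyper-parameter-driven switch: the branch is chosen
# when the graph is built, so flipping the hyper-param later means
# compiling the model again.

def switch(hyper_flag, branch_a, branch_b):
    """Pick a subexpression at build time, not at run time."""
    return branch_a if hyper_flag else branch_b

def compile_model(hyper):
    act = switch(hyper["use_square"], lambda v: v * v, lambda v: v)
    return lambda x: act(x + 1)

f_sq = compile_model({"use_square": True})    # (x+1)**2
f_id = compile_model({"use_square": False})   # flag changed -> recompiled
```
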

Constraints
-----------

Nodes can also keep track of constraints.

When you write

y = add_constraint(x, sum(x**2))

y is the same node as x, except that it also links to this second graph
that computes the constraint. Whenever you call grad, grad will also add
to the cost all the constraints attached to the graph.

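
A minimal sketch of this behaviour (hypothetical API, no symbolic math):
add_constraint hands back the very same node with an extra penalty graph
attached, and the cost that would be differentiated includes every attached
constraint:

```python
# Hypothetical sketch: constraints attached to a node fold into the cost.

class Node:
    def __init__(self, cost_fn):
        self.cost_fn = cost_fn        # maps a param value to a cost
        self.constraints = []         # extra penalty "graphs"

def add_constraint(x, penalty_fn):
    """Same node back, with one more constraint linked to it."""
    x.constraints.append(penalty_fn)
    return x

def total_cost(node, w):
    """What grad would differentiate: cost plus all constraints."""
    return node.cost_fn(w) + sum(c(w) for c in node.constraints)

x = Node(lambda w: (w - 1.0) ** 2)
y = add_constraint(x, lambda w: w ** 2)   # e.g. a sum(x**2) penalty
# y is x itself; total_cost(y, w) now includes the penalty term
```
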