doc/v2_planning/layer_RP.txt
Razvan Pascanu <r.pascanu@gmail.com>, Wed, 22 Sep 2010

===============
Layer committee
===============

Members : RP, XG, AB, DWF

Proposal (RP)
=============

You construct your neural network by constructing a graph of connections
between layers, starting from the data. While you construct the graph,
different theano formulas are put together to construct your model.

Hard details are not set yet, but all members of the committee agreed
that this sounds like a good idea.


Example Code (RP):
------------------

# Assume you have the dataset as train_x, train_y, valid_x, valid_y, test_x, test_y

h1   = sigmoid(dotW_b(train_x, n = 300))
rbm1 = CDk( h1, train_x, k = 5, sampler = binomial, cost = pseudolikelihood)

h2   = sigmoid(dotW_b(h1, n = 300))
rbm2 = CDk( h2, h1, k = 5, sampler = binomial, cost = pseudolikelihood)

out = sigmoid( dotW_b(h2, n = 10))

train_err = cross_entropy( out, train_y)

grads   = grad( train_err, train_err.parameters() )
learner = SGD( train_err, grads)

valid_err = train_err.replace({ train_x : valid_x, train_y : valid_y})
test_err  = train_err.replace({ train_x : test_x,  train_y : test_y})


Global observations :
---------------------

1) Your graph can have multiple terminal nodes; in this case rbm1,
rbm2, learner, valid_err and test_err are all end nodes of the graph;

2) Any node is an "iterator": when you call out.next() you get
the next prediction; when you call err.next() you get the next error
( on the batch given by data.next() ).
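The iterator behaviour of point 2 could be mocked up as follows. This is a hedged sketch, not an existing API: `DataNode`, `Node` and the lazy `_compile` step are hypothetical names standing in for a data object, a graph node and a compiled Theano function.

```python
# Hypothetical sketch: every node behaves as an iterator that pulls a
# batch from its parent and applies its (lazily "compiled") expression.

class DataNode:
    """Terminal node that feeds minibatches."""
    def __init__(self, batches):
        self._it = iter(batches)

    def next(self):
        return next(self._it)


class Node:
    """A node maps its expression over whatever its parent yields."""
    def __init__(self, source, fn):
        self.source = source      # parent node (data or another Node)
        self.fn = fn              # stands in for the Theano expression
        self._compiled = None     # compiled lazily (see observation 5)

    def _compile(self):
        if self._compiled is None:
            self._compiled = self.fn   # real version: theano.function(...)
        return self._compiled

    def next(self):
        return self._compile()(self.source.next())


data = DataNode([[1, 2], [3, 4]])
out = Node(data, lambda batch: [2 * v for v in batch])  # "prediction"
err = Node(out, lambda pred: sum(pred))                 # "error"

print(err.next())   # error on the first batch  -> 6
print(err.next())   # error on the second batch -> 14
```

Calling err.next() transparently advances the whole chain down to the data object, which is exactly the behaviour described above.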

3) Replace can replace any subgraph.

4) You can have MACROS or SUBROUTINES that already give you the graph for
known components ( in my view CDk is such a macro, but simpler
examples would be vanilla versions of MLP, DAA, DBN, LOGREG ).

5) Any node has the entire graph ( though arguably you don't use that
graph too much ). Running such a node will in general be done by compiling
the Theano expression up to that node ( if you don't already have this
function ), and using the data object that you get initially. This theano
function is compiled only if you need it. You use the graph only to :
    * update the Theano expression in case some part of the subgraph has
      changed (a hyper-parameter or a replace call)
    * collect the list of parameters of the model
    * collect the list of hyper-parameters ( my personal view - this
      would mostly be useful for a hyper-learner .. and not for day to
      day stuff, but I think it is something easy to provide and we should )
    * collect constraints on parameters ( I believe they can be represented
      in the graph as dependency links to other graphs that compute the
      constraints.. )

6) Registering parameters and hyper-parameters with the graph is the job of
the transform, and therefore of the user who implemented that
transform; the same goes for initializing the parameters ( so if we have
different ways to initialize the weight matrix, that might be a
hyper-parameter with a default value ).
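Point 6 could look like the following sketch. Everything here is illustrative: `Graph`, the registry layout and the `dotW_b` signature are made up for the example; in a real implementation the parameter values would be theano.shared variables.

```python
# Hedged sketch of point 6: the transform creates, initializes and
# registers its own parameters and hyper-parameters on the graph.
import random

class Graph:
    def __init__(self):
        self.params = {}        # name -> value (would be theano.shared)
        self.hyperparams = {}   # name -> value

def dotW_b(graph, name, n_in, n_out, init_scale=0.1):
    """A transform that registers its own state as it is applied."""
    # the initialization scheme is itself a hyper-parameter with a default
    graph.hyperparams[name + '.init_scale'] = init_scale
    W = [[random.uniform(-init_scale, init_scale) for _ in range(n_out)]
         for _ in range(n_in)]
    b = [0.0] * n_out
    graph.params[name + '.W'] = W
    graph.params[name + '.b'] = b
    return W, b

g = Graph()
dotW_b(g, 'layer1', n_in=4, n_out=3)
print(sorted(g.params))        # ['layer1.W', 'layer1.b']
print(g.hyperparams)           # {'layer1.init_scale': 0.1}
```

The user who writes a new transform is then the only one who has to know what state it owns; the rest of the library just walks the registries.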



Detailed Proposal (RP)
======================

I would go through a list of scenarios and possible issues :

Delayed or future values
------------------------

Sometimes you might want future values of some nodes. For example you might
be interested in :

y(t) = x(t) - x(t-1)

You can get that by having a "delayed" version of a node. A delayed version
of a node x is obtained by calling x.t(k), which will give you a node that
has the value x(t+k). k can be positive or negative.
In my view this can be done as follows :
 - a node is a class that points to :
     * a data object that feeds data
     * a theano expression up to that point
     * the entire graph that describes the model ( not the Theano graph !!! )
The only thing you need to do is to change the data object to reflect the
delay ( we might need to be able to pad it with 0s? ). You also need to
create a copy of the theano expression ( those are "new nodes" ), in the
sense that the starting theano tensors are different, since they point to
different data.
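A minimal sketch of the shifting that x.t(k) would do on the data object, assuming zero padding at the boundary. `SeqNode` and its methods are hypothetical names invented for this illustration.

```python
# Sketch of the delayed-node idea: x.t(k) returns a new node whose
# underlying data is the same stream shifted by k, zero-padded.

class SeqNode:
    def __init__(self, values):
        self.values = list(values)

    def t(self, k):
        """Delayed/advanced copy: value at step i is x(i + k), 0-padded."""
        n = len(self.values)
        shifted = [self.values[i + k] if 0 <= i + k < n else 0
                   for i in range(n)]
        return SeqNode(shifted)

    def __sub__(self, other):
        return SeqNode([a - b for a, b in zip(self.values, other.values)])


x = SeqNode([3, 5, 9, 17])
y = x - x.t(-1)          # y(t) = x(t) - x(t-1)
print(y.values)          # [3, 2, 4, 8]
```

Note that only the data object changes; in the real design the Theano expression would be copied so its input tensors point at the shifted stream.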



Non-theano transformation ( or function or whatever)
----------------------------------------------------

Maybe you want to do something in the middle of your graph that is not
supported by Theano. Let's say you have a function f which you cannot
write in Theano. You want to do something like


W1*f( W2*data + b)

I think we can support that by doing the following :
each node has :
    * a data object that feeds data
    * a theano expression up to that point
    * the entire graph that describes the model

Let x1 = W2*data + b
Up to here everything is fine ( we have the theano expression
    dot(W2, tensor) + b,
where tensor is provided by the data object ( plus a dict of givens
and whatever else you need to compile the function ) ).

When you apply f, what you do is create a node that is exactly like the
data object, in the sense that it provides a new tensor and a new dict of
givens.

So x2 = W1*f( W2*data + b)
will actually point to the expression
    dot(W1, tensor)
and to the data node f( W2*data + b ).

What this means is that you basically compile two theano functions t1 and t2
and evaluate t2(f(t1(data))). So every time you have a non-theano operation
you break the theano expression and start a new one.
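In plain Python the evaluation order is literally t2(f(t1(data))). The sketch below uses toy scalar weights; t1 and t2 stand in for the two compiled Theano functions, and all names are illustrative.

```python
# Minimal sketch of breaking the pipeline at a non-Theano function f.
import math

def t1(data):                 # would be: theano.function for W2*data + b
    W2, b = 2.0, 1.0
    return [W2 * v + b for v in data]

def f(xs):                    # arbitrary non-Theano Python function
    return [math.tanh(v) for v in xs]

def t2(xs):                   # would be: theano.function for W1 * (.)
    W1 = 0.5
    return [W1 * v for v in xs]

def run(data):
    # the graph breaks the expression at f and chains the pieces
    return t2(f(t1(data)))

print(run([0.0, 1.0]))
```

Each non-Theano node thus acts as both the output of one compiled function and the data object of the next.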

What you lose :
 - there is no optimization or anything between t1, t2 and f ( we don't
   support that )
 - if you are running things on the GPU, after t1 the data will be copied
   to the CPU and then probably back to the GPU - so it doesn't make sense
   anymore



Recurrent Things
----------------

I think that you can write a recurrent operation by first defining a
graph ( the recurrent relation ) :

y_tm1 = recurrent_layer(init = zeros(50))
x_t   = slice(x, t = 0)
y     = loop( dotW_b(y_tm1, 50) + x_t, steps = 20)

This would basically give all the information you need to add a scan op
to the theano expression of the result op; it is just a different way
of writing things .. which I think is more intuitive.

You create your primitives, which are either a recurrent_layer that should
have an initial value, or a slice of some other node ( a time slice, that
is ). Then you call loop, giving an expression that starts from those
primitives.

Similarly you can have foldl or map or anything else.

You would use this instead of writing scan, especially if the formula is
more complicated and you want to automatically collect parameters,
hyper-parameters and so on.
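A plain-Python mock of what loop would do once unrolled, to make the semantics concrete. `recurrent_layer`, `zeros` and `loop` mirror the names in the proposal but nothing here is an existing API; the step function stands in for dotW_b(y_tm1, ...) + x_t, and in the real design loop would emit a Theano scan op instead of a Python for-loop.

```python
# Hypothetical unrolled rendering of the recurrent primitives above.

def zeros(n):
    return [0.0] * n

def recurrent_layer(init):
    return {'state': list(init)}

def loop(step, state, xs, steps):
    """Unrolled recurrence: state <- step(state, x_t) for each t."""
    for t in range(steps):
        state['state'] = step(state['state'], xs[t])
    return state['state']

# y(t) = 0.5 * y(t-1) + x(t), mimicking dotW_b(y_tm1, ...) + x_t
y_tm1 = recurrent_layer(init=zeros(1))
xs = [[1.0], [1.0], [1.0]]
y = loop(lambda s, x: [0.5 * s[0] + x[0]], y_tm1, xs, steps=3)
print(y)   # [1.75]
```

The point is that the recurrence is declared as ordinary graph construction, and the library has all it needs to translate it into scan behind the scenes.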

Optimizer
---------

Personally I would respect the findings of the optimization committee,
and have SGD require a Node that produces some error ( which can
be omitted ) and the gradients. For this I would also have a grad
function which would actually only call T.grad.

What if you have a non-theano thing in the middle? I don't have any smart
solution besides ignoring any parameter that is below the first
non-theano node and throwing a warning.

Learner
-------

In my case I would not have a predict() and an eval() method on the
learner, but just an eval(). If you want the predictions you should use
the corresponding node ( before applying the error measure ). This was,
for example, **out** in my first example.

Of course we could require learners to be special nodes that also have
a predict output. In that case I'm not sure what the iterating behaviour
of the node should produce.

Granularity
-----------

Guillaume nicely pointed out that this library might be overkill.
In the sense that you have a dotW_b transform, and then you will need
a dotW_b_sparse transform and so on. Plus, each way of initializing a
param would result in many more transforms.

I don't have a perfect answer yet, but my argument goes like this :

You would have transforms for the most popular options ( dotW_b, for
example ). If you need something else you can always decorate a function
that takes theano arguments and produces theano arguments. More than
decorating, you can have a general apply transform that does something
like :

apply( lambda x, y, z: x*y + z, inputs = x,
       hyperparams = [(name, 2)],
       params = [(name, theano.shared(..))])

The order of the arguments in the lambda is nodes, params, hyper-params
or so. This would apply the theano expression, but it would also register
the parameters. It is like creating a transform on the fly.

I think you can do it such that the result of the apply is
picklable, but not the apply operation itself. Meaning that in the graph,
the op doesn't actually store the lambda expression but a mini theano
graph.
224
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
225 Also names might be optional, so you can write hyperparam = [2,]
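
To make the idea concrete, here is a minimal plain-Python sketch of such an on-the-fly transform; the names (`ApplyNode`, the stand-alone `apply`) are hypothetical, and ordinary callables stand in for theano expressions:

```python
# Hypothetical sketch (plain Python, no theano): an on-the-fly
# transform that records inputs, params and hyper-params so a later
# pass can rebuild and recompile the expression.

class ApplyNode(object):
    def __init__(self, expr, inputs, params, hyperparams):
        self.expr = expr                 # the expression builder (a lambda)
        self.inputs = inputs             # upstream nodes
        self.params = params             # [(name, shared_value)]
        self.hyperparams = hyperparams   # [(name, value)]

    def all_params(self):
        # collect params registered anywhere below this node
        found = list(self.params)
        for node in self.inputs:
            found.extend(node.all_params())
        return found

def apply(expr, inputs=(), params=(), hyperparams=()):
    return ApplyNode(expr, list(inputs), list(params), list(hyperparams))

# usage: params are registered while the graph is being built
h = apply(lambda x, W: x * W, inputs=[],
          params=[('W', 0.5)], hyperparams=[('n_hid', 2)])
out = apply(lambda x, b: x + b, inputs=[h], params=[('b', 0.0)])
print([name for name, _ in out.all_params()])   # ['b', 'W']
```

The point is only that the node stores a description of the expression (not the live lambda), so the result can be walked, pickled, and rebuilt.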

What this way of doing things would buy you, hopefully, is that you do not
need to worry about most of your model (it would be just a few macros or
subroutines). You would do something like:

rbm1,hidden1 = rbm_layer(data,20)
rbm2,hidden2 = rbm_layer(hidden1,20)

and then the part you care about:

hidden3 = apply( lambda x,W: T.dot(x,W),
                 inputs = hidden2,
                 params = theano.shared(scipy.sparse_CSR(..)))

and after that you potentially still do what you did before:

err = cross_entropy(hidden3, target)
grads = grad(err, err.parameters())
...

I do agree that some of the "transforms" that I have been writing here
and there are pretty low level, and maybe we don't need them. We might need
only somewhat higher-level transforms. My hope is that for now people think
about the approach and not about all the inner details (like what transforms
we need and so on) and see whether they are comfortable with it or not.

Do we want to think in these terms? I think it is a bit better than having
a normal python class, hacking it to change something, and then either adding
a parameter to __init__ or creating a new version. It seems a bit more natural.

Anyhow, Guillaume, I'm working on a better answer :)


Params and hyperparams
----------------------

I think it is obvious from what I wrote above that there is a node wrapper
around the theano expression. I haven't written down all the details of that
class. I think there should be such a wrapper around parameters and
hyper-parameters as well. By default those wrappers might not provide
any information. Later on, they can provide, for hyper-params for example, a
distribution. If, when inserting your hyper-param in the graph (i.e. when
you call a given transform), you provide the distribution, then maybe a
hyper-learner could use it to sample from it.

For parameters you might define properties like freeze. It can be true or
false. Whenever it is set to true, the param is not adapted by the optimizer.
Changing this value, like changing most hyper-params, implies recompilation
of the graph.
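
A rough sketch of what such a param wrapper could look like; `Param`, `frozen` and the `on_change` callback are placeholder names, and the callback stands in for the real recompilation machinery:

```python
# Hypothetical Param wrapper: toggling `frozen` notifies the graph,
# which would then recompile the affected regions.

class Param(object):
    def __init__(self, name, value, on_change=None):
        self.name = name
        self.value = value
        self._frozen = False
        self._on_change = on_change   # e.g. graph.recompile

    @property
    def frozen(self):
        return self._frozen

    @frozen.setter
    def frozen(self, flag):
        if flag != self._frozen:
            self._frozen = flag
            if self._on_change is not None:
                self._on_change(self)   # trigger recompilation

recompiled = []
W = Param('W', 0.1, on_change=lambda p: recompiled.append(p.name))
W.frozen = True      # optimizer will now skip W; graph recompiles
print(recompiled)    # ['W']
```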

I would have a special class of hyper-params which don't require
recompilation of the graph. The learning rate is an example. This info is
also given by the wrapper and by how the parameter is used.

It is up to the user and the "transform" implementer to wrap params and
hyper-params correspondingly. But I don't think this is too complicated.
The apply function above has a default behaviour; maybe you would have
a fourth type of argument which is a hyper-param that doesn't require
compilation. We could find a nice name for it.


How does this work?
-------------------

You always have a pointer to the entire graph. Whenever a hyper-param
changes (or a param freezes) all regions of the graph affected get
recompiled. This is done by traversing the graph from the bottom node and
constructing the theano expression.

This function that updates / re-constructs the graph is slightly more
complex if you have non-theano functions in the graph.
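
The traversal could be sketched like this; `Node`, `build` and `set_hyperparam` are hypothetical names, plain callables stand in for compiled theano functions, and the per-node cache plays the role of the compiled graph region:

```python
# Hypothetical sketch: rebuilding the expression by walking the graph
# from the bottom node.  `build` composes plain callables; the real
# thing would emit a theano expression and compile it.

class Node(object):
    def __init__(self, fn, inputs=(), hyperparams=None):
        self.fn = fn
        self.inputs = list(inputs)
        self.hyperparams = hyperparams or {}
        self._compiled = None          # cache of the built function

    def set_hyperparam(self, name, value):
        self.hyperparams[name] = value
        self._compiled = None          # invalidate: forces a rebuild

    def build(self):
        if self._compiled is None:
            subexprs = [n.build() for n in self.inputs]
            fn, hp = self.fn, dict(self.hyperparams)
            self._compiled = lambda x: fn(x, [s(x) for s in subexprs], hp)
        return self._compiled

leaf = Node(lambda x, subs, hp: x * hp['scale'], hyperparams={'scale': 2})
top = Node(lambda x, subs, hp: subs[0] + hp['bias'],
           inputs=[leaf], hyperparams={'bias': 1})
print(top.build()(3))           # 3*2 + 1 = 7
top.set_hyperparam('bias', 10)  # only the affected node is rebuilt
print(top.build()(3))           # 3*2 + 10 = 16
```

Note that `leaf` keeps its cached function across the change: that is the "only affected regions get recompiled" behaviour.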

replace
-------

Replace replaces a part of the graph. The way it works in my view is that
if I write:

x = x1+x2+x3
y = x.replace({x2:x5})

You would first copy the graph that is represented by x (the params or
hyper-params are not copied) and then replace the subgraphs. I.e., x will
still point to x1+x2+x3, while y will point to x1+x5+x3. Replace is not done
in place.

I think of these Node classes as something light-weight, like theano
variables.
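
A toy illustration of this non-destructive replace, with a hypothetical `Var` class standing in for the real Node wrapper:

```python
# Hypothetical sketch of non-destructive replace: the graph under `x`
# is copied, the requested sub-node is swapped, and `x` is untouched.
# Leaves (where params would live) are shared, not copied.

class Var(object):
    def __init__(self, name=None, op=None, inputs=()):
        self.name, self.op, self.inputs = name, op, list(inputs)

    def __add__(self, other):
        return Var(op='+', inputs=[self, other])

    def replace(self, mapping):
        if self in mapping:
            return mapping[self]
        if not self.inputs:
            return self            # leaf: shared between old and new graph
        copies = [i.replace(mapping) for i in self.inputs]
        return Var(op=self.op, inputs=copies)

    def leaves(self):
        if not self.inputs:
            return [self.name]
        return [n for i in self.inputs for n in i.leaves()]

x1, x2, x3, x5 = Var('x1'), Var('x2'), Var('x3'), Var('x5')
x = x1 + x2 + x3
y = x.replace({x2: x5})
print(x.leaves())   # ['x1', 'x2', 'x3']  -- x is untouched
print(y.leaves())   # ['x1', 'x5', 'x3']
```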


reconstruct
-----------

This is something nice for DAA. It is definitely not useful for the rest.
I think, though, that it is a shame having that transformation graph and not
being able to use it to do this. It will make life so much easier when you
do deep auto-encoders. I wouldn't put it in the core library, but I would
have it in the DAA module. The way I see it, you can either have something
like

# generate your invertible transforms on the fly
fn = create_transform( lambda : , params, hyper-params )
inv = create_transform( lambda : , params, hyper-params )
my_transform = couple_transforms( forward = fn, inv = inv)

# or have some such transforms, already widely used, in the daa submodule.
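
A possible shape for `couple_transforms`, sketched with plain functions; the class name and the toy scaling transform are made up for illustration:

```python
# Hypothetical sketch of couple_transforms: pair a forward transform
# with its inverse so a DAA module can unroll the decoder automatically.

class CoupledTransform(object):
    def __init__(self, forward, inv):
        self.forward = forward
        self.inv = inv

    def __call__(self, x):
        return self.forward(x)

    def inverse(self, h):
        return self.inv(h)

def couple_transforms(forward, inv):
    return CoupledTransform(forward, inv)

# toy example: scaling by W and its inverse
W = 4.0
fn = lambda x: x * W
inv = lambda h: h / W
my_transform = couple_transforms(forward=fn, inv=inv)
h = my_transform(2.5)
print(h)                        # 10.0
print(my_transform.inverse(h))  # 2.5
```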


transforms
----------

In my view there will be quite a few such standard transforms. They
can be grouped into architecture, basic, sampler, optimizer and so on.

We do not need to provide all of them, just the ones we need. Research
on an architecture would actually lead to creating new such transforms in
the library.

There will definitely be a list of basic such transforms in the beginning,
like:
replace,
search,
get_param(name)
get_params(..)

You can, and should, have something like a switch (that, based on a
hyper-parameter, replaces a part of a graph with another or not). This is
done by re-compiling the graph.


Constraints
-----------

Nodes can also keep track of constraints.

When you write

y = add_constraint(x, sum(x**2))

y is the same node as x, except that it also links to this second graph that
computes constraints. Whenever you call grad, it will also add to the
cost all constraints attached to the graph.
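
A minimal sketch of how constraints could ride along with a node; `CNode`, `add_constraint` and `total_cost` are hypothetical names, with plain numbers standing in for theano expressions:

```python
# Hypothetical sketch of add_constraint: the node is returned unchanged
# except that it now carries penalty terms, which a grad()-like call
# sums into the cost before differentiating.

class CNode(object):
    def __init__(self, value):
        self.value = value
        self.constraints = []      # penalty sub-graphs attached to the node

def add_constraint(node, penalty):
    node.constraints.append(penalty)
    return node                    # same node, now linked to the penalty

def total_cost(node, base_cost):
    # what grad() would differentiate: cost plus attached constraints
    return base_cost + sum(node.constraints)

x = CNode([1.0, 2.0])
y = add_constraint(x, sum(v ** 2 for v in x.value))  # L2 penalty = 5.0
print(y is x)                  # True: same node
print(total_cost(y, 0.5))      # 5.5
```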