annotate doc/v2_planning/layer_RP.txt @ 1229:515033d4d3bf

a first draft of layer committee
author Razvan Pascanu <r.pascanu@gmail.com>
date Wed, 22 Sep 2010 19:43:24 -0400
children 5ef96142492b

===============
Layer committee
===============

Members : RP, XG, AB, DWF

Proposal (RP)
=============

You construct your neural network by building a graph of connections
between layers, starting from the data. As you construct the graph,
different Theano formulas are put together to form your model.

Hard details are not set yet, but all members of the committee agreed
that this sounds like a good idea.


Example Code (RP):
------------------

# Assume you have the dataset as train_x, train_y, valid_x, valid_y, test_x, test_y

h1   = sigmoid(dotW_b(train_x, n = 300))
rbm1 = CDk( h1, train_x, k = 5, sampler = binomial, cost = pseudolikelihood)

h2   = sigmoid(dotW_b(h1, n = 300))
rbm2 = CDk( h2, h1, k = 5, sampler = binomial, cost = pseudolikelihood)

out = sigmoid( dotW_b(h2, n = 10))

train_err = cross_entropy( out, train_y)

grads   = grad( train_err, train_err.parameters() )
learner = SGD( train_err, grads)

valid_err = train_err.replace({ train_x : valid_x, train_y : valid_y})
test_err  = train_err.replace({ train_x : test_x , train_y : test_y})
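Since the replace call carries most of the weight in this example, here is a toy sketch of the idea in plain Python. Everything here is invented for illustration (Node, eval, and the dict-based replace are not the proposed API): a node rebuilds its graph with some subgraphs swapped out.

```python
# Toy sketch of graph nodes with a replace() method (hypothetical API):
# each node stores its inputs and an evaluation rule.

class Node:
    def __init__(self, op, inputs=(), name=None):
        self.op, self.inputs, self.name = op, tuple(inputs), name

    def eval(self, env):
        # Leaves look themselves up in env; interior nodes recurse.
        if not self.inputs:
            return env[self.name]
        return self.op(*(i.eval(env) for i in self.inputs))

    def replace(self, mapping):
        # Rebuild the graph, swapping any subgraph found in `mapping`.
        if self in mapping:
            return mapping[self]
        return Node(self.op, [i.replace(mapping) for i in self.inputs], self.name)

# Build train_err = (x - y)**2 on training inputs, then retarget it.
train_x, train_y = Node(None, name="train_x"), Node(None, name="train_y")
valid_x, valid_y = Node(None, name="valid_x"), Node(None, name="valid_y")
train_err = Node(lambda a, b: (a - b) ** 2, [train_x, train_y])
valid_err = train_err.replace({train_x: valid_x, train_y: valid_y})

print(train_err.eval({"train_x": 3.0, "train_y": 1.0}))  # 4.0
print(valid_err.eval({"valid_x": 2.0, "valid_y": 0.5}))  # 2.25
```

The same error expression is reused for validation and test simply by swapping the leaf nodes, which is the behaviour the example above relies on.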


Global observations :
---------------------

1) Your graph can have multiple terminations; in this case rbm1, rbm2 and learner, valid_err,
test_err are all end nodes of the graph;

2) Any node is an "iterator"; when you call out.next() you get the next prediction, and
when you call err.next() you get the next error ( on the batch given by the data ).

3) Replace can replace any subgraph.

4) You can have MACROS or SUBROUTINES that already give you the graph for known components ( in my
view CDk is such a macro, but simpler examples would be vanilla versions of MLP, DAA, DBN, LOGREG ).

5) Any node has a pointer to the graph ( though arguably you don't use that graph much ). Running
such a node will generally be done by compiling the Theano expression up to that node, and using the
data object that you get initially. This Theano function is compiled lazily, in the sense that it is
compiled when you try to iterate through the node. You use the graph only to :
   * update the Theano expression in case some part of the subgraph has been changed
   * collect the list of parameters of the model
   * collect the list of hyper-parameters ( my personal view - this would mostly be useful for a
     hyper learner .. not on a day to day basis, but I think it is something easy to provide and we should )
   * collect constraints on parameters ( I believe they can be inserted in the graph .. things like L1
     and so on )

6) Registering parameters and hyper-parameters to the graph is the job of the transform, and therefore
of the user who implemented that transform; the same goes for initializing the parameters ( so if we have
different ways to initialize the weight matrix, that should be a hyper-parameter with a default value ).
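Observations 2) and 5) can be sketched together: a node compiles itself only when someone first iterates it. Everything below is hypothetical ( LazyNode is an invented name, and a plain closure stands in for the compiled Theano function ):

```python
# Sketch of lazy compilation (hypothetical design): the node builds its
# callable only on the first next(), then reuses it for every batch.

class LazyNode:
    def __init__(self, expr, data):
        self.expr = expr          # stands in for a Theano expression
        self.data = iter(data)    # the data object feeding batches
        self._fn = None           # compiled function, built on demand

    def _compile(self):
        print("compiling...")     # happens exactly once
        return self.expr          # a real node would call theano.function here

    def __next__(self):
        if self._fn is None:
            self._fn = self._compile()
        return self._fn(next(self.data))

    def __iter__(self):
        return self

out = LazyNode(lambda x: 2 * x, data=[1, 2, 3])
print(next(out))   # prints "compiling..." then 2
print(next(out))   # 4
```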



Detailed Proposal (RP)
======================

I will go through a list of scenarios and possible issues :

Delayed or future values
------------------------

Sometimes you might want future values of some nodes. For example you might be interested in :

y(t) = x(t) - x(t-1)

You can get that by having a "delayed" version of a node. A delayed version of a node x is obtained by
calling x.t(k), which will give you a node that has the value x(t+k). k can be positive or negative.
In my view this can be done as follows :
   - a node is a class that points to :
       * a data object that feeds data
       * a theano expression up to that point
       * the entire graph that describes the model ( not the Theano graph !!! )
The only thing you need to do is to change the data object to reflect the
delay ( we might need to be able to pad it with 0s ? ). You also need to create
a copy of the theano expression ( those are "new nodes" ), in the sense that
the starting theano tensors are different since they point to different data.
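A toy sketch of the data-object side of x.t(k) ( DataStream and shifted are invented names ): the delayed node reads a shifted, zero-padded view of the same values.

```python
# Toy data object supporting a shift; x.t(k) would wrap the same stream
# in a shifted view (hypothetical API, padding with zeros at the boundary).

class DataStream:
    def __init__(self, values):
        self.values = list(values)

    def shifted(self, k):
        # the value at time t becomes the value at t+k; out-of-range reads give 0
        n = len(self.values)
        out = [self.values[t + k] if 0 <= t + k < n else 0
               for t in range(n)]
        return DataStream(out)

x = DataStream([4, 7, 9, 12])
x_tm1 = x.shifted(-1)              # x(t-1), zero-padded at t=0
y = [a - b for a, b in zip(x.values, x_tm1.values)]
print(y)   # [4, 3, 2, 3]
```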



Non-theano transformation ( or function or whatever )
-----------------------------------------------------

Maybe you want to do something in the middle of your graph that is not
supported by Theano. Say you have a function f which you cannot write in Theano.
You want to do something like


W1*f( W2*data + b )

I think we can support that by doing the following :
each node has :
    * a data object that feeds data
    * a theano expression up to that point
    * the entire graph that describes the model

Let x1 = W2*data + b
Up to here everything is fine ( we have a theano expression ) :
    dot(W2, tensor) + b,
where tensor is provided by the data object ( plus a dict of givens
and whatever else you need to compile the function ).

When you apply f, you create a node that is exactly like the
data object, in the sense that it provides a new tensor and a new dict of
givens.

So x2 = W1*f( W2*data + b )
will actually point to the expression
    dot(W1, tensor)
and to the data node f( W2*data + b ).

What this means is that you basically compile two theano functions t1 and t2
and evaluate t2(f(t1(data))). So every time you have a non-theano operation you
break the theano expression and start a new one.

What you lose :
    - there is no optimization or anything between t1, t2 and f ( we don't
      support that )
    - if you are running things on the GPU, after t1 the data will be copied to
      the CPU and then probably back to the GPU - so it doesn't make sense anymore
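The breaking described above can be sketched with plain closures standing in for the two compiled functions ( a toy illustration, not the real machinery ): everything below f goes into t1, everything above into t2.

```python
# Sketch: a non-theano function f breaks the expression into two
# separately "compiled" pieces t1 and t2 (plain closures here).

import math

W2, b = 3.0, 1.0
W1 = 0.5

t1 = lambda data: W2 * data + b        # theano-compilable part below f
f  = lambda v: math.floor(v)           # the non-theano operation
t2 = lambda v: W1 * v                  # theano-compilable part above f

def pipeline(data):
    # no optimization happens across the t1 / f / t2 boundaries
    return t2(f(t1(data)))

print(pipeline(2.4))   # t1 -> 8.2, f -> 8, t2 -> 4.0
```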



Recurrent Things
----------------

I think that you can write a recurrent operation by first defining a
graph ( the recurrent relation ) :

y_tm1 = recurrent_layer(init = zeros(50))
x_t   = slice(x, t = 0)
y     = loop( dotW_b(y_tm1, 50) + x_t, steps = 20)

This would basically give you all the information you need to add a scan op
to the theano expression of the result, it is just a different way
of writing things .. which I think is more intuitive.

You create your primitives, which are either a recurrent_layer that should
have an initial value, or a slice of some other node ( a time slice, that is ).
Then you call loop, giving an expression that starts from those primitives.

Similarly you can have foldl or map or anything else.

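A minimal sketch of what loop could expand to ( loop and step are invented names here; the real thing would emit a scan op rather than a Python for ):

```python
# Toy unrolled loop (hypothetical stand-in for adding a scan op):
# y(t) = step(y(t-1), x(t)), starting from an initial state.

def loop(step, init, xs, steps):
    y = init
    ys = []
    for t in range(steps):
        y = step(y, xs[t])   # the recurrent relation, applied per time step
        ys.append(y)
    return ys

# y(t) = 0.5 * y(t-1) + x(t), with y(-1) = 0
ys = loop(lambda y_tm1, x_t: 0.5 * y_tm1 + x_t,
          init=0.0, xs=[1.0, 2.0, 4.0], steps=3)
print(ys)   # [1.0, 2.5, 5.25]
```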
Optimizer
---------

Personally I would respect the findings of the optimization committee,
and have SGD require a node that produces some error ( which can
be omitted ) and the gradients. For this I would also have a grad
function which would actually only call T.grad.

If you have a non-theano thing in the middle, I don't have any smart
solution besides ignoring any parameter that sits below the first
non-theano node and throwing a warning.

Learner
-------

In my case I would not have a predict() and an eval() method on the learner,
but just an eval(). If you want the predictions you should use the
corresponding node ( before applying the error measure ). This was,
for example, **out** in my first example.

Of course we could require learners to be special nodes that also have
a predict output. In that case I'm not sure what the iterator behaviour
of the node should produce.

Granularity
-----------

Guillaume nicely pointed out that this library might be overkill.
In the sense that you have a dotW_b transform, and then you will need
a dotW_b_sparse transform and so on. Plus, each way of initializing a param
would result in many more transforms.

I don't have a perfect answer yet, but my argument goes like this :

you would have transforms for the most popular options ( dotW_b for example ).
If you need something else you can always decorate a function that takes
theano arguments and produces theano arguments. More than decorating, you
can have a general apply transform that does something like :

apply( lambda x, y, z : x*y + z,
       inputs = x,
       hyperparams = [(name, 2)],
       params = [(name, theano.shared(..))])

The order of the arguments in the lambda is nodes, params, hyper-params or so.
This would apply the theano expression, but it would also register the
parameters. I think you can do it such that the result of the apply is
picklable, but not the apply itself. Meaning that in the graph, the op doesn't
actually store the lambda expression but a mini theano graph.

Also names might be optional, so you can write hyperparams = [2,]
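A toy sketch of such an apply ( all names are invented for illustration; the real op would store a mini theano graph rather than evaluate eagerly ): the wrapper evaluates the expression and records params and hyper-params on the resulting node so the graph can later collect them.

```python
# Toy apply(): wraps an expression, evaluates it, and registers params
# and hyper-params on the result so the graph can collect them later.

class Applied:
    def __init__(self, value, params, hyperparams):
        self.value = value
        self.params = dict(params)
        self.hyperparams = dict(hyperparams)

def apply(fn, inputs, params=(), hyperparams=()):
    params = list(params)
    hyperparams = list(hyperparams)
    # argument order: nodes, then params, then hyper-params
    args = [inputs] + [v for _, v in params] + [v for _, v in hyperparams]
    return Applied(fn(*args), params, hyperparams)

node = apply(lambda x, y, z: x * y + z,
             inputs=3.0,
             params=[("W", 2.0)],
             hyperparams=[("offset", 1.0)])

print(node.value)        # 7.0
print(node.params)       # {'W': 2.0}
```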


What this way of doing things would buy you, hopefully, is that you do not
need to worry about most of your model ( it would be just a few macros or
subroutines ).
You would do something like :

rbm1, hidden1 = rbm_layer(data, 20)
rbm2, hidden2 = rbm_layer(hidden1, 20)

and then the part you care about :

hidden3 = apply( lambda x, W : T.dot(x, W), inputs = hidden2,
                 params = theano.shared(scipy.sparse_CSR(..)))

and after that you potentially still do what you did before :

err   = cross_entropy(hidden3, target)
grads = grad(err, err.parameters())
...
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
226
515033d4d3bf a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
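As a library-free sketch of what such an apply could do under the hood (the names ResultNode and the direct function call are hypothetical stand-ins; the real version would build a theano sub-graph rather than evaluate the lambda immediately):

```python
class ResultNode(object):
    """Holds the result of an applied transform plus the registered params."""
    def __init__(self, value, params):
        self.value = value
        self._params = list(params)

    def parameters(self):
        # what grad(err, err.parameters()) would look up later
        return list(self._params)

def apply(fn, inputs, params):
    # apply the expression AND register the params on the result node
    out = fn(inputs, params)
    return ResultNode(out, [params])

# toy stand-in for hidden3 = apply(lambda x, W: T.dot(x, W), ...)
hidden3 = apply(lambda x, W: x * W, inputs=4.0, params=0.5)
print(hidden3.value)         # 2.0
print(hidden3.parameters())  # [0.5]
```

The point of the sketch is only the bookkeeping: the result node, not the apply itself, carries everything needed downstream.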
I do agree that some of the "transforms" that I have been writing here and
there are pretty low level, and maybe we don't need them; we might need only
somewhat higher-level transforms. My hope is that for now people think about
the approach rather than all the inner details (like which transforms we
need, and so on) and see whether they are comfortable with it or not.

Do we want to think in these terms? I think it is a bit better to have your
script look like that than to hack into the DBN class to change that W to
be sparse.

Anyhow Guillaume, I'm working on a better answer :)


Params and hyperparams
----------------------

I think it is obvious from what I wrote above that there is a node wrapper
around the theano expression. I haven't written down all the details of that
class. I think there should be such a wrapper around parameters and
hyper-parameters as well. By default those wrappers might not provide any
information. Later on, they can provide, for example, a distribution for a
hyper-param. If, when inserting your hyper-param into the graph (i.e. when
you call a given transform), you provide the distribution, then maybe a
hyper-learner could use it to sample from it.

For parameters you might define properties like freeze. It can be true or
false. Whenever it is set to true, the param is not adapted by the
optimizer. Changing this value, like changing most hyper-params, implies
recompilation of the graph.

I would have a special class of hyper-params which don't require
recompilation of the graph. The learning rate is an example. This info is
also given by the wrapper and by how the parameter is used.

It is up to the user and the "transform" implementer to wrap params and
hyper-params correspondingly, but I don't think this is too complicated.
The apply function above has a default behaviour; maybe you would have a
fourth type of argument which is a hyper-param that doesn't require
recompilation. We could find a nice name for it.


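A minimal sketch of what such wrappers could look like (all class and attribute names here are hypothetical; the real Param would hold a theano shared variable, and flipping freeze would trigger recompilation of the affected region):

```python
class Param(object):
    """Wrapper around a parameter; tracks whether it is frozen."""
    def __init__(self, name, value, frozen=False):
        self.name = name
        self.value = value
        self.frozen = frozen

    def freeze(self):
        # frozen params are skipped by the optimizer
        self.frozen = True

class HyperParam(object):
    """Wrapper around a hyper-parameter; may carry a sampling distribution
    and a flag saying whether changing it forces recompilation."""
    def __init__(self, name, value, distribution=None, needs_recompile=True):
        self.name = name
        self.value = value
        self.distribution = distribution  # e.g. for a hyper-learner to sample
        self.needs_recompile = needs_recompile

# the learning rate is the canonical hyper-param whose change
# does not require recompiling the graph
lr = HyperParam('lr', 0.01, needs_recompile=False)
W = Param('W', [[0.0]])
W.freeze()
```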
How does this work?
-------------------

You always have a pointer to the entire graph. Whenever a hyper-param
changes (or a param freezes), all regions of the graph that are affected
get recompiled. This is done by traversing the graph from the bottom node
and re-constructing the theano expression.

The function that updates / re-constructs the graph is slightly more
complex if you have non-theano functions in the graph ..

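A rough, library-free sketch of that traversal (the Node class and rebuild function are hypothetical; a real version would emit a theano expression instead of a string):

```python
class Node(object):
    def __init__(self, op, inputs=()):
        self.op = op          # name of the transform, e.g. '+' or 'dot'
        self.inputs = list(inputs)

def rebuild(node):
    """Traverse from the bottom (output) node and re-construct the
    expression; recompilation re-runs this after a hyper-param change."""
    if not node.inputs:
        return node.op
    parts = [rebuild(i) for i in node.inputs]
    return '(' + (' %s ' % node.op).join(parts) + ')'

x1, x2 = Node('x1'), Node('x2')
out = Node('+', [x1, x2])
print(rebuild(out))   # (x1 + x2)
```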
replace
-------

Replace replaces a part of the graph. The way it works, in my view, is that
if I write:

x = x1+x2+x3
y = x.replace({x2:x5})

you would first copy the graph represented by x (the params or hyper-params
are not copied) and then replace the subgraphs. I.e., x will still point to
x1+x2+x3, while y will point to x1+x5+x3. Replace is not done in place.

I think of these Node classes as something light-weight, like theano
variables.


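A library-free sketch of that copy-then-substitute behaviour (the Node class here is a hypothetical stand-in for the light-weight wrapper):

```python
class Node(object):
    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)

    def replace(self, mapping):
        # not in place: leaves self untouched and returns a copied
        # graph in which the mapped sub-nodes are substituted
        if self in mapping:
            return mapping[self]
        return Node(self.name, [i.replace(mapping) for i in self.inputs])

    def show(self):
        if not self.inputs:
            return self.name
        return '+'.join(i.show() for i in self.inputs)

x1, x2, x3, x5 = Node('x1'), Node('x2'), Node('x3'), Node('x5')
x = Node('add', [x1, x2, x3])
y = x.replace({x2: x5})
print(x.show())   # x1+x2+x3  -- x is unchanged
print(y.show())   # x1+x5+x3
```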
reconstruct
-----------

This is something nice for the DAA. It is definitely not useful for the
rest, but I think it would be a shame to have that transformation graph and
not be able to use it for this. It will make life so much easier when you
do deep auto-encoders. I wouldn't put it in the core library, but I would
have it in the DAA module. The way I see it, you can either have something
like:

# generate your invertible transforms on the fly
fn = create_transform(lambda : , params, hyper_params)
inv = create_transform(lambda : , params, hyper_params)
my_transform = couple_transforms(forward=fn, inv=inv)

# or have some already widely used such transforms in the daa submodule.


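A sketch of what coupling a forward transform with its inverse could look like (CoupledTransform and the toy affine example are hypothetical; the real create_transform would wrap theano expressions):

```python
class CoupledTransform(object):
    """Pairs a forward transform with its inverse, so a DAA module can
    build the decoder by walking the encoder's graph backwards."""
    def __init__(self, forward, inv):
        self.forward = forward
        self.inv = inv

def couple_transforms(forward, inv):
    return CoupledTransform(forward, inv)

# toy example: an affine shift and its inverse
my_transform = couple_transforms(forward=lambda x: x + 3,
                                 inv=lambda h: h - 3)
h = my_transform.forward(10)      # 13
assert my_transform.inv(h) == 10  # reconstructs the input
```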
transforms
----------

In my view there will be quite a few such standard transforms. They can be
grouped by architecture, basic, sampler, optimizer, and so on.

We do not need to provide all of them, just the ones we need. Research on
an architecture would actually lead to creating new such transforms in the
library.

There will definitely be a list of basic such transforms in the beginning,
like:
replace,
search,
get_param(name)
get_params(..)

You can, and should, have something like a switch (that, based on a
hyper-parameter, replaces a part of a graph with another or not). This is
done by re-compiling the graph.


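A minimal sketch of such a switch (hypothetical names and string stand-ins for sub-graphs; in the real system flipping the hyper-parameter would trigger recompilation of the affected region):

```python
def switch(hyper_param, graph, alternative):
    """Select one of two sub-graphs based on a boolean hyper-param.
    Re-evaluated (i.e. the graph re-compiled) whenever the flag changes."""
    return alternative if hyper_param else graph

# toy stand-ins for the dense and sparse versions of a layer
dense_branch = 'dot(x, W_dense)'
sparse_branch = 'sparse_dot(x, W_sparse)'
use_sparse = True
expr = switch(use_sparse, dense_branch, alternative=sparse_branch)
print(expr)   # sparse_dot(x, W_sparse)
```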
Constraints
-----------

Nodes can also keep track of constraints.

When you write

y = add_constraint(x, sum(x**2))

y is the same node as x, except that it also links to this second graph
that computes the constraint. Whenever you call grad, grad will also add
to the cost all the constraints attached to the graph.
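A minimal library-free sketch of that bookkeeping (Node, add_constraint, and total_cost are hypothetical names; real constraints would be theano expressions and grad would differentiate the summed cost):

```python
class Node(object):
    def __init__(self, value, constraints=()):
        self.value = value
        self.constraints = list(constraints)

def add_constraint(x, penalty):
    # y is "the same node" as x: same value, but it also links to
    # the extra graph that computes the constraint
    return Node(x.value, x.constraints + [penalty])

def total_cost(err, node):
    # grad would differentiate this sum: the cost plus all
    # constraints attached to the graph
    return err + sum(node.constraints)

x = Node(value=2.0)
y = add_constraint(x, x.value ** 2)   # attach a sum(x**2)-style penalty
print(total_cost(1.0, y))             # 1.0 + 4.0 = 5.0
```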