doc/v2_planning/layer_RP.txt @ 1229:515033d4d3bf
a first draft of layer committee
author: Razvan Pascanu <r.pascanu@gmail.com>
date:   Wed, 22 Sep 2010 19:43:24 -0400

===============
Layer committee
===============

Members: RP, XG, AB, DWF

Proposal (RP)
=============

You construct your neural network by building a graph of connections
between layers, starting from the data. As you construct the graph,
different Theano formulas are put together to form your model.

The hard details are not set yet, but all members of the committee agreed
that this sounds like a good idea.


Example Code (RP):
------------------

# Assume you have the dataset as train_x, train_y, valid_x, valid_y, test_x, test_y

h1   = sigmoid(dotW_b(train_x, n=300))
rbm1 = CDk(h1, train_x, k=5, sampler=binomial, cost=pseudolikelihood)

h2   = sigmoid(dotW_b(h1, n=300))
rbm2 = CDk(h2, h1, k=5, sampler=binomial, cost=pseudolikelihood)

out = sigmoid(dotW_b(h2, n=10))

train_err = cross_entropy(out, train_y)

grads   = grad(train_err, train_err.parameters())
learner = SGD(train_err, grads)

valid_err = train_err.replace({train_x: valid_x, train_y: valid_y})
test_err  = train_err.replace({train_x: test_x, train_y: test_y})


Global observations:
--------------------

1) Your graph can have multiple terminations; in this case rbm1, rbm2, learner,
   valid_err and test_err are all end nodes of the graph.

2) Any node is an "iterator": calling out.next() gives you the next prediction;
   calling train_err.next() gives you the next error (on the batch given by the data).
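
A minimal pure-Python sketch of that iterator behaviour (no Theano; the `Node` class and its `build_expr` callable are hypothetical stand-ins, not a settled API). The point is that a node behaves like an iterator over batches and builds its function only when first used:

```python
class Node:
    """A graph end node: iterating over it yields one result per data batch."""

    def __init__(self, build_expr, batches):
        self.build_expr = build_expr  # stands in for building/compiling the Theano expression
        self.batches = batches        # the data object feeding batches
        self._fn = None               # compiled lazily, on first use

    def __iter__(self):
        return self

    def __next__(self):
        if self._fn is None:          # lazy "compilation" on the first next() call
            self._fn = self.build_expr()
        return self._fn(next(self.batches))


# each next() call processes one batch, like out.next() giving the next prediction
batches = iter([[1.0, 2.0], [3.0, 4.0]])
out = Node(lambda: (lambda batch: [2 * v for v in batch]), batches)
print(next(out))   # [2.0, 4.0]
print(next(out))   # [6.0, 8.0]
```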

3) replace() can substitute any subgraph.
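
As a rough illustration of the idea (not the proposed API), here is what replace() could do on a toy expression graph encoded as nested tuples, swapping train_x/train_y for valid_x/valid_y wherever they occur:

```python
def replace(expr, mapping):
    """Rebuild an expression, substituting any sub-expression found in `mapping`."""
    if expr in mapping:                 # whole subgraph matched: swap it out
        return mapping[expr]
    if isinstance(expr, tuple):         # interior node: rebuild children recursively
        return tuple(replace(e, mapping) for e in expr)
    return expr                         # untouched leaf


train_err = ('cross_entropy', ('sigmoid', ('dotW_b', 'train_x')), 'train_y')
valid_err = replace(train_err, {'train_x': 'valid_x', 'train_y': 'valid_y'})
print(valid_err)  # ('cross_entropy', ('sigmoid', ('dotW_b', 'valid_x')), 'valid_y')
```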

4) You can have MACROS or SUBROUTINES that already give you the graph for known
   components (in my view CDk is such a macro, but simpler examples would be
   vanilla versions of MLP, DAA, DBN, LOGREG).

5) Any node holds a pointer to the graph (though arguably you don't use that
   graph much). Running such a node is generally done by compiling the Theano
   expression up to that node and using the data object you got initially. This
   Theano function is compiled lazily, in the sense that it is compiled when you
   try to iterate through the node. You use the graph only to:
     * update the Theano expression in case some part of the subgraph has changed
     * collect the list of parameters of the model
     * collect the list of hyper-parameters (my personal view: this would mostly
       be useful for a hyper-learner rather than on a day-to-day basis, but it is
       easy to provide and we should)
     * collect constraints on parameters (I believe these can be inserted in the
       graph; things like L1 and so on)
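
Collecting parameters (or hyper-parameters, or constraints) is then just a walk over the model graph. A minimal sketch, with nodes as plain dicts (a hypothetical layout for illustration, not a settled design):

```python
def collect(node, key, seen=None):
    """Depth-first walk of the model graph, gathering what each transform
    registered under `key` ('params', 'hyperparams', 'constraints')."""
    seen = set() if seen is None else seen
    if id(node) in seen:                 # the graph may share sub-nodes
        return []
    seen.add(id(node))
    found = list(node.get(key, ()))
    for parent in node.get('inputs', ()):
        found += collect(parent, key, seen)
    return found


data = {'name': 'data'}
h1 = {'name': 'h1', 'inputs': [data], 'params': ['W1', 'b1'], 'hyperparams': [('n', 300)]}
out_node = {'name': 'out', 'inputs': [h1], 'params': ['W2', 'b2']}
print(collect(out_node, 'params'))       # ['W2', 'b2', 'W1', 'b1']
```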

6) Registering parameters and hyper-parameters with the graph is the job of the
   transform, and therefore of the user who implemented that transform; the same
   goes for initializing the parameters (so if we have different ways to
   initialize the weight matrix, that should be a hyper-parameter with a default
   value).


Detailed Proposal (RP)
======================

I will go through a list of scenarios and possible issues:

Delayed or future values
------------------------

Sometimes you might want future values of some nodes. For example you might be
interested in:

  y(t) = x(t) - x(t-1)

You can get that by having a "delayed" version of a node. A delayed version of a
node x is obtained by calling x.t(k), which gives you a node holding the value
x(t+k); k can be positive or negative. In my view this can be done as follows:
a node is a class that points to:
  * a data object that feeds data
  * a Theano expression up to that point
  * the entire graph that describes the model (not the Theano graph!)
The only thing you need to do is change the data object to reflect the delay
(we might need to be able to pad it with 0?). You also need to create a copy of
the Theano expression (those are "new nodes"), in the sense that the starting
Theano tensors are different since they point to different data.
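
For finite sequences, the data-object side of x.t(k) amounts to shifting the stream and zero-padding positions that fall outside it. A small sketch (the `delayed` helper is hypothetical):

```python
def delayed(seq, k, pad=0.0):
    """Value of x(t+k) for each t; positions outside the sequence are padded."""
    n = len(seq)
    return [seq[t + k] if 0 <= t + k < n else pad for t in range(n)]


x = [1.0, 4.0, 9.0, 16.0]
x_tm1 = delayed(x, -1)                       # x(t-1), i.e. x.t(-1), padded with 0 at t=0
y = [a - b for a, b in zip(x, x_tm1)]        # y(t) = x(t) - x(t-1)
print(y)  # [1.0, 3.0, 5.0, 7.0]
```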

Non-Theano transformations (or functions, or whatever)
------------------------------------------------------

Maybe you want to do something in the middle of your graph that Theano does not
support. Say you have a function f which you cannot write in Theano, and you
want to compute something like:

  W1*f(W2*data + b)

I think we can support that as follows. Each node has:
  * a data object that feeds data
  * a Theano expression up to that point
  * the entire graph that describes the model

Let x1 = W2*data + b. Up to here everything is fine: we have the Theano
expression dot(W2, tensor) + b, where tensor is provided by the data object
(plus a dict of givens and whatever else you need to compile the function).

When you apply f, you create a node that looks exactly like a data object, in
the sense that it provides a new tensor and a new dict of givens. So
x2 = W1*f(W2*data + b) will actually point to the expression dot(W1, tensor)
and to the data node f(W2*data + b).

What this means is that you basically compile two Theano functions t1 and t2
and evaluate t2(f(t1(data))). So every time you have a non-Theano operation you
break the Theano expression and start a new one.
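
A sketch of that evaluation order, with plain Python callables standing in for the two compiled Theano pieces (the weights and the choice of f below are made up for illustration):

```python
import math

W2, b, W1 = 2.0, 1.0, 0.5

t1 = lambda batch: [W2 * v + b for v in batch]     # first compiled piece: W2*data + b
f  = lambda batch: [math.erf(v) for v in batch]    # the function we assume Theano cannot express
t2 = lambda batch: [W1 * v for v in batch]         # second compiled piece: W1 * (...)

def run(batch):
    # every non-Theano op splits the expression: t1 ends, f runs in Python, t2 restarts
    return t2(f(t1(batch)))

print(run([0.0]))  # [0.5 * erf(1.0)]
```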

What you lose:
  - there is no optimization between t1, t2 and f (we don't support that)
  - if you are running things on the GPU, after t1 the data will be copied to
    the CPU and then probably back to the GPU, so it doesn't make sense anymore


Recurrent Things
----------------

I think you can write a recurrent operation by first defining a graph (the
recurrent relation):

  y_tm1 = recurrent_layer(init = zeros(50))
  x_t   = slice(x, t=0)
  y     = loop(dotW_b(y_tm1, 50) + x_t, steps = 20)

This basically gives you all the information you need to add a scan op to the
Theano expression of the resulting op; it is just a different way of writing
things, which I think is more intuitive. You create your primitives, which are
either a recurrent_layer that should have an initial value, or a slice of some
other node (a time slice, that is). Then you call loop, giving it an expression
that starts from those primitives.

Similarly you can have foldl or map or anything else.
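
In pure Python the loop primitive is just an unrolled recurrence; a minimal sketch with scalar values standing in for layers (hypothetical names, not the proposed API):

```python
def loop(step, init, xs, steps):
    """Unroll y(t) = step(y(t-1), x(t)); this is the information a scan op needs."""
    y, ys = init, []
    for t in range(steps):
        y = step(y, xs[t])
        ys.append(y)
    return ys


# y(t) = 0.5*y(t-1) + x(t), with init playing the role of recurrent_layer's init
ys = loop(lambda y_tm1, x_t: 0.5 * y_tm1 + x_t, init=0.0, xs=[1.0, 1.0, 1.0], steps=3)
print(ys)  # [1.0, 1.5, 1.75]
```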

Optimizer
---------

Personally I would respect the findings of the optimization committee, and have
SGD require a node that produces some error (which can be omitted) and the
gradients. For this I would also have a grad function, which would actually
just call T.grad.

If you have a non-Theano thing in the middle, I don't have any smart solution
besides ignoring any parameter below the first non-Theano node and throwing a
warning.
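
The contract is small: SGD only needs something that produces gradients (and optionally the error). A toy numeric sketch of that interface, with a made-up quadratic objective instead of a compiled graph:

```python
def sgd(params, grads_fn, lr=0.1, steps=100):
    """Plain gradient descent against whatever produces the gradients."""
    for _ in range(steps):
        grads = grads_fn(params)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params


# minimise (p0 - 3)^2 + (p1 + 1)^2; the gradient is (2*(p0 - 3), 2*(p1 + 1))
final = sgd([0.0, 0.0], lambda p: [2 * (p[0] - 3), 2 * (p[1] + 1)])
print(final)  # close to [3.0, -1.0]
```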

Learner
-------

In my view the learner would not have both a predict() and an eval() method,
but just an eval(). If you want the predictions you should use the
corresponding node (before applying the error measure); this was, for example,
**out** in my first example.

Of course we could require learners to be special nodes that also have a
predict output. In that case I'm not sure what the iterator behaviour of the
node should produce.

Granularity
-----------

Guillaume nicely pointed out that this library might be overkill, in the sense
that you have a dotW_b transform, and then you will need a dotW_b_sparse
transform and so on; moreover, each way of initializing a parameter would
result in many more transforms.

I don't have a perfect answer yet, but my argument goes like this: you would
have transforms for the most popular options (dotW_b, for example). If you need
something else you can always decorate a function that takes Theano arguments
and produces Theano arguments. More than decorating, you can have a general
apply transform that does something like:

  apply( lambda x,y,z: x*y+z,
         inputs = x,
         hyperparams = [(name, 2)],
         params = [(name, theano.shared(..))])

The order of the arguments in the lambda is nodes, then params, then
hyper-params, or so. This would apply the Theano expression, but it would also
register the parameters. I think you can do it such that the result of the
apply is picklable, but not the apply itself; meaning that in the graph, the op
doesn't actually store the lambda expression but a mini Theano graph.

Also, names might be optional, so you can write hyperparams = [2,].
515033d4d3bf
a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff
changeset
|
210 |
515033d4d3bf
a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff
changeset
|
211 |
515033d4d3bf
a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff
changeset
|
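A minimal sketch of this registration idea, using plain Python callables in
place of theano expressions (the Node class and every field name here are
assumptions for illustration, not an existing API):

```python
class Node:
    """A graph node wrapping a function together with its inputs,
    params and hyper-params, so they can be collected later."""
    def __init__(self, fn, inputs, params=(), hyperparams=()):
        self.fn = fn
        self.inputs = list(inputs)
        self.params = dict(params)          # name -> value (e.g. shared var)
        self.hyperparams = dict(hyperparams)

    def parameters(self):
        # walk the graph and gather every registered param
        found = dict(self.params)
        for inp in self.inputs:
            if isinstance(inp, Node):
                found.update(inp.parameters())
        return found

    def value(self):
        args = [i.value() if isinstance(i, Node) else i for i in self.inputs]
        return self.fn(*args, *self.params.values(), *self.hyperparams.values())

def apply(fn, inputs, params=(), hyperparams=()):
    if not isinstance(inputs, (list, tuple)):
        inputs = [inputs]
    return Node(fn, inputs, params, hyperparams)

# order in the lambda: nodes, params, hyper-params
x = apply(lambda v: v, inputs=3)                      # a leaf holding data
h = apply(lambda v, W, c: v * W + c, inputs=x,
          params=[('W', 4)], hyperparams=[('c', 2)])
print(h.value())            # 3*4 + 2 = 14
print(h.parameters())       # {'W': 4}
```

The point is only that apply records the params alongside the expression, so a
later grad / optimizer step can collect them by walking the graph.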
What this way of doing things would hopefully buy you is that you do not
need to worry about most of your model (it would be just a few macros or
subroutines). You would do something like:

    rbm1, hidden1 = rbm_layer(data, 20)
    rbm2, hidden2 = rbm_layer(hidden1, 20)

and then the part you care about:

    hidden3 = apply( lambda x, W : T.dot(x, W),
                     inputs = hidden2,
                     params = theano.shared(scipy.sparse_CSR(..)) )

and after that you can potentially still do what you did before:

    err   = cross_entropy(hidden3, target)
    grads = grad(err, err.parameters())
    ...
I do agree that some of the "transforms" that I have been writing here
and there are pretty low level, and maybe we don't need them; we might need
only somewhat higher-level transforms. My hope is that for now people think
about the approach and not about all the inner details (like what transforms
we need, and so on) and see whether they are comfortable with it or not.

Do we want to think in these terms? I think it is a bit better to have your
script look like that than to hack into the DBN class to change that W to be
sparse.

Anyhow Guillaume, I'm working on a better answer :)
Params and hyperparams
----------------------
I think it is obvious from what I wrote above that there is a node wrapper
around the theano expression. I haven't written down all the details of that
class. I think there should be such a wrapper around parameters and
hyper-parameters as well. By default those wrappers might not provide any
information. Later on they can provide, for hyper-params for example, a
distribution. If, when inserting your hyper-param in the graph (i.e. when
you call a given transform), you provide the distribution, then maybe a
hyper-learner could use it to sample from it.

For parameters you might define properties like freeze. It can be true or
false. Whenever it is set to true, the param is not adapted by the optimizer.
Changing this value, like changing most hyper-params, implies recompilation
of the graph.
I would have a special class of hyper-params which don't require
recompilation of the graph; learning rate is an example. This info is also
given by the wrapper and by how the parameter is used.

It is up to the user and the "transform" implementer to wrap params and
hyper-params correspondingly, but I don't think this is too complicated.
The apply function above has a default behaviour; maybe you would have
a fourth type of argument for hyper-params that don't require compilation.
We could find a nice name for it.
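A sketch of such wrappers, with hypothetical Param / HyperParam classes and
field names (frozen, recompiles) chosen purely for illustration:

```python
class Param:
    """Wrapper around a parameter value; frozen params are skipped
    by the optimizer (all field names are assumptions)."""
    def __init__(self, value, name=None):
        self.value = value
        self.name = name
        self.frozen = False

class HyperParam:
    """Wrapper around a hyper-parameter.  recompiles=False marks the
    special class (e.g. learning rate) whose change does not force a
    rebuild of the compiled graph; distribution is optional info a
    hyper-learner could sample from."""
    def __init__(self, value, name=None, recompiles=True, distribution=None):
        self.value = value
        self.name = name
        self.recompiles = recompiles
        self.distribution = distribution

def adapt(params, grads, lr):
    # the optimizer only touches params that are not frozen
    for p, g in zip(params, grads):
        if not p.frozen:
            p.value -= lr.value * g

W = Param(1.0, 'W'); b = Param(0.5, 'b')
b.frozen = True                          # freezing b: optimizer skips it
lr = HyperParam(0.1, 'lr', recompiles=False)
adapt([W, b], [2.0, 2.0], lr)
print(W.value, b.value)                  # 0.8 0.5  (b untouched)
```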
How does this work?
-------------------
You always have a pointer to the entire graph. Whenever a hyper-param
changes (or a param freezes), all regions of the graph affected get
recompiled. This is done by traversing the graph from the bottom node and
constructing the theano expression.

The function that updates / re-constructs the graph is slightly more complex
if you have non-theano functions in the graph ..
replace
-------
Replace replaces a part of the graph. The way it works, in my view, is that
if I write:

    x = x1 + x2 + x3
    y = x.replace({x2: x5})

you would first copy the graph that is represented by x (the params or
hyper-params are not copied) and then replace the subgraphs. I.e., x will
still point to x1+x2+x3, while y will point to x1+x5+x3. Replace is not done
in place.

I think of these Node classes as something lightweight, like theano variables.
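The copy-then-substitute behaviour can be sketched like this, with a toy Node
class (all names are assumptions) where leaves play the role of params and are
shared rather than copied:

```python
class Node:
    """Minimal expression node; replace copies the interior of the graph
    (leaves are shared, as params/hyper-params would be) instead of
    mutating it."""
    def __init__(self, op=None, children=(), name=None):
        self.op = op              # e.g. 'add', or None for a leaf
        self.children = list(children)
        self.name = name

    def replace(self, mapping):
        if self in mapping:
            return mapping[self]
        if self.op is None:
            return self           # leaves (params) are not copied
        new_children = [c.replace(mapping) for c in self.children]
        return Node(self.op, new_children, self.name)

    def show(self):
        if self.op is None:
            return self.name
        return '(' + '+'.join(c.show() for c in self.children) + ')'

x1, x2, x3, x5 = (Node(name=n) for n in ('x1', 'x2', 'x3', 'x5'))
x = Node('add', [x1, x2, x3])
y = x.replace({x2: x5})
print(x.show())   # (x1+x2+x3)  -- x is unchanged; replace is not in place
print(y.show())   # (x1+x5+x3)
```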
reconstruct
-----------
This is something nice for DAA. It is definitely not useful for the rest.
I think, though, that it is a shame to have that transformation graph and not
be able to use it to do this. It will make life so much easier when you
do deep auto-encoders. I wouldn't put it in the core library, but I would
have it in the DAA module. The way I see it, you can either have something
like:

    # generate your invertible transforms on the fly
    fn  = create_transform(lambda : , params, hyper-params )
    inv = create_transform(lambda : , params, hyper-params )
    my_transform = couple_transforms( forward = fn, inv = inv)

    # or have some already widely used such transforms in the daa submodule.
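A sketch of the coupling, with create_transform reduced to a stub and a
trivially invertible pair (scaling by a weight) standing in for a real
encoder/decoder; every name beyond those in the snippet above is hypothetical:

```python
def create_transform(fn):
    """Stand-in for create_transform: here it just returns fn
    (params and hyper-params omitted for brevity)."""
    return fn

def couple_transforms(forward, inv):
    """Pair a transform with its inverse, so that e.g. a DAA can reuse
    the encoder graph to build the decoder."""
    class Coupled:
        def __call__(self, x):
            return forward(x)
        def inverse(self, x):
            return inv(x)
    return Coupled()

# a hypothetical invertible pair: scale by w / unscale by w
w = 2.0
my_transform = couple_transforms(forward=create_transform(lambda x: x * w),
                                 inv=create_transform(lambda x: x / w))
h = my_transform(3.0)
print(h, my_transform.inverse(h))   # 6.0 3.0
```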
transforms
----------
In my view there will be quite a few such standard transforms. They
can be grouped by architecture, basic, sampler, optimizer and so on.

We do not need to provide all of them, just the ones we need. Research
on an architecture would actually lead to creating new such transforms in
the library.

There will definitely be a list of basic such transforms in the beginning,
like:

    replace,
    search,
    get_param(name)
    get_params(..)

You can, and should, have something like a switch (that, based on a
hyper-parameter, replaces a part of the graph with another or not). This is
done by re-compiling the graph.
Constraints
-----------
Nodes can also keep track of constraints.

When you write

    y = add_constraint(x, sum(x**2))

y is the same node as x, just that it also links to this second graph that
computes the constraint. Whenever you call grad, grad will also add to the
cost all constraints attached to the graph.
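A sketch of these semantics, with the node as a plain dict, the constraint
list as an assumed field, and a finite-difference grad standing in for
theano's symbolic one:

```python
def add_constraint(node, constraint_fn):
    """Attach a penalty to a node; returns the *same* node object, only
    its constraint list grows (the field name is an assumption)."""
    node.setdefault('constraints', []).append(constraint_fn)
    return node

def grad(node, x, eps=1e-6):
    """Gradient (by central finite differences, for the sketch) of the
    node's cost plus every attached constraint."""
    def total(v):
        return node['cost'](v) + sum(c(v) for c in node.get('constraints', []))
    return (total(x + eps) - total(x - eps)) / (2 * eps)

# cost(x) = x with constraint x**2 attached, so d/dx = 1 + 2x
x_node = {'cost': lambda v: v}
y = add_constraint(x_node, lambda v: v ** 2)
assert y is x_node                    # same node, extra linked graph
print(round(grad(y, 3.0), 4))         # 7.0  (1 + 2*3)
```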