Mercurial > pylearn
annotate doc/v2_planning/layer_RP.txt @ 1231:5ef96142492b ("some typos")
author: Razvan Pascanu <r.pascanu@gmail.com>
date:   Wed, 22 Sep 2010 20:17:35 -0400
===============
Layer committee
===============

Members : RP, XG, AB, DWF

Proposal (RP)
=============

You construct your neural network by constructing a graph of connections
between layers, starting from the data. While you construct the graph,
different theano formulas are put together to construct your model.

Hard details are not set yet, but all members of the committee agreed
that this sounds like a good idea.


Example Code (RP):
------------------

# Assume you have the dataset as train_x, train_y, valid_x, valid_y, test_x, test_y

h1 = sigmoid(dotW_b(train_x, n = 300))
rbm1 = CDk( h1, train_x, k=5, sampler = binomial, cost = pseudolikelihood)

h2 = sigmoid(dotW_b(h1, n = 300))
rbm2 = CDk( h2, h1, k=5, sampler = binomial, cost = pseudolikelihood)

out = sigmoid( dotW_b(h2, n = 10))

train_err = cross_entropy( out, train_y)

grads = grad( train_err, train_err.parameters() )
learner = SGD( train_err, grads)

valid_err = train_err.replace({ train_x : valid_x, train_y : valid_y})
test_err = train_err.replace({ train_x : test_x , train_y : test_y})


Global observations :
---------------------

1) Your graph can have multiple terminal nodes; in this case rbm1,
rbm2, learner, valid_err and test_err are all end nodes of the graph;

2) Any node is an "iterator"; when you call out.next() you get the
next prediction, and when you call err.next() you get the next error
( on the batch given by data.next() ).
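A minimal sketch of observation 2 in plain Python (no Theano; all names here, Node, sigmoid_node, apply, are hypothetical): each node keeps the shared data iterator plus the expression accumulated along the graph, so calling next() on any node evaluates that node's expression on the next batch.

```python
import math

class Node:
    def __init__(self, data_iter, fn=lambda x: x):
        self.data_iter = data_iter   # the data object that feeds batches
        self.fn = fn                 # the expression accumulated so far

    def apply(self, g):
        # build a new node that computes g on top of this node's expression
        return Node(self.data_iter, lambda x, f=self.fn: g(f(x)))

    def __next__(self):
        # evaluate the accumulated expression on the next batch of data
        return self.fn(next(self.data_iter))

def sigmoid_node(node):
    return node.apply(lambda v: 1.0 / (1.0 + math.exp(-v)))

data = Node(iter([0.0, 1.0, 2.0]))
out = sigmoid_node(data)
first = next(out)        # prediction on the first batch
assert first == 0.5      # sigmoid(0.0)
```

Note that all nodes share one data iterator, which mirrors the text: err.next() is the error on the batch produced by data.next().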

3) Replace can replace any subgraph.

4) You can have MACROS or SUBROUTINES that already give you the graph for
known components ( in my view the CDk is such a macro, but simpler
examples would be vanilla versions of MLP, DAA, DBN, LOGREG).

5) Any node has the entire graph ( though arguably you don't use that
graph too much). Running such a node will in general be done by compiling
the Theano expression up to that node ( if you don't already have this
function), and using the data object that you get initially. This theano
function is compiled only if you need it. You use the graph only to :
    * update the Theano expression in case some part of the subgraph has
      changed (hyper-parameter or a replace call)
    * collect the list of parameters of the model
    * collect the list of hyper-parameters ( my personal view - this
      would mostly be useful for a hyper learner .. and not for day to
      day stuff, but I think it is something easy to provide and we should )
    * collect constraints on parameters ( I believe they can be represented
      in the graph as dependency links to other graphs that compute the
      constraints..)

6) Registering parameters and hyper-parameters to the graph is the job of
the transform, and therefore of the user who implemented that
transform; the same goes for initializing the parameters ( so if we have
different ways to initialize the weight matrix, that might be a
hyper-parameter with a default value).


Detailed Proposal (RP)
======================

I would go through a list of scenarios and possible issues :

Delayed or future values
------------------------

Sometimes you might want future values of some nodes. For example you might
be interested in :

y(t) = x(t) - x(t-1)

You can get that by having a "delayed" version of a node. A delayed version
of a node x is obtained by calling x.t(k), which will give you a node that has
the value x(t+k). k can be positive or negative.
In my view this can be done as follows :
    - a node is a class that points to :
        * a data object that feeds data
        * a theano expression up to that point
        * the entire graph that describes the model ( not the Theano graph !!!)
The only thing you need to do is to change the data object to reflect the
delay ( we might need to be able to pad it with 0?). You also need to create
a copy of the theano expression ( those are "new nodes" ) in the sense that
the starting theano tensors are different, since they point to different data.
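A sketch of the delayed node, with plain Python lists standing in for the data object (SeqNode and its methods are hypothetical names): x.t(k) is a view of the same sequence shifted by k steps and zero-padded at the boundary, so y(t) = x(t) - x(t-1) is just x - x.t(-1).

```python
class SeqNode:
    def __init__(self, seq):
        self.seq = list(seq)

    def t(self, k):
        # shifted view: value at position i is x(i + k), zero-padded
        n = len(self.seq)
        shifted = [self.seq[i + k] if 0 <= i + k < n else 0
                   for i in range(n)]
        return SeqNode(shifted)

    def __sub__(self, other):
        return SeqNode(a - b for a, b in zip(self.seq, other.seq))

x = SeqNode([1, 4, 9, 16])
y = x - x.t(-1)          # y(t) = x(t) - x(t-1), with x(-1) padded as 0
assert y.seq == [1, 3, 5, 7]
```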



Non-theano transformation ( or function or whatever)
----------------------------------------------------

Maybe you want to do something in the middle of your graph that is not
supported by Theano. Let's say you have a function f which you cannot write
in Theano. You want to do something like

W1*f( W2*data + b)

I think we can support that by doing the following :
each node has :
    * a data object that feeds data
    * a theano expression up to that point
    * the entire graph that describes the model

Let x1 = W2*data + b
up to here everything is fine ( we have a theano expression )
    dot(W2, tensor) + b,
where tensor is provided by the data object ( plus a dict of givens
and whatever else you need to compile the function)

When you apply f, what you do is create a node that is exactly like the
data object, in the sense that it provides a new tensor and a new dict of
givens

so x2 = W1*f( W2*data+b)
will actually point to the expression
    dot(W1, tensor)
and to the data node f(W2*data+b)

What this means is that you basically compile two theano functions t1 and t2
and evaluate t2(f(t1(data))). So every time you have a non-theano operation you
break the theano expression and start a new one.
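The breaking scheme can be shown with plain Python callables standing in for the two compiled Theano functions (the concrete W1, W2, b values and the choice of f here are made up for illustration): t1 covers the expression up to the break, f runs outside, and t2 starts a fresh expression.

```python
import math

W2, b = 3.0, 1.0
W1 = 2.0

t1 = lambda data: W2 * data + b     # "compiled" expression before the break
f = lambda v: math.floor(v)         # stand-in for a non-theano function
t2 = lambda v: W1 * v               # fresh "compiled" expression after f

def run(data):
    # every non-theano op breaks the chain: t1, then f, then t2
    return t2(f(t1(data)))

assert run(1.5) == 10.0             # 2 * floor(3 * 1.5 + 1)
```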

What you lose :
    - there is no optimization or anything between t1, t2 and f ( we don't
      support that)
    - if you are running things on GPU, after t1 the data will be copied to
      the CPU and then probably back to the GPU - so it doesn't make sense
      anymore


Recurrent Things
----------------

I think that you can write a recurrent operation by first defining a
graph ( the recurrent relation ):

y_tm1 = recurrent_layer(init = zeros(50))
x_t = slice(x, t=0)
y = loop( dotW_b(y_tm1,50) + x_t, steps = 20)

This would basically give all the information you need to add a scan op
to the theano expression of the result op; it is just a different way
of writing things .. which I think is more intuitive.

You create your primitives, which are either a recurrent_layer that should
have an initial value, or a slice of some other node ( a time slice, that is).
Then you call loop, giving an expression that starts from those primitives.

Similarly you can have foldl or map or anything else.

You would use this instead of writing scan, especially if the formula is
more complicated and you want to automatically collect parameters,
hyper-parameters and so on.
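The loop primitive above can be sketched in plain Python (recurrent_layer, loop and the step function are all hypothetical; a real version would emit a scan op instead of unrolling): loop threads the state through the step function, which is exactly the information scan needs.

```python
def recurrent_layer(init):
    # holds the initial value of the recurrent state
    return init

def loop(step, init, xs):
    # unroll: y_t = step(y_tm1, x_t) for each time slice x_t
    y = init
    for x_t in xs:
        y = step(y, x_t)
    return y

y_tm1 = recurrent_layer(init=0.0)
# the step stands in for dotW_b(y_tm1, 50) + x_t; here just 0.5*y + x
y = loop(lambda y, x: 0.5 * y + x, y_tm1, [1.0, 2.0, 3.0])
assert y == 4.25
```

foldl is the same shape with an explicit accumulator, and map is loop with a state that is ignored.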

Optimizer
---------

Personally I would respect the findings of the optimization committee,
and have SGD require a Node that produces some error ( which can
be omitted) and the gradients. For this I would also have the grad
function, which would actually only call T.grad.

What if you have a non-theano thing in the middle? I don't have any smart
solution besides ignoring any parameter that is below the first
non-theano node and throwing a warning.

Learner
-------

In my case I would not have a predict() and an eval() method on the learner,
but just an eval(). If you want the predictions you should use the
corresponding node ( before applying the error measure ). This was
for example **out** in my first example.

Of course we could require learners to be special nodes that also have
a predict output. In that case I'm not sure what the iterating behaviour
of the node should produce.

Granularity
-----------

Guillaume nicely pointed out that this library might be overkill.
In the sense that you have a dotW_b transform, and then you will need
a dotW_b_sparse transform and so on. Plus, every way of initializing each
param would result in many more transforms.

I don't have a perfect answer yet, but my argument goes like this :

you would have transforms for the most popular options ( dotW_b for example).
If you need something else you can always decorate a function that takes
theano arguments and produces theano arguments. More than decorating, you
can have a general apply transform that does something like :

apply( lambda x,y,z: x*y+z, inputs = x,
       hyperparams = [(name, 2)],
       params = [(name, theano.shared(..))])

The order of the arguments in the lambda is nodes, params, hyper-params or so.
This would apply the theano expression, but it will also register the
parameters. It is like creating a transform on the fly.

I think you can arrange it such that the result of the apply is
picklable, but not the apply operation itself. Meaning that in the graph,
the op doesn't actually store the lambda expression but a mini theano graph.

Also names might be optional, so you can write hyperparams = [2,]
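A plain-Python sketch of this on-the-fly transform (Applied and its methods are hypothetical names; no Theano): it wraps a lambda over (inputs, params, hyper-params) and registers the parameters so the graph can later collect them.

```python
class Applied:
    def __init__(self, fn, inputs, params=(), hyperparams=()):
        self.fn = fn
        self.inputs = inputs
        self._params = list(params)            # registered for collection
        self._hyperparams = list(hyperparams)

    def parameters(self):
        # what grad(err, err.parameters()) would walk over
        return self._params

    def value(self):
        # argument order: inputs, then params, then hyper-params
        return self.fn(self.inputs,
                       *[v for _, v in self._params],
                       *[v for _, v in self._hyperparams])

node = Applied(lambda x, y, z: x * y + z,
               inputs=4,
               params=[("W", 2)],
               hyperparams=[("offset", 3)])
assert node.value() == 11                  # 4 * 2 + 3
assert node.parameters() == [("W", 2)]
```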


What this way of doing things would hopefully buy you is that you do not
need to worry about most of your model ( it would be just a few macros or
subroutines).
You would do something like :

rbm1, hidden1 = rbm_layer(data, 20)
rbm2, hidden2 = rbm_layer(hidden1, 20)

and then the part you care about :

hidden3 = apply( lambda x,W: T.dot(x,W), inputs = hidden2, params =
                 theano.shared(scipy.sparse_CSR(..)))

and after that you potentially still do what you did before :

err = cross_entropy(hidden3, target)
grads = grad(err, err.parameters())
...
515033d4d3bf
a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff
changeset
|
246 |
515033d4d3bf
a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff
changeset
|
I do agree that some of the "transforms" that I have been writing here
and there are pretty low level, and maybe we don't need them. We might need
only somewhat higher-level transforms. My hope is that for now people think
about the approach and not about all the inner details (like what transforms
we need and so on) and see if they are comfortable with it or not.

Do we want to think in these terms? I think it is a bit better to have
a normal Python class, hack it to change something, and then either add
a parameter to __init__ or create a new version. That seems a bit more
natural.

Anyhow, Guillaume, I'm working on a better answer :)


Params and hyperparams
----------------------

I think it is obvious from what I wrote above that there is a node wrapper
around the theano expression. I haven't written down all the details of that
class. I think there should be such a wrapper around parameters and
hyper-parameters as well. By default those wrappers might not provide any
information. Later on they can provide, for hyper-params for example, a
distribution. If, when inserting your hyper-param in the graph (i.e. when
you call a given transform), you provide the distribution, then maybe a
hyper-learner could use it to sample from it.

For parameters you might define properties like freeze. It can be true or
false. Whenever it is set to true, the param is not adapted by the optimizer.
Changing this value, like changing most hyper-params, implies recompilation
of the graph.

I would have a special class of hyper-params which don't require
recompilation of the graph. Learning rate is an example. This info is also
given by the wrapper and by how the parameter is used.

It is up to the user and the "transform" implementer to wrap params and
hyper-params accordingly, but I don't think this is too complicated.
The apply function above has a default behaviour; maybe you would have a
fourth type of argument, a hyper-param that doesn't require recompilation.
We could find a nice name for it.

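
As a concrete illustration of these wrapper ideas, here is a minimal Python
sketch (all names are hypothetical, nothing here is existing library code, and
there is no theano involved): a Param carries a `frozen` flag the optimizer
checks, and a HyperParam carries a `requires_recompile` flag plus an optional
distribution that a hyper-learner could sample from:

```python
# Hypothetical sketch of the param / hyper-param wrapper idea.

class Param:
    """Wraps a parameter value; when `frozen` is True the optimizer
    skips it (flipping this flag would trigger graph recompilation)."""
    def __init__(self, value, frozen=False):
        self.value = value
        self.frozen = frozen

class HyperParam:
    """Wraps a hyper-parameter; requires_recompile=False marks the
    special class (e.g. learning rate) that can change without
    rebuilding the graph."""
    def __init__(self, value, requires_recompile=True, distribution=None):
        self.value = value
        self.requires_recompile = requires_recompile
        self.distribution = distribution  # optional, for a hyper-learner

def adaptable(params):
    """Return only the parameters the optimizer should update."""
    return [p for p in params if not p.frozen]

W  = Param(0.5)
b  = Param(0.0, frozen=True)          # left out of optimization
lr = HyperParam(0.1, requires_recompile=False)
```
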

How does this work?
-------------------

You always have a pointer to the entire graph. Whenever a hyper-param
changes (or a param freezes), all regions of the graph affected get
recompiled. This is done by traversing the graph from the bottom node and
constructing the theano expression.

This function that updates / re-constructs the graph is slightly more
complex if you have non-theano functions in the graph.

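
To make the recompilation idea concrete, here is a toy sketch (hypothetical
names, plain Python functions standing in for compiled theano expressions):
hyper-param values are baked in while the graph is traversed bottom-up, which
is exactly why changing one requires reconstructing the compiled function:

```python
# Toy model of bottom-up graph reconstruction (not real theano).

class Node:
    """build_op is called at (re)compile time, so the hyper-param
    values current at that moment get baked into the compiled op."""
    def __init__(self, build_op, inputs=()):
        self.build_op = build_op
        self.inputs = list(inputs)

def recompile(node, hyper):
    """Traverse from the bottom node up, constructing the expression."""
    kids = [recompile(c, hyper) for c in node.inputs]
    op = node.build_op(hyper)      # hyper-params are frozen in here
    if not kids:
        return op                  # leaf: op maps the raw input through
    return lambda x: op([k(x) for k in kids])

# default-arg trick bakes the hyper-param value in at compile time
data   = Node(lambda h: (lambda x: x))                         # identity leaf
scaled = Node(lambda h: (lambda vs, s=h["scale"]: vs[0] * s), [data])

hyper = {"scale": 2.0}
f = recompile(scaled, hyper)       # compiled with scale = 2.0
hyper["scale"] = 10.0              # hyper-param change ...
g = recompile(scaled, hyper)       # ... requires reconstructing the graph
```

Note that `f` keeps computing with the old value; only the re-traversal
picks up the change, mirroring the recompilation described above.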

replace
-------

Replace replaces a part of the graph. The way it works, in my view, is that
if I write:

x = x1 + x2 + x3
y = x.replace({x2: x5})

you would first copy the graph that is represented by x (the params or
hyper-params are not copied) and then replace the subgraphs. I.e., x will
still point to x1+x2+x3, while y will point to x1+x5+x3. Replace is not
done in place.

I think of these Node classes as something light-weight, like theano
variables.

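
The replace semantics above can be sketched with a toy Node class (again
hypothetical, not the real wrapper): the structure is copied, the leaves
(the param-carrying nodes) are shared rather than copied, and the original
graph is left untouched:

```python
# Toy, non-destructive replace on a tiny expression graph.

class Node:
    def __init__(self, op=None, inputs=(), value=None):
        self.op = op               # e.g. 'add'; None for a leaf
        self.inputs = list(inputs)
        self.value = value         # leaf payload (param-like, shared)

    def replace(self, mapping):
        """Return a copy of this graph with subgraphs swapped.
        Leaves are shared, never copied; self is not modified."""
        if self in mapping:
            return mapping[self]
        if not self.inputs:
            return self            # leaf: share, don't copy
        return Node(self.op, [c.replace(mapping) for c in self.inputs])

    def eval(self):
        if not self.inputs:
            return self.value
        return sum(c.eval() for c in self.inputs)  # only 'add' here

x1, x2, x3, x5 = (Node(value=v) for v in (1, 2, 3, 50))
x = Node('add', [x1, x2, x3])
y = x.replace({x2: x5})
# x.eval() is still 1+2+3; y.eval() is 1+50+3; y shares x1 and x3 with x
```
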

reconstruct
-----------

This is something nice for DAA. It is definitely not useful for the rest.
I think, though, that it is a shame to have that transformation graph and
not be able to use it for this. It would make life so much easier when you
do deep auto-encoders. I wouldn't put it in the core library, but I would
have it in the DAA module. The way I see it, you can either have something
like:

# generate your invertible transforms on the fly
fn  = create_transform(lambda : , params, hyper_params)
inv = create_transform(lambda : , params, hyper_params)
my_transform = couple_transforms(forward=fn, inv=inv)

# or have some already widely used such transforms in the daa submodule.

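
One possible reading of couple_transforms (the implementation below is
entirely hypothetical; only the names follow the snippet above) is a small
object that carries both directions, so a DAA-style module can run the
encoding and then reconstruct its input:

```python
# Hypothetical sketch of coupling a transform with its inverse.

class Transform:
    def __init__(self, fn, params=None, hyper_params=None):
        self.fn = fn
        self.params = params or {}
        self.hyper_params = hyper_params or {}
    def __call__(self, x):
        return self.fn(x, self.params)

def create_transform(fn, params=None, hyper_params=None):
    return Transform(fn, params, hyper_params)

def couple_transforms(forward, inv):
    """Bundle a transform with its inverse so reconstruction is
    available alongside the forward pass."""
    class Coupled:
        def __call__(self, x):
            return forward(x)
        def reconstruct(self, h):
            return inv(h)
    return Coupled()

params = {"w": 3.0}                 # shared by both directions
fn  = create_transform(lambda x, p: x * p["w"], params)
inv = create_transform(lambda h, p: h / p["w"], params)
my_transform = couple_transforms(forward=fn, inv=inv)
h = my_transform(2.0)               # encode; reconstruct(h) recovers 2.0
```
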

transforms
----------

In my view there will be quite a few such standard transforms. They can be
grouped by architecture, basic, sampler, optimizer and so on.

We do not need to provide all of them, just the ones we need. Research on
an architecture would actually lead to creating new such transforms in the
library.

There will definitely be a list of basic such transforms in the beginning,
like:

replace,
search,
get_param(name),
get_params(..)

You can, and should, have something like a switch (that, based on a
hyper-parameter, replaces a part of a graph with another or not). This is
done by re-compiling the graph.

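
Such a switch can be sketched as branch selection at (re)compile time rather
than at run time (toy code, hypothetical names):

```python
# Hypothetical hyper-parameter-driven switch: the branch is chosen
# when the graph is built, so flipping the hyper-param later means
# compiling the model again.

def switch(hyper_flag, branch_a, branch_b):
    """Pick a subexpression at build time, not at run time."""
    return branch_a if hyper_flag else branch_b

def compile_model(hyper):
    act = switch(hyper["use_square"], lambda v: v * v, lambda v: v)
    return lambda x: act(x + 1)

f_sq = compile_model({"use_square": True})    # (x+1)**2
f_id = compile_model({"use_square": False})   # flag changed -> recompiled
```
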

Constraints
-----------

Nodes can also keep track of constraints.

When you write

y = add_constraint(x, sum(x**2))

y is the same node as x, except that it also links to this second graph
that computes the constraint. Whenever you call grad, grad will also add
to the cost all the constraints attached to the graph.

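
A minimal sketch of this behaviour (hypothetical API, no symbolic math):
add_constraint hands back the very same node with an extra penalty graph
attached, and the cost that would be differentiated includes every attached
constraint:

```python
# Hypothetical sketch: constraints attached to a node fold into the cost.

class Node:
    def __init__(self, cost_fn):
        self.cost_fn = cost_fn        # maps a param value to a cost
        self.constraints = []         # extra penalty "graphs"

def add_constraint(x, penalty_fn):
    """Same node back, with one more constraint linked to it."""
    x.constraints.append(penalty_fn)
    return x

def total_cost(node, w):
    """What grad would differentiate: cost plus all constraints."""
    return node.cost_fn(w) + sum(c(w) for c in node.constraints)

x = Node(lambda w: (w - 1.0) ** 2)
y = add_constraint(x, lambda w: w ** 2)   # e.g. a sum(x**2) penalty
# y is x itself; total_cost(y, w) now includes the penalty term
```
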