doc/v2_planning/layer_RP.txt @ 1229:515033d4d3bf
a first draft of layer committee
author: Razvan Pascanu <r.pascanu@gmail.com>
date:   Wed, 22 Sep 2010 19:43:24 -0400

===============
Layer committee
===============

Members: RP, XG, AB, DWF

Proposal (RP)
=============

You construct your neural network by building a graph of connections
between layers, starting from the data. As you construct the graph,
different Theano formulas are put together to form your model.

The hard details are not set yet, but all members of the committee agreed
that this sounds like a good idea.


Example Code (RP):
------------------

# Assume you have the dataset as train_x, train_y, valid_x, valid_y, test_x, test_y

h1   = sigmoid(dotW_b(train_x, n=300))
rbm1 = CDk(h1, train_x, k=5, sampler=binomial, cost=pseudolikelihood)

h2   = sigmoid(dotW_b(h1, n=300))
rbm2 = CDk(h2, h1, k=5, sampler=binomial, cost=pseudolikelihood)

out = sigmoid(dotW_b(h2, n=10))

train_err = cross_entropy(out, train_y)

grads   = grad(train_err, train_err.parameters())
learner = SGD(train_err, grads)

valid_err = train_err.replace({train_x: valid_x, train_y: valid_y})
test_err  = train_err.replace({train_x: test_x, train_y: test_y})


Global observations:
--------------------

1) Your graph can have multiple terminations; in this case rbm1, rbm2, learner,
   valid_err and test_err are all end nodes of the graph.

2) Any node is an "iterator": calling out.next() gives you the next prediction;
   calling train_err.next() gives you the next error (on the batch given by the data).
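
A minimal pure-Python sketch of that iterator behaviour (no Theano; the `Node` class and its `build_expr` callable are hypothetical stand-ins, not a settled API). The point is that a node behaves like an iterator over batches and builds its function only when first used:

```python
class Node:
    """A graph end node: iterating over it yields one result per data batch."""

    def __init__(self, build_expr, batches):
        self.build_expr = build_expr  # stands in for building/compiling the Theano expression
        self.batches = batches        # the data object feeding batches
        self._fn = None               # compiled lazily, on first use

    def __iter__(self):
        return self

    def __next__(self):
        if self._fn is None:          # lazy "compilation" on the first next() call
            self._fn = self.build_expr()
        return self._fn(next(self.batches))


# each next() call processes one batch, like out.next() giving the next prediction
batches = iter([[1.0, 2.0], [3.0, 4.0]])
out = Node(lambda: (lambda batch: [2 * v for v in batch]), batches)
print(next(out))   # [2.0, 4.0]
print(next(out))   # [6.0, 8.0]
```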

3) replace() can substitute any subgraph.
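
As a rough illustration of the idea (not the proposed API), here is what replace() could do on a toy expression graph encoded as nested tuples, swapping train_x/train_y for valid_x/valid_y wherever they occur:

```python
def replace(expr, mapping):
    """Rebuild an expression, substituting any sub-expression found in `mapping`."""
    if expr in mapping:                 # whole subgraph matched: swap it out
        return mapping[expr]
    if isinstance(expr, tuple):         # interior node: rebuild children recursively
        return tuple(replace(e, mapping) for e in expr)
    return expr                         # untouched leaf


train_err = ('cross_entropy', ('sigmoid', ('dotW_b', 'train_x')), 'train_y')
valid_err = replace(train_err, {'train_x': 'valid_x', 'train_y': 'valid_y'})
print(valid_err)  # ('cross_entropy', ('sigmoid', ('dotW_b', 'valid_x')), 'valid_y')
```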

4) You can have MACROS or SUBROUTINES that already give you the graph for known
   components (in my view CDk is such a macro, but simpler examples would be
   vanilla versions of MLP, DAA, DBN, LOGREG).

5) Any node holds a pointer to the graph (though arguably you don't use that
   graph much). Running such a node is generally done by compiling the Theano
   expression up to that node and using the data object you got initially. This
   Theano function is compiled lazily, in the sense that it is compiled when you
   try to iterate through the node. You use the graph only to:
     * update the Theano expression in case some part of the subgraph has changed
     * collect the list of parameters of the model
     * collect the list of hyper-parameters (my personal view: this would mostly
       be useful for a hyper-learner rather than on a day-to-day basis, but it is
       easy to provide and we should)
     * collect constraints on parameters (I believe these can be inserted in the
       graph; things like L1 and so on)
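
Collecting parameters (or hyper-parameters, or constraints) is then just a walk over the model graph. A minimal sketch, with nodes as plain dicts (a hypothetical layout for illustration, not a settled design):

```python
def collect(node, key, seen=None):
    """Depth-first walk of the model graph, gathering what each transform
    registered under `key` ('params', 'hyperparams', 'constraints')."""
    seen = set() if seen is None else seen
    if id(node) in seen:                 # the graph may share sub-nodes
        return []
    seen.add(id(node))
    found = list(node.get(key, ()))
    for parent in node.get('inputs', ()):
        found += collect(parent, key, seen)
    return found


data = {'name': 'data'}
h1 = {'name': 'h1', 'inputs': [data], 'params': ['W1', 'b1'], 'hyperparams': [('n', 300)]}
out_node = {'name': 'out', 'inputs': [h1], 'params': ['W2', 'b2']}
print(collect(out_node, 'params'))       # ['W2', 'b2', 'W1', 'b1']
```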

6) Registering parameters and hyper-parameters with the graph is the job of the
   transform, and therefore of the user who implemented that transform; the same
   goes for initializing the parameters (so if we have different ways to
   initialize the weight matrix, that should be a hyper-parameter with a default
   value).


Detailed Proposal (RP)
======================

I will go through a list of scenarios and possible issues:

Delayed or future values
------------------------

Sometimes you might want future values of some nodes. For example you might be
interested in:

  y(t) = x(t) - x(t-1)

You can get that by having a "delayed" version of a node. A delayed version of a
node x is obtained by calling x.t(k), which gives you a node holding the value
x(t+k); k can be positive or negative. In my view this can be done as follows:
a node is a class that points to:
  * a data object that feeds data
  * a Theano expression up to that point
  * the entire graph that describes the model (not the Theano graph!)
The only thing you need to do is change the data object to reflect the delay
(we might need to be able to pad it with 0?). You also need to create a copy of
the Theano expression (those are "new nodes"), in the sense that the starting
Theano tensors are different since they point to different data.
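
For finite sequences, the data-object side of x.t(k) amounts to shifting the stream and zero-padding positions that fall outside it. A small sketch (the `delayed` helper is hypothetical):

```python
def delayed(seq, k, pad=0.0):
    """Value of x(t+k) for each t; positions outside the sequence are padded."""
    n = len(seq)
    return [seq[t + k] if 0 <= t + k < n else pad for t in range(n)]


x = [1.0, 4.0, 9.0, 16.0]
x_tm1 = delayed(x, -1)                       # x(t-1), i.e. x.t(-1), padded with 0 at t=0
y = [a - b for a, b in zip(x, x_tm1)]        # y(t) = x(t) - x(t-1)
print(y)  # [1.0, 3.0, 5.0, 7.0]
```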

Non-Theano transformations (or functions, or whatever)
------------------------------------------------------

Maybe you want to do something in the middle of your graph that Theano does not
support. Say you have a function f which you cannot write in Theano, and you
want to compute something like:

  W1*f(W2*data + b)

I think we can support that as follows. Each node has:
  * a data object that feeds data
  * a Theano expression up to that point
  * the entire graph that describes the model

Let x1 = W2*data + b. Up to here everything is fine: we have the Theano
expression dot(W2, tensor) + b, where tensor is provided by the data object
(plus a dict of givens and whatever else you need to compile the function).

When you apply f, you create a node that looks exactly like a data object, in
the sense that it provides a new tensor and a new dict of givens. So
x2 = W1*f(W2*data + b) will actually point to the expression dot(W1, tensor)
and to the data node f(W2*data + b).

What this means is that you basically compile two Theano functions t1 and t2
and evaluate t2(f(t1(data))). So every time you have a non-Theano operation you
break the Theano expression and start a new one.
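
A sketch of that evaluation order, with plain Python callables standing in for the two compiled Theano pieces (the weights and the choice of f below are made up for illustration):

```python
import math

W2, b, W1 = 2.0, 1.0, 0.5

t1 = lambda batch: [W2 * v + b for v in batch]     # first compiled piece: W2*data + b
f  = lambda batch: [math.erf(v) for v in batch]    # the function we assume Theano cannot express
t2 = lambda batch: [W1 * v for v in batch]         # second compiled piece: W1 * (...)

def run(batch):
    # every non-Theano op splits the expression: t1 ends, f runs in Python, t2 restarts
    return t2(f(t1(batch)))

print(run([0.0]))  # [0.5 * erf(1.0)]
```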

What you lose:
  - there is no optimization between t1, t2 and f (we don't support that)
  - if you are running things on the GPU, after t1 the data will be copied to
    the CPU and then probably back to the GPU, so it doesn't make sense anymore


Recurrent Things
----------------

I think you can write a recurrent operation by first defining a graph (the
recurrent relation):

  y_tm1 = recurrent_layer(init = zeros(50))
  x_t   = slice(x, t=0)
  y     = loop(dotW_b(y_tm1, 50) + x_t, steps = 20)

This basically gives you all the information you need to add a scan op to the
Theano expression of the resulting op; it is just a different way of writing
things, which I think is more intuitive. You create your primitives, which are
either a recurrent_layer that should have an initial value, or a slice of some
other node (a time slice, that is). Then you call loop, giving it an expression
that starts from those primitives.

Similarly you can have foldl or map or anything else.
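
In pure Python the loop primitive is just an unrolled recurrence; a minimal sketch with scalar values standing in for layers (hypothetical names, not the proposed API):

```python
def loop(step, init, xs, steps):
    """Unroll y(t) = step(y(t-1), x(t)); this is the information a scan op needs."""
    y, ys = init, []
    for t in range(steps):
        y = step(y, xs[t])
        ys.append(y)
    return ys


# y(t) = 0.5*y(t-1) + x(t), with init playing the role of recurrent_layer's init
ys = loop(lambda y_tm1, x_t: 0.5 * y_tm1 + x_t, init=0.0, xs=[1.0, 1.0, 1.0], steps=3)
print(ys)  # [1.0, 1.5, 1.75]
```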

Optimizer
---------

Personally I would respect the findings of the optimization committee, and have
SGD require a node that produces some error (which can be omitted) and the
gradients. For this I would also have a grad function, which would actually
just call T.grad.

If you have a non-Theano thing in the middle, I don't have any smart solution
besides ignoring any parameter below the first non-Theano node and throwing a
warning.
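
The contract is small: SGD only needs something that produces gradients (and optionally the error). A toy numeric sketch of that interface, with a made-up quadratic objective instead of a compiled graph:

```python
def sgd(params, grads_fn, lr=0.1, steps=100):
    """Plain gradient descent against whatever produces the gradients."""
    for _ in range(steps):
        grads = grads_fn(params)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params


# minimise (p0 - 3)^2 + (p1 + 1)^2; the gradient is (2*(p0 - 3), 2*(p1 + 1))
final = sgd([0.0, 0.0], lambda p: [2 * (p[0] - 3), 2 * (p[1] + 1)])
print(final)  # close to [3.0, -1.0]
```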

Learner
-------

In my view the learner would not have both a predict() and an eval() method,
but just an eval(). If you want the predictions you should use the
corresponding node (before applying the error measure); this was, for example,
**out** in my first example.

Of course we could require learners to be special nodes that also have a
predict output. In that case I'm not sure what the iterator behaviour of the
node should produce.

Granularity
-----------

Guillaume nicely pointed out that this library might be overkill, in the sense
that you have a dotW_b transform, and then you will need a dotW_b_sparse
transform and so on; moreover, each way of initializing a parameter would
result in many more transforms.

I don't have a perfect answer yet, but my argument goes like this: you would
have transforms for the most popular options (dotW_b, for example). If you need
something else you can always decorate a function that takes Theano arguments
and produces Theano arguments. More than decorating, you can have a general
apply transform that does something like:

  apply( lambda x,y,z: x*y+z,
         inputs = x,
         hyperparams = [(name, 2)],
         params = [(name, theano.shared(..))])

The order of the arguments in the lambda is nodes, then params, then
hyper-params, or so. This would apply the Theano expression, but it would also
register the parameters. I think you can do it such that the result of the
apply is picklable, but not the apply itself; meaning that in the graph, the op
doesn't actually store the lambda expression but a mini Theano graph.

Also, names might be optional, so you can write hyperparams = [2,].
515033d4d3bf
a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff
changeset
|
210 |
515033d4d3bf
a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff
changeset
|
211 |
515033d4d3bf
a first draft of layer committee
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff
changeset
|
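A minimal sketch of this registration idea, using plain Python callables in
place of theano expressions (the Node class and every field name here are
assumptions for illustration, not an existing API):

```python
class Node:
    """A graph node wrapping a function together with its inputs,
    params and hyper-params, so they can be collected later."""
    def __init__(self, fn, inputs, params=(), hyperparams=()):
        self.fn = fn
        self.inputs = list(inputs)
        self.params = dict(params)          # name -> value (e.g. shared var)
        self.hyperparams = dict(hyperparams)

    def parameters(self):
        # walk the graph and gather every registered param
        found = dict(self.params)
        for inp in self.inputs:
            if isinstance(inp, Node):
                found.update(inp.parameters())
        return found

    def value(self):
        args = [i.value() if isinstance(i, Node) else i for i in self.inputs]
        return self.fn(*args, *self.params.values(), *self.hyperparams.values())

def apply(fn, inputs, params=(), hyperparams=()):
    if not isinstance(inputs, (list, tuple)):
        inputs = [inputs]
    return Node(fn, inputs, params, hyperparams)

# order in the lambda: nodes, params, hyper-params
x = apply(lambda v: v, inputs=3)                      # a leaf holding data
h = apply(lambda v, W, c: v * W + c, inputs=x,
          params=[('W', 4)], hyperparams=[('c', 2)])
print(h.value())            # 3*4 + 2 = 14
print(h.parameters())       # {'W': 4}
```

The point is only that apply records the params alongside the expression, so a
later grad / optimizer step can collect them by walking the graph.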
What this way of doing things would hopefully buy you is that you do not
need to worry about most of your model (it would be just a few macros or
subroutines). You would do something like:

    rbm1, hidden1 = rbm_layer(data, 20)
    rbm2, hidden2 = rbm_layer(hidden1, 20)

and then the part you care about:

    hidden3 = apply( lambda x, W : T.dot(x, W),
                     inputs = hidden2,
                     params = theano.shared(scipy.sparse_CSR(..)) )

and after that you can potentially still do what you did before:

    err   = cross_entropy(hidden3, target)
    grads = grad(err, err.parameters())
    ...
I do agree that some of the "transforms" that I have been writing here
and there are pretty low level, and maybe we don't need them; we might need
only somewhat higher-level transforms. My hope is that for now people think
about the approach and not about all the inner details (like what transforms
we need, and so on) and see whether they are comfortable with it or not.

Do we want to think in these terms? I think it is a bit better to have your
script look like that than to hack into the DBN class to change that W to be
sparse.

Anyhow Guillaume, I'm working on a better answer :)
Params and hyperparams
----------------------
I think it is obvious from what I wrote above that there is a node wrapper
around the theano expression. I haven't written down all the details of that
class. I think there should be such a wrapper around parameters and
hyper-parameters as well. By default those wrappers might not provide any
information. Later on they can provide, for hyper-params for example, a
distribution. If, when inserting your hyper-param in the graph (i.e. when
you call a given transform), you provide the distribution, then maybe a
hyper-learner could use it to sample from it.

For parameters you might define properties like freeze. It can be true or
false. Whenever it is set to true, the param is not adapted by the optimizer.
Changing this value, like changing most hyper-params, implies recompilation
of the graph.
I would have a special class of hyper-params which don't require
recompilation of the graph; learning rate is an example. This info is also
given by the wrapper and by how the parameter is used.

It is up to the user and the "transform" implementer to wrap params and
hyper-params correspondingly, but I don't think this is too complicated.
The apply function above has a default behaviour; maybe you would have
a fourth type of argument for hyper-params that don't require compilation.
We could find a nice name for it.
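A sketch of such wrappers, with hypothetical Param / HyperParam classes and
field names (frozen, recompiles) chosen purely for illustration:

```python
class Param:
    """Wrapper around a parameter value; frozen params are skipped
    by the optimizer (all field names are assumptions)."""
    def __init__(self, value, name=None):
        self.value = value
        self.name = name
        self.frozen = False

class HyperParam:
    """Wrapper around a hyper-parameter.  recompiles=False marks the
    special class (e.g. learning rate) whose change does not force a
    rebuild of the compiled graph; distribution is optional info a
    hyper-learner could sample from."""
    def __init__(self, value, name=None, recompiles=True, distribution=None):
        self.value = value
        self.name = name
        self.recompiles = recompiles
        self.distribution = distribution

def adapt(params, grads, lr):
    # the optimizer only touches params that are not frozen
    for p, g in zip(params, grads):
        if not p.frozen:
            p.value -= lr.value * g

W = Param(1.0, 'W'); b = Param(0.5, 'b')
b.frozen = True                          # freezing b: optimizer skips it
lr = HyperParam(0.1, 'lr', recompiles=False)
adapt([W, b], [2.0, 2.0], lr)
print(W.value, b.value)                  # 0.8 0.5  (b untouched)
```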
How does this work?
-------------------
You always have a pointer to the entire graph. Whenever a hyper-param
changes (or a param freezes), all regions of the graph affected get
recompiled. This is done by traversing the graph from the bottom node and
constructing the theano expression.

The function that updates / re-constructs the graph is slightly more complex
if you have non-theano functions in the graph ..
replace
-------
Replace replaces a part of the graph. The way it works, in my view, is that
if I write:

    x = x1 + x2 + x3
    y = x.replace({x2: x5})

you would first copy the graph that is represented by x (the params or
hyper-params are not copied) and then replace the subgraphs. I.e., x will
still point to x1+x2+x3, while y will point to x1+x5+x3. Replace is not done
in place.

I think of these Node classes as something lightweight, like theano variables.
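The copy-then-substitute behaviour can be sketched like this, with a toy Node
class (all names are assumptions) where leaves play the role of params and are
shared rather than copied:

```python
class Node:
    """Minimal expression node; replace copies the interior of the graph
    (leaves are shared, as params/hyper-params would be) instead of
    mutating it."""
    def __init__(self, op=None, children=(), name=None):
        self.op = op              # e.g. 'add', or None for a leaf
        self.children = list(children)
        self.name = name

    def replace(self, mapping):
        if self in mapping:
            return mapping[self]
        if self.op is None:
            return self           # leaves (params) are not copied
        new_children = [c.replace(mapping) for c in self.children]
        return Node(self.op, new_children, self.name)

    def show(self):
        if self.op is None:
            return self.name
        return '(' + '+'.join(c.show() for c in self.children) + ')'

x1, x2, x3, x5 = (Node(name=n) for n in ('x1', 'x2', 'x3', 'x5'))
x = Node('add', [x1, x2, x3])
y = x.replace({x2: x5})
print(x.show())   # (x1+x2+x3)  -- x is unchanged; replace is not in place
print(y.show())   # (x1+x5+x3)
```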
reconstruct
-----------
This is something nice for DAA. It is definitely not useful for the rest.
I think, though, that it is a shame to have that transformation graph and not
be able to use it to do this. It will make life so much easier when you
do deep auto-encoders. I wouldn't put it in the core library, but I would
have it in the DAA module. The way I see it, you can either have something
like:

    # generate your invertible transforms on the fly
    fn  = create_transform(lambda : , params, hyper-params )
    inv = create_transform(lambda : , params, hyper-params )
    my_transform = couple_transforms( forward = fn, inv = inv)

    # or have some already widely used such transforms in the daa submodule.
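A sketch of the coupling, with create_transform reduced to a stub and a
trivially invertible pair (scaling by a weight) standing in for a real
encoder/decoder; every name beyond those in the snippet above is hypothetical:

```python
def create_transform(fn):
    """Stand-in for create_transform: here it just returns fn
    (params and hyper-params omitted for brevity)."""
    return fn

def couple_transforms(forward, inv):
    """Pair a transform with its inverse, so that e.g. a DAA can reuse
    the encoder graph to build the decoder."""
    class Coupled:
        def __call__(self, x):
            return forward(x)
        def inverse(self, x):
            return inv(x)
    return Coupled()

# a hypothetical invertible pair: scale by w / unscale by w
w = 2.0
my_transform = couple_transforms(forward=create_transform(lambda x: x * w),
                                 inv=create_transform(lambda x: x / w))
h = my_transform(3.0)
print(h, my_transform.inverse(h))   # 6.0 3.0
```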
transforms
----------
In my view there will be quite a few such standard transforms. They
can be grouped by architecture, basic, sampler, optimizer and so on.

We do not need to provide all of them, just the ones we need. Research
on an architecture would actually lead to creating new such transforms in
the library.

There will definitely be a list of basic such transforms in the beginning,
like:

    replace,
    search,
    get_param(name)
    get_params(..)

You can, and should, have something like a switch (that, based on a
hyper-parameter, replaces a part of the graph with another or not). This is
done by re-compiling the graph.
Constraints
-----------
Nodes can also keep track of constraints.

When you write

    y = add_constraint(x, sum(x**2))

y is the same node as x, just that it also links to this second graph that
computes the constraint. Whenever you call grad, grad will also add to the
cost all constraints attached to the graph.
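A sketch of these semantics, with the node as a plain dict, the constraint
list as an assumed field, and a finite-difference grad standing in for
theano's symbolic one:

```python
def add_constraint(node, constraint_fn):
    """Attach a penalty to a node; returns the *same* node object, only
    its constraint list grows (the field name is an assumption)."""
    node.setdefault('constraints', []).append(constraint_fn)
    return node

def grad(node, x, eps=1e-6):
    """Gradient (by central finite differences, for the sketch) of the
    node's cost plus every attached constraint."""
    def total(v):
        return node['cost'](v) + sum(c(v) for c in node.get('constraints', []))
    return (total(x + eps) - total(x - eps)) / (2 * eps)

# cost(x) = x with constraint x**2 attached, so d/dx = 1 + 2x
x_node = {'cost': lambda v: v}
y = add_constraint(x_node, lambda v: v ** 2)
assert y is x_node                    # same node, extra linked graph
print(round(grad(y, 3.0), 4))         # 7.0  (1 + 2*3)
```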