comparison doc/v2_planning/layer_RP.txt @ 1231:5ef96142492b

some typos
author Razvan Pascanu <r.pascanu@gmail.com>
date Wed, 22 Sep 2010 20:17:35 -0400
parents 515033d4d3bf
children 32fc5f442dde

Proposal (RP)
=============

You construct your neural network by building a graph of connections
between layers, starting from the data. While you construct the graph,
different Theano formulas are put together to form your model.

Hard details are not set yet, but all members of the committee agreed
that this sounds like a good idea.

[...]


Global observations :
---------------------

1) Your graph can have multiple terminal nodes; in this case rbm1,
   rbm2 and learner, valid_err, test_err are all end nodes of the graph;

2) Any node is an "iterator"; when you call out.next() you get the next
   prediction; when you call err.next() you get the next error
   ( on the batch given by data.next() ). A short sketch of this usage
   is given right after this list.

3) Replace can replace any subgraph.

4) You can have MACROS or SUBROUTINES that already give you the graph for
   known components ( in my view CDk is such a macro, but simpler
   examples would be vanilla versions of MLP, DAA, DBN, LOGREG ).

5) Any node has the entire graph ( though arguably you don't use that
   graph too much ). Running such a node will in general be done by compiling
   the Theano expression up to that node ( if you don't already have this
   function ) and using the data object that you got initially. This Theano
   function is compiled only if you need it. You use the graph only to :
     * update the Theano expression in case some part of the subgraph has
       changed ( a hyper-parameter or a replace call )
     * collect the list of parameters of the model
     * collect the list of hyper-parameters ( my personal view - this
       would mostly be useful for a hyper-learner .. and not for day to
       day stuff, but I think it is something easy to provide and we should )
     * collect constraints on parameters ( I believe they can be represented
       in the graph as dependency links to other graphs that compute the
       constraints .. )

6) Registering parameters and hyper-parameters to the graph is the job of
   the transform and therefore of the user who implemented that
   transform; the same goes for initializing the parameters ( so if we have
   different ways to initialize the weight matrix, that might be a
   hyper-parameter with a default value ).



Detailed Proposal (RP)
======================

I would go through a list of scenarios and possible issues :

Delayed or future values
-------------------------

Sometimes you might want future values of some nodes. For example you might
be interested in :

  y(t) = x(t) - x(t-1)

You can get that by having a "delayed" version of a node. A delayed version
of a node x is obtained by calling x.t(k), which will give you a node that
has the value x(t+k). k can be positive or negative ( a short sketch of this
usage follows the list below ).
In my view this can be done as follows :
 - a node is a class that points to :
    * a data object that feeds data
    * a theano expression up to that point
    * the entire graph that describes the model ( not the Theano graph !!! )
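
As a tiny illustration of the delayed-node idea sketched above ( x.t(k) is
the hypothetical call proposed here, nothing of it exists yet ) :

  # x is a node producing a sequence x(t); x.t(-1) is the same node one step back
  y = x - x.t(-1)        # y(t) = x(t) - x(t-1)
  print y.next()         # like any node, y is an iterator over its values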

[...]

W1*f( W2*data + b)

I think we can support that by doing the following :
  each node has :
    * a data object that feeds data
    * a theano expression up to that point
    * the entire graph that describes the model

Let x1 = W2*data + b
[...]

have an initial value, or a slice of some other node ( a time slice, that is ).
Then you call loop, giving an expression that starts from those primitives.

Similarly you can have foldl or map or anything else.

You would use this instead of writing scan, especially if the formula is
more complicated and you want to automatically collect parameters,
hyper-parameters and so on. A short sketch follows.
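
A rough sketch of what such a loop call could look like ( loop, node and the
keyword arguments are assumptions of this proposal, not an existing API;
underneath it would presumably build something like theano.scan ) :

  s0 = node(init = 0)                       # a primitive with an initial value
  s  = loop(lambda x_t, s_tm1: s_tm1 + x_t,
            sequences = x, initial = s0)    # running sum over the sequence fed by x
  print s.next()                            # parameters used inside the expression
                                            # would be collected via s.parameters()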

Optimizer
---------

Personally I would respect the findings of the optimization committee,
and have SGD require a Node that produces some error ( which can
[...]
but just an eval(). If you want the predictions you should use the
corresponding node ( before applying the error measure ). This was,
for example, **out** in my first example.

Of course we could require learners to be special nodes that also have
a predict output. In that case I'm not sure what the iterating behaviour
of the node should produce.
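
A minimal sketch of that split, with SGD taking only an error node ( again a
hypothetical API, following my reading of the optimization committee's
interface ) :

  learner = SGD(err, lr = 0.01)   # err is a node producing a scalar error

  print learner.next()            # one update step; yields the training error
  print out.next()                # predictions come from the node *before*
                                  # the error measure, **out** in the first example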

Granularity
-----------

[...]

  apply( lambda x, y, z: x*y + z, inputs = x,
                                  hyperparams = [(name, 2)],
                                  params = [(name, theano.shared(..))] )

The order of the arguments in the lambda is nodes, params, hyper-params or so.
This would apply the theano expression but it would also register the
parameters. It is like creating a transform on the fly.

I think you can make it such that the result of the apply is
picklable, but not the apply operation itself. Meaning that in the graph, the
op doesn't actually store the lambda expression but a mini Theano graph.
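
One way to get that picklability ( a sketch only; apply and Node here are the
hypothetical pieces of this proposal, and I assume matrix-shaped inputs ) :
trace the lambda once on symbolic variables when the node is built, and keep
only the resulting Theano expression, which pickles, while the lambda itself
is thrown away.

  import theano.tensor as T

  def apply(fn, inputs, params = (), hyperparams = ()):
      x_sym = T.matrix('x')              # symbolic stand-in for the input node
      expr  = fn(x_sym, *params)         # call the lambda exactly once ...
      # ... and store only (x_sym, expr, params) on the node; fn is dropped,
      # so the node pickles even though the lambda would not
      return Node(inputs = inputs, expr = expr, expr_inputs = [x_sym],
                  params = params, hyperparams = hyperparams)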

Also names might be optional, so you could write hyperparams = [2,].


What this way of doing things would buy you, hopefully, is that you do not
need to worry about most of your model ( it would be just a few macros or
subroutines ).
You would do something like :

  rbm1, hidden1 = rbm_layer(data, 20)
  rbm2, hidden2 = rbm_layer(data, 20)

and then the part you care about :

  hidden3 = apply( lambda x, W: T.dot(x, W), inputs = hidden2,
                   params = theano.shared(scipy.sparse_CSR(..)) )

and after that you potentially still do what you did before :

  err   = cross_entropy(hidden3, target)
  grads = grad(err, err.parameters())
  ...

I do agree that some of the "transforms" that I have been writing here
and there are pretty low level, and maybe we don't need them. We might need
only somewhat higher-level transforms. My hope is that for now people think
about the approach and not about all the inner details ( like what transforms
we need and so on ) and see if they are comfortable with it or not.

Do we want to think in these terms? I think it is a bit better to have your
script like that than to take a normal Python class, hack it to change
something, and then either add a parameter to __init__ or create a new
version. It seems a bit more natural.



Anyhow, Guillaume, I'm working on a better answer :)


Params and hyperparams