# HG changeset patch
# User Olivier Delalleau
# Date 1284318739 14400
# Node ID aab9c261361c42ee791342d1dfb6b124ca16e975
# Parent 319de699fb67754e829037d00f1622c39803affb
neural_net: Added info about how PLearn was doing it

diff -r 319de699fb67 -r aab9c261361c doc/v2_planning/neural_net.txt
--- a/doc/v2_planning/neural_net.txt Sun Sep 12 14:14:23 2010 -0400
+++ b/doc/v2_planning/neural_net.txt Sun Sep 12 15:12:19 2010 -0400
@@ -21,3 +21,55 @@
 pseudo-code for some tasks ( that vary as much as possible) + text
 describing all the missing details.

Link with PLearn
----------------

OD: This is basically what the OnlineLearningModule framework was doing in
PLearn (cf. PLearn/plearn_learners/online). The idea was that a module was a
"box" with so-called "ports" representing its inputs / outputs. For instance,
you could think of an RBM as a module with "visible" and "hidden" ports, but
also "log_p_visible", "energy", etc. You would use such a module by calling
an fprop method, giving values for some of the input ports (not necessarily
all of them) and asking for some of the output ports (not necessarily all of
them). Some ports could be used either as inputs or outputs (e.g. the
"hidden" port could be used as input to compute P(visible|hidden), or as
output to compute E[hidden|visible]). Optimization was performed
independently within each module, which would be given a gradient w.r.t. some
of its ports (considered outputs), and asked to update its internal
parameters and to compute accordingly a gradient w.r.t. its input ports.

Although it worked, this design had some issues:
- The biggest problem was that as you added more ports and options to do
different computations, the fprop method would grow and grow and become very
difficult to write properly so that it handled all possible combinations of
inputs / outputs while remaining efficient. Hopefully this is where Theano
can be a big help (note: a "lazy if" could be required to handle situations
where the same port is computed in very different ways depending on what is
given as input).
- We had to introduce a notion of 'states': ports whose values had to be
computed even if the user did not ask for them, because those values were
required to perform the optimization (bprop phase) without re-doing some
computations. Hopefully, again, Theano could take care of this (those states
were potentially confusing to the user, who had to manipulate them without
necessarily understanding what they were for).
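
To make the port-based fprop interface described above concrete, here is a
minimal numpy sketch (this is not the actual PLearn OnlineLearningModule API;
the class name, the dict-based fprop signature and the exact port names are
illustrative assumptions only). The point is that a single module exposes
several ports and computes only the combinations that are actually
requested::

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    class RBMModule(object):
        """Toy port-based module with "visible", "hidden" and "energy" ports."""

        def __init__(self, n_visible, n_hidden, rng):
            self.rng = rng
            self.W = 0.01 * rng.randn(n_visible, n_hidden)
            self.b_vis = np.zeros(n_visible)
            self.b_hid = np.zeros(n_hidden)

        def fprop(self, inputs, outputs):
            # `inputs`: dict mapping port name -> array (any subset of ports)
            # `outputs`: list of requested port names
            results = {}
            if "hidden" in outputs and "visible" in inputs:
                # E[hidden | visible]
                results["hidden"] = sigmoid(
                    np.dot(inputs["visible"], self.W) + self.b_hid)
            if "visible" in outputs and "hidden" in inputs:
                # P(visible = 1 | hidden)
                results["visible"] = sigmoid(
                    np.dot(inputs["hidden"], self.W.T) + self.b_vis)
            if ("energy" in outputs and "visible" in inputs
                    and "hidden" in inputs):
                # Standard binary RBM energy for each example in the batch
                v, h = inputs["visible"], inputs["hidden"]
                results["energy"] = (
                    - np.dot(v, self.b_vis) - np.dot(h, self.b_hid)
                    - (np.dot(v, self.W) * h).sum(axis=1))
            missing = [p for p in outputs if p not in results]
            if missing:
                raise ValueError("cannot compute ports %s from inputs %s"
                                 % (missing, sorted(inputs)))
            return results

    # Example use: ask only for the "hidden" port given "visible" values.
    rbm = RBMModule(n_visible=6, n_hidden=3, rng=np.random.RandomState(42))
    v = np.random.RandomState(0).rand(10, 6)
    hid = rbm.fprop(inputs={"visible": v}, outputs=["hidden"])["hidden"]

Even in this toy version one can see how the if/else logic multiplies with
every new port, which is exactly the scaling problem mentioned above.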

Besides that, there are at least 3 design decisions that could be made
differently:
- How to connect modules together: an OnlineLearningModule had no idea of
what it was connected to. A higher-level entity was responsible for grabbing
the output of one module and forwarding it to its target destination. This is
to be contrasted with the design of PLearn Variables, where each variable was
explicitly constructed from its input variables (Theano-like) and would
directly ask them to provide data. I am not sure what the pros and cons of
these two approaches are.
- How to perform optimization. The OnlineLearningModule way is nice for
plugging together pieces that are optimized very differently, because each
module is responsible for its own optimization. However, this also means it
is difficult to easily try different global optimizers (again, in contrast
with PLearn Variables).
- One must think about the issue of RNG for stochastic modules. Here we had
one single RNG per module, which makes it difficult to easily try different
seeds for all modules at once. On the other hand, sharing a single RNG is not
necessarily a good idea because of potentially unwanted side effects (one
possible middle ground is sketched after this list).
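
One possible compromise (an illustrative sketch only, not something that
either PLearn or this proposal prescribes) is to keep one RNG per stochastic
module but derive every per-module seed from a single master seed, so that
all seeds can be changed in one place without the side effects of sharing a
single RNG object::

    import numpy as np

    def seed_modules(modules, master_seed):
        # Each module gets its own RandomState, all derived from one master
        # seed: changing `master_seed` re-seeds every module at once, while
        # modules never disturb each other's RNG state.
        master = np.random.RandomState(master_seed)
        for module in modules:
            module.rng = np.random.RandomState(master.randint(2 ** 31 - 1))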