view doc/v2_planning/neural_net.txt @ 1436:35b56d794d09
added option to order descending/ascending the fields according to their size
author | Razvan Pascanu <r.pascanu@gmail.com> |
---|---|
date | Tue, 22 Feb 2011 11:40:02 -0500 |
parents | 0e12ea6ba661 |
Neural Net committee
====================

Members:
 - Razvan Pascanu
 - James Bergstra
 - Xavier Glorot
 - Guillaume Desjardins

(Add your name here if you want)

Objective (Razvan)
------------------

Come up with a description of how to write learners (how to combine
optimizer, structure, error measure, how to talk to datasets, tasks (if there
is anything like a dataset object in your view), and so on).

 o The way I see it personally, we should pick "random" interfaces for any
   component for which there is none yet, or change the interface to answer
   our needs. If our description of how these things get together. I would
   say come up with pseudo-code for some tasks (that vary as much as
   possible) + text describing all the missing details.

Link with PLearn
----------------

OD: This is basically what the OnlineLearningModule framework was doing in
PLearn (cf. PLearn/plearn_learners/online). The idea was that a module was a
"box" with so-called "ports" representing inputs / outputs. So for instance
you could think of an RBM as a module with "visible" and "hidden" ports, but
also "log_p_visible", "energy", etc. You would use such a module by calling an
fprop method where you would give some values for input ports (not necessarily
all of them), and would ask for some output ports (not necessarily all of
them). Some ports could be used either as inputs or outputs (e.g. the "hidden"
port could be used as input to compute P(visible|hidden), or as output to
compute E[hidden|visible]).

Optimization was achieved independently within each module, which would be
provided a gradient w.r.t. some of its ports (considered outputs), and asked
to update its internal parameters and compute accordingly a gradient w.r.t.
its input ports.
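To make the port idea concrete, here is a minimal sketch in plain Python. It
is not the PLearn API: the class name AffineModule, the method name
bprop_update and the scalar ports "x" and "y" are all made up for
illustration. It only shows the two behaviours described above: fprop
computes the requested ports from whichever ports are given, and the module
updates its own parameters from a gradient w.r.t. its output port while
returning the gradient w.r.t. its input port.

```python
class AffineModule:
    """Toy module with two scalar ports, "x" and "y", tied by y = w * x + b.

    Hypothetical example: names and signatures are assumptions, not PLearn's.
    """

    ports = ("x", "y")

    def __init__(self, w=2.0, b=0.0, learning_rate=0.1):
        self.w = w
        self.b = b
        self.learning_rate = learning_rate

    def fprop(self, given, wanted):
        """Compute the requested ports from whichever ports are provided."""
        values = dict(given)
        for port in wanted:
            if port in values:
                continue
            if port == "y" and "x" in values:
                values["y"] = self.w * values["x"] + self.b
            elif port == "x" and "y" in values:
                # Same port pair used in the opposite direction.
                values["x"] = (values["y"] - self.b) / self.w
            else:
                raise ValueError(
                    f"cannot compute port {port!r} from {sorted(given)}"
                )
        return {port: values[port] for port in wanted}

    def bprop_update(self, x, grad_y):
        """Given dL/dy, update w and b locally and return dL/dx."""
        grad_x = grad_y * self.w  # computed with w before the update
        self.w -= self.learning_rate * grad_y * x
        self.b -= self.learning_rate * grad_y
        return grad_x


module = AffineModule(w=2.0, b=1.0)
forward = module.fprop({"x": 3.0}, ["y"])    # "x" as input, "y" as output
backward = module.fprop({"y": 7.0}, ["x"])   # same ports, opposite direction
grad_x = module.bprop_update(x=3.0, grad_y=0.5)
```

Even in this two-port toy, fprop already needs one branch per direction,
which hints at how the method grows as ports are added.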
Although it worked, it had some issues:

 - The biggest problem was that as you added more ports and options to do
   different computations, the fprop method would grow and grow and become
   very difficult to write properly to handle all possible combinations of
   inputs / outputs, while remaining efficient. Hopefully this is where
   Theano can be a big help (note: a "lazy if" could be required to handle
   situations where the same port is computed in very different ways
   depending on what is given as input).

 - We had to introduce a notion of 'states' that were ports whose values had
   to be computed, even if they were not asked for by the user. The reason
   was that those values were required to perform the optimization (bprop
   phase) without re-doing some computations. Hopefully again Theano could
   take care of it (those states were potentially confusing to the user, who
   had to manipulate them without necessarily understanding what they were
   for).

Besides that, there are at least 3 design decisions that could be made
differently:

 - How to connect those modules together: in those OnlineLearningModules,
   each module had no idea of who it was connected to. A higher level entity
   was responsible for grabbing the output of some module and forwarding it
   to its target destination. This is to be contrasted with the design of
   PLearn Variables, where each variable was explicitly constructed given its
   input variables (Theano-like), and would directly ask them to provide
   data. I am not sure what the pros and cons of these two approaches are.

 - How to perform optimization. The OnlineLearningModule way is nice to plug
   together pieces that are optimized very differently, because each module
   is responsible for its own optimization. However, this also means it is
   difficult to try different global optimizers (again, this is in contrast
   with PLearn Variables).

 - One must think about the issue of RNG for stochastic modules. Here we had
   one single RNG per module. This makes it difficult to easily try different
   seeds for everyone. On the other hand, sharing a single RNG is not
   necessarily a good idea because of potentially unwanted side-effects.
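
One possible middle ground for the RNG question, sketched below in plain
Python, is to keep one RNG per module but derive every module's seed from a
single master seed. This is only an assumption about how it could be done,
not something PLearn did: the helper make_module_rngs and the module names
are hypothetical, and the standard library's random.Random stands in for
whatever RNG the modules would really use.

```python
import random


def make_module_rngs(master_seed, module_names):
    """Give each module its own RNG, seeded from the master seed and its name.

    Hypothetical helper: string seeds are hashed deterministically by
    random.Random, so the same (master_seed, name) pair always yields the
    same stream.
    """
    return {
        name: random.Random(f"{master_seed}/{name}") for name in module_names
    }


names = ["rbm_layer_1", "rbm_layer_2"]  # made-up module names

# Reference run: first draw of the second module.
rngs = make_module_rngs(1234, names)
first = rngs["rbm_layer_2"].random()

# Same master seed, but the first module consumes extra random numbers.
rngs_again = make_module_rngs(1234, names)
for _ in range(100):
    rngs_again["rbm_layer_1"].random()
second = rngs_again["rbm_layer_2"].random()  # unaffected by the extra draws

# Changing the single master seed changes the stream of every module.
other = make_module_rngs(5678, names)["rbm_layer_2"].random()
```

With this arrangement a single number reseeds everyone, while extra draws in
one module do not shift the random stream seen by another.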