view doc/v2_planning/neural_net.txt @ 1183:bc1b445e22fa

API_coding_style: Added code example to explain the point about the number of spaces after a period
author Olivier Delalleau <delallea@iro>
date Fri, 17 Sep 2010 16:51:09 -0400
parents 0b666177f725
children 0e12ea6ba661

Neural Net committee
====================

Members:
 - Razvan Pascanu
 - James Bergstra
 - Xavier Glorot
 - Guillaume Desjardins

(Add your name here if you want)


Objective (Razvan)
------------------

Come up with a description of how to write learners (how to combine
optimizer, structure, error measure, how to talk to datasets, tasks (if there
is anything like a dataset object in your view) and so on).
o The way I see it personally, we should pick "random" interfaces for any
component for which there is none yet, or change the interface to answer our
needs. As for our description of how these things fit together, I would say
come up with pseudo-code for some tasks (that vary as much as possible) + text
describing all the missing details.

Link with PLearn
----------------

OD: This is basically what the OnlineLearningModule framework was doing in
PLearn (cf. PLearn/plearn_learners/online). The idea was that a
module was a "box" with so-called "ports" representing inputs / outputs. So
for instance you could think of an RBM as a module with "visible" and
"hidden" ports, but also "log_p_visible", "energy", etc. You would use
such a module by calling an fprop method where you would give some values for
input ports (not necessarily all of them), and would ask for some output ports
(not necessarily all of them). Some ports could be used either as inputs or
outputs (e.g. the "hidden" port could be used as input to compute
P(visible|hidden), or as output to compute E[hidden|visible]). Optimization
was achieved independently within each module, which would be provided a
gradient w.r.t. some of its ports (considered outputs), and asked to update
its internal parameters and compute accordingly a gradient w.r.t. its input
ports.
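
To make the port idea concrete, here is a minimal sketch in plain Python. It
is not PLearn's actual API: the class name ToyRBMModule, the fprop signature
(a dict of given ports plus a list of requested ports) and the weight layout
are all assumptions made for illustration only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ToyRBMModule:
    """Hypothetical port-based module; not PLearn's actual API."""

    ports = ("visible", "hidden")

    def __init__(self, weights, visible_bias, hidden_bias):
        # weights[j][i] connects visible unit i to hidden unit j.
        self.weights = weights
        self.visible_bias = visible_bias
        self.hidden_bias = hidden_bias

    def fprop(self, given, requested):
        """Compute the requested ports from the ports provided in `given`."""
        results = {}
        for port in requested:
            if port == "hidden" and "visible" in given:
                # "hidden" used as an output: E[hidden | visible].
                v = given["visible"]
                results[port] = [
                    sigmoid(b + sum(w * x for w, x in zip(row, v)))
                    for row, b in zip(self.weights, self.hidden_bias)
                ]
            elif port == "visible" and "hidden" in given:
                # "hidden" used as an input: P(visible = 1 | hidden).
                h = given["hidden"]
                columns = zip(*self.weights)
                results[port] = [
                    sigmoid(b + sum(w * x for w, x in zip(col, h)))
                    for col, b in zip(columns, self.visible_bias)
                ]
            else:
                raise ValueError(
                    "cannot compute port %r from %r" % (port, sorted(given)))
        return results


rbm = ToyRBMModule(weights=[[0.0, 0.0], [1.0, -1.0]],
                   visible_bias=[0.0, 0.0], hidden_bias=[0.0, 0.0])
hidden = rbm.fprop({"visible": [1.0, 1.0]}, ["hidden"])["hidden"]
visible = rbm.fprop({"hidden": [0.0, 0.0]}, ["visible"])["visible"]
```

Even in this toy version, every new port or input / output combination adds
another branch to fprop.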

Although it worked, it had some issues:
- The biggest problem was that as you added more ports and options to do
different computations, the fprop method would grow and grow and become very
difficult to write properly to handle all possible combinations of inputs /
outputs, while remaining efficient. Hopefully this is where Theano can be a
big help (note: a "lazy if" could be required to handle situations where the
same port is computed in very different ways depending on what is given as
input).
- We had to introduce a notion of 'states' that were ports whose values had to
be computed, even if they were not asked for by the user. The reason was that
those values were required to perform the optimization (bprop phase) without
re-doing some computations. Hopefully again Theano could take care of it
(those states were potentially confusing to the user, who had to manipulate
them without necessarily understanding what they were for).
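
The 'states' issue, together with the per-module optimization described
earlier, can be sketched as follows. This is only an illustration under
assumed names (TanhUnit, bprop_update, a scalar weight w and a fixed
learning rate); it does not reproduce the OnlineLearningModule code.

```python
import math


class TanhUnit:
    """Hypothetical scalar module y = tanh(w * x); not PLearn's actual API."""

    def __init__(self, w, learning_rate):
        self.w = w
        self.learning_rate = learning_rate
        self.state = None  # values saved by fprop for the bprop phase

    def fprop(self, x):
        y = math.tanh(self.w * x)
        # Saved even though the caller only asked for y.
        self.state = {"x": x, "y": y}
        return y

    def bprop_update(self, grad_y):
        """Update w in place and return the gradient w.r.t. the input."""
        if self.state is None:
            raise RuntimeError("fprop must be called before bprop_update")
        x, y = self.state["x"], self.state["y"]
        grad_pre = grad_y * (1.0 - y * y)  # d tanh(a)/da = 1 - tanh(a)^2
        grad_x = grad_pre * self.w  # uses w before the update
        self.w -= self.learning_rate * grad_pre * x
        self.state = None
        return grad_x


unit = TanhUnit(w=0.0, learning_rate=0.1)
y = unit.fprop(1.0)
grad_x = unit.bprop_update(1.0)
```

Here the state is hidden inside the module. In the framework described above
it was exposed as ports, which is what made it confusing to users.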

Besides that, there are at least 3 design decisions that could be made
differently:
- How to connect those modules together: in those OnlineLearningModules, each
module had no idea of who it was connected to. A higher level entity was
responsible for grabbing the output of some module and forwarding it to its
target destination. This is to be contrasted with the design of PLearn
Variables, where each variable was explicitly constructed given its input
variables (Theano-like), and would directly ask them to provide data. I am not
sure what the pros and cons of these two approaches are.
- How to perform optimization. The OnlineLearningModule way is nice to plug
together pieces that are optimized very differently, because each module is
responsible for its own optimization. However, this also means it is difficult
to easily try different global optimizers (again, this is in contrast with
PLearn variables).
- One must think about the issue of RNG for stochastic modules. Here we had
one single RNG per module. This makes it difficult to easily try different
seeds for everyone. On the other hand, sharing a single RNG is not necessarily
a good idea because of potentially unwanted side-effects.
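
The two ways of connecting components mentioned in the first point can be
contrasted with a small sketch. All names here (Module, Chain, Variable,
compute) are hypothetical and chosen for illustration; neither class mirrors
the real PLearn or Theano interfaces.

```python
class Module:
    """Hypothetical module that knows nothing about its neighbours."""

    def __init__(self, fn):
        self.fn = fn

    def fprop(self, value):
        return self.fn(value)


class Chain:
    """Higher-level entity that forwards each output to the next module."""

    def __init__(self, modules):
        self.modules = modules

    def fprop(self, value):
        for module in self.modules:
            value = module.fprop(value)
        return value


class Variable:
    """Hypothetical variable built explicitly from its input variable."""

    def __init__(self, fn=None, parent=None, value=None):
        self.fn = fn
        self.parent = parent
        self.value = value

    def compute(self):
        if self.parent is None:
            return self.value  # source variable holding raw data
        # The variable asks its own input for data.
        return self.fn(self.parent.compute())


# Style 1: modules are wired together from the outside.
chain = Chain([Module(lambda v: 2 * v), Module(lambda v: v + 1)])
chain_result = chain.fprop(3)

# Style 2: each variable is constructed from its input.
source = Variable(value=3)
doubled = Variable(fn=lambda v: 2 * v, parent=source)
variable_result = Variable(fn=lambda v: v + 1, parent=doubled).compute()
```

Both styles compute the same value here. The difference is where the
connection information lives: in the Chain for the first style, in each
Variable for the second.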