annotate doc/v2_planning/neural_net.txt @ 1244:6d97f32c3fdf

Merged
author Olivier Delalleau <delallea@iro>
date Thu, 23 Sep 2010 12:11:44 -0400
parents 0e12ea6ba661
children
rev   line source
1088
e254065e7fd7 File for the new committee neural networks
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
1 Neural Net committee
2 ====================
3
4 Members:
5 - Razvan Pascanu
6 - James Bergstra
7 - Xavier Glorot
1099
0b666177f725 added myself to committe
gdesjardins
parents: 1092
diff changeset
8 - Guillaume Desjardins
1088
e254065e7fd7 File for the new committee neural networks
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
9
10 (Add your name here if you want)
11
12
13 Objective (Razvan)
1189
0e12ea6ba661 fix many rst syntax error warning.
Frederic Bastien <nouiz@nouiz.org>
parents: 1099
diff changeset
14 -------------------
1088
e254065e7fd7 File for the new committee neural networks
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
15
1090
a80b296eb0df I removed big picture from the description of the neural network committee
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1088
diff changeset
16 Come up with a description of how to write learners (how to combine
17 optimizer, structure, error measure, how to talk to datasets, tasks (if there
18 is anything like a dataset object in your view) and so on).
19 o The way I see it personally, we should pick "random" interfaces for any component
20 for which there is none yet, or change the interface to answer our needs,
21 and then give our description of how these things fit together. I would say come up with
22 pseudo-code for some tasks (that vary as much as possible) plus text describing
23 all the missing details.
1088
e254065e7fd7 File for the new committee neural networks
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
24
1092
aab9c261361c neural_net: Added info about how PLearn was doing it
Olivier Delalleau <delallea@iro>
parents: 1090
diff changeset
25 Link with PLearn
26 ----------------
27
28 OD: This is basically what the OnlineLearningModule framework was doing in
29 PLearn (cf. PLearn/plearn_learners/online). The idea was that a
30 module was a "box" with so-called "ports" representing inputs / outputs. So
31 for instance you could think of an RBM as a module with "visible" and
32 "hidden" ports, but also "log_p_visible", "energy", etc. You would use
33 such a module by calling an fprop method where you would give some values for
34 input ports (not necessarily all of them), and would ask for some output ports
35 (not necessarily all of them). Some ports could be used either as inputs or
36 outputs (e.g. the "hidden" port could be used as input to compute
37 P(visible|hidden), or as output to compute E[hidden|visible]). Optimization
38 was achieved independently within each module, which would be provided a
39 gradient w.r.t. some of its ports (considered outputs), and asked to update
40 its internal parameters and accordingly compute a gradient w.r.t. its input
41 ports.
42
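To make the port idea concrete, here is a minimal plain-Python sketch of such a
module. It is not PLearn's actual API: the class name ToyModule, the method
signatures, the weight layout and the gradient-descent update are all
assumptions made for illustration. It only shows a port ("hidden") used either
as input or output of fprop, and a bprop that updates the module's own
parameters and returns a gradient w.r.t. its input port.

```python
import math


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ToyModule:
    """Hypothetical port-based module with two ports: "visible" and "hidden"."""

    def __init__(self, weights, learning_rate=0.1):
        # weights[j][i] links visible unit i to hidden unit j (assumed layout).
        self.weights = [list(row) for row in weights]
        self.learning_rate = learning_rate

    def fprop(self, given, wanted):
        """Return values for the ports in `wanted`, using the ports in `given`."""
        result = {}
        for port in wanted:
            if port == "hidden" and "visible" in given:
                rows = self.weights
                source = given["visible"]
            elif port == "visible" and "hidden" in given:
                rows = list(zip(*self.weights))  # transpose: hidden -> visible
                source = given["hidden"]
            else:
                raise ValueError("cannot compute port %r from %r"
                                 % (port, sorted(given)))
            result[port] = [_sigmoid(sum(w * x for w, x in zip(row, source)))
                            for row in rows]
        return result

    def bprop(self, visible, hidden, grad_hidden):
        """Update the weights from a gradient on "hidden"; return the gradient on "visible"."""
        # Gradient through the sigmoid: dL/da_j = dL/dh_j * h_j * (1 - h_j).
        delta = [g * h * (1.0 - h) for g, h in zip(grad_hidden, hidden)]
        # Gradient w.r.t. the input port, computed before the weights change.
        grad_visible = [sum(d * row[i] for d, row in zip(delta, self.weights))
                        for i in range(len(visible))]
        # The module performs its own update (plain gradient descent here).
        for row, d in zip(self.weights, delta):
            for i, v in enumerate(visible):
                row[i] -= self.learning_rate * d * v
        return {"visible": grad_visible}


module = ToyModule([[0.0, 0.0]])
hidden = module.fprop({"visible": [1.0, 2.0]}, ["hidden"])["hidden"]
grad = module.bprop([1.0, 2.0], hidden, grad_hidden=[1.0])
```

Note that bprop needs the hidden values computed during fprop, so the caller
has to carry them around even if it never asked for them.
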
43 Although it worked, it had some issues:
44 - The biggest problem was that as you added more ports and options to do
45 different computations, the fprop method would grow and grow and become very
46 difficult to write properly to handle all possible combinations of inputs /
47 outputs, while remaining efficient. Hopefully this is where Theano can be a
48 big help (note: a "lazy if" could be required to handle situations where the
49 same port is computed in very different ways depending on what is given as
50 input).
51 - We had to introduce a notion of 'states', which were ports whose values had to
52 be computed even if they were not asked for by the user. The reason was that
53 those values were required to perform the optimization (bprop phase) without
54 re-doing some computations. Hopefully again Theano could take care of it
55 (those states were potentially confusing to the user, who had to manipulate
56 them without necessarily understanding what they were for).
57
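As a rough illustration of the two issues above, the sketch below declares each
port as a rule over other ports and only evaluates what is requested, choosing
the rule according to what was given. This is a plain-Python analogy, not
Theano code and not PLearn's design: the class LazyPorts, its rules format and
the toy arithmetic ports are hypothetical.

```python
class LazyPorts:
    """Hypothetical module whose ports are computed on demand from rules."""

    def __init__(self, rules):
        # rules maps a port name to a list of (needed_ports, function) alternatives.
        self.rules = rules
        self.calls = []  # names of the ports actually computed, for inspection

    def compute(self, given, wanted):
        cache = dict(given)  # intermediate values live here, not in user-visible states

        def get(port, visiting):
            if port in cache:
                return cache[port]
            if port in visiting:  # circular dependency, e.g. x needs double needs x
                raise KeyError(port)
            for needs, function in self.rules.get(port, []):
                try:
                    args = [get(name, visiting + (port,)) for name in needs]
                except KeyError:
                    continue  # this alternative is not computable; try the next one
                self.calls.append(port)
                cache[port] = function(*args)
                return cache[port]
            raise KeyError(port)

        return {port: get(port, ()) for port in wanted}


rules = {
    "x": [(("double",), lambda d: d / 2)],
    "double": [(("x",), lambda x: 2 * x)],
    "total": [(("x", "double"), lambda x, d: x + d)],
    "square": [(("x",), lambda x: x * x)],
}
ports = LazyPorts(rules)
from_x = ports.compute({"x": 3}, ["total"])
from_double = ports.compute({"double": 4}, ["total"])
```

The same "total" port is obtained whether "x" or "double" is given, the
"square" port is never evaluated because nobody asked for it, and the
intermediate values stay in an internal cache instead of being exposed as
states.
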
58 Besides that, there are at least 3 design decisions that could be made
59 differently:
60 - How to connect those modules together: in those OnlineLearningModules, each
61 module had no idea of who it was connected to. A higher-level entity was
62 responsible for grabbing the output of some module and forwarding it to its
63 target destination. This is to be contrasted with the design of PLearn
64 Variables, where each variable was explicitly constructed given its input
65 variables (Theano-like), and would directly ask them to provide data. I am not
66 sure what the pros and cons of these two approaches are.
67 - How to perform optimization. The OnlineLearningModule way is nice to plug
68 together pieces that are optimized very differently, because each module is
69 responsible for its own optimization. However, this also means it is difficult
70 to try different global optimizers (again, this is in contrast with
71 PLearn variables).
72 - One must think about the issue of RNG for stochastic modules. Here we had
73 one single RNG per module. This makes it difficult to try different
74 seeds for everyone. On the other hand, sharing a single RNG is not necessarily
75 a good idea because of potentially unwanted side-effects.
76
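One possible middle ground for the RNG question, sketched below under stated
assumptions: keep one generator per module, but derive every generator from a
single master seed, so that changing one number re-seeds everyone while draws
in one module do not disturb the others. The helper make_module_rngs and the
"seed/name" convention are hypothetical; this is neither PLearn's scheme nor a
settled design.

```python
import random


def make_module_rngs(master_seed, module_names):
    """Give each module its own generator, all derived from one master seed."""
    # Seeding random.Random with a string is deterministic; combining the
    # master seed and the module name is an assumed convention.
    return {name: random.Random("%s/%s" % (master_seed, name))
            for name in module_names}


rngs_a = make_module_rngs(42, ["rbm_1", "rbm_2"])
rngs_b = make_module_rngs(42, ["rbm_1", "rbm_2"])
rngs_a["rbm_1"].random()  # an extra draw in one module...
# ...does not shift the stream seen by another module.
same_stream = rngs_a["rbm_2"].random() == rngs_b["rbm_2"].random()
```

Trying a different seed "for everyone" then amounts to changing master_seed,
at the cost of tying each stream to the module's name.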