annotate doc/v2_planning/neural_net.txt @ 1092:aab9c261361c

neural_net: Added info about how PLearn was doing it
author Olivier Delalleau <delallea@iro>
date Sun, 12 Sep 2010 15:12:19 -0400
parents a80b296eb0df
children 0b666177f725
rev   line source
1088
e254065e7fd7 File for the new committee neural networks
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff changeset
Neural Net committee
====================

Members:
- Razvan Pascanu
- James Bergstra
- Xavier Glorot

(Add your name here if you want)


Objective (Razvan)
------------------

1090
a80b296eb0df I removed big picture from the description of the neural network committee
Razvan Pascanu <r.pascanu@gmail.com>
parents: 1088
diff changeset
Come up with a description of how to write learners (how to combine the
optimizer, structure and error measure, how to talk to datasets and tasks (if
there is anything like a dataset object in your view), and so on).
  o The way I see it personally, we should pick "random" interfaces for any
    component that does not have one yet, or change existing interfaces to
    suit our needs. What matters is our description of how these things fit
    together. I would say: come up with pseudo-code for some tasks (that vary
    as much as possible) plus text describing all the missing details.

1092
aab9c261361c neural_net: Added info about how PLearn was doing it
Olivier Delalleau <delallea@iro>
parents: 1090
diff changeset
Link with PLearn
----------------

OD: This is basically what the OnlineLearningModule framework was doing in
PLearn (cf. PLearn/plearn_learners/online). The idea was that a module was a
"box" with so-called "ports" representing inputs / outputs. For instance, you
could think of an RBM as a module with "visible" and "hidden" ports, but also
"log_p_visible", "energy", etc. You would use such a module by calling an
fprop method in which you would provide values for some input ports (not
necessarily all of them) and ask for some output ports (not necessarily all
of them). Some ports could be used either as inputs or outputs (e.g. the
"hidden" port could be used as input to compute P(visible|hidden), or as
output to compute E[hidden|visible]). Optimization was performed
independently within each module, which would be given a gradient w.r.t. some
of its ports (considered outputs) and asked to update its internal parameters
and compute the corresponding gradient w.r.t. its input ports.

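To make the port idea concrete, here is a minimal sketch in plain Python. It
is not the PLearn OnlineLearningModule API: the class name ScaleModule, the
method name bprop_update, the dict-based signatures and the scalar parameter w
are all assumptions chosen for illustration. It shows a module with two ports
where either port can be given or requested, and a local update step that
receives a gradient w.r.t. the output port and returns a gradient w.r.t. the
input port.

```python
class ScaleModule:
    """Toy module (hypothetical): ports "input" and "output", one parameter w."""

    def __init__(self, w=2.0, learning_rate=0.1):
        self.w = w
        self.learning_rate = learning_rate

    def fprop(self, given, wanted):
        """Compute the requested ports from whichever ports were provided."""
        values = dict(given)
        for port in wanted:
            if port in values:
                continue
            if port == "output" and "input" in values:
                values["output"] = self.w * values["input"]
            elif port == "input" and "output" in values:
                # The same port acts as an output when the other side is given.
                values["input"] = values["output"] / self.w
            else:
                raise ValueError("cannot compute port %r from %r"
                                 % (port, sorted(given)))
        return {port: values[port] for port in wanted}

    def bprop_update(self, given, output_grad):
        """Update w locally and return the gradient w.r.t. the "input" port."""
        x = given["input"]
        # d(output)/d(input) = w, evaluated before w is updated.
        input_grad = output_grad * self.w
        # d(output)/d(w) = x, so this is one plain gradient step on w.
        self.w -= self.learning_rate * output_grad * x
        return {"input": input_grad}


module = ScaleModule(w=2.0)
forward = module.fprop({"input": 3.0}, ["output"])
backward = module.fprop({"output": 6.0}, ["input"])
grads = module.bprop_update({"input": 3.0}, output_grad=1.0)
```

Even in this two-port toy, fprop already needs one branch per direction, which
hints at how quickly the method grows once more ports are added.
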
Although it worked, it had some issues:
- The biggest problem was that as you added more ports and options to do
  different computations, the fprop method would grow and grow and become very
  difficult to write properly so as to handle all possible combinations of
  inputs / outputs while remaining efficient. Hopefully this is where Theano
  can be a big help (note: a "lazy if" could be required to handle situations
  where the same port is computed in very different ways depending on what is
  given as input).
- We had to introduce a notion of 'states': ports whose values had to be
  computed even if the user did not ask for them. The reason was that those
  values were required to perform the optimization (bprop phase) without
  re-doing some computations. Hopefully, again, Theano could take care of it
  (those states were potentially confusing to the user, who had to manipulate
  them without necessarily understanding what they were for).

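The two issues above can be illustrated with a small sketch of on-demand port
computation. This is plain Python, not Theano and not PLearn: the PortGraph
class, its define / fprop methods and the port names are hypothetical. Each
port is declared once as a function of other ports, only what the request
needs is evaluated, and intermediate values are kept in an internal cache
rather than exposed as 'states' the user has to pass around.

```python
import math


class PortGraph:
    """Hypothetical helper: each port is a function of other ports."""

    def __init__(self):
        self.rules = {}  # port name -> (names of ports it depends on, function)

    def define(self, name, deps, fn):
        self.rules[name] = (tuple(deps), fn)

    def fprop(self, given, wanted):
        """Return (requested port values, every value computed along the way)."""
        cache = dict(given)

        def value(name):
            if name not in cache:
                if name not in self.rules:
                    raise KeyError("port %r was neither given nor defined" % name)
                deps, fn = self.rules[name]
                cache[name] = fn(*[value(dep) for dep in deps])
            return cache[name]

        outputs = {name: value(name) for name in wanted}
        return outputs, cache


graph = PortGraph()
# Toy weight 1 and bias 0, purely for illustration.
graph.define("pre_activation", ["visible"], lambda v: 1.0 * v + 0.0)
graph.define("hidden", ["pre_activation"], lambda a: 1.0 / (1.0 + math.exp(-a)))

# Ask for "hidden" from "visible": "pre_activation" is computed and cached.
outputs, cache = graph.fprop({"visible": 0.0}, ["hidden"])
# Provide "pre_activation" directly: "visible" is never needed.
outputs2, cache2 = graph.fprop({"pre_activation": 0.0}, ["hidden"])
```

This sketch allows only one rule per port. A port that is computed in very
different ways depending on what is given would need several alternative
rules, which is the situation the "lazy if" remark refers to.
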
Besides that, there are at least 3 design decisions that could be made
differently:
- How to connect those modules together: in those OnlineLearningModules, each
  module had no idea of what it was connected to. A higher-level entity was
  responsible for grabbing the output of some module and forwarding it to its
  target destination. This is to be contrasted with the design of PLearn
  Variables, where each variable was explicitly constructed given its input
  variables (Theano-like), and would directly ask them to provide data. I am
  not sure what the pros and cons of these two approaches are.
- How to perform optimization. The OnlineLearningModule way is convenient for
  plugging together pieces that are optimized very differently, because each
  module is responsible for its own optimization. However, this also means it
  is difficult to try different global optimizers (again, this is in contrast
  with PLearn Variables).
- One must think about the issue of RNGs for stochastic modules. Here we had
  one single RNG per module. This makes it difficult to try different seeds
  for all modules at once. On the other hand, sharing a single RNG is not
  necessarily a good idea because of potentially unwanted side effects.

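One possible middle ground for the RNG question, sketched here as an
assumption rather than as anything PLearn did: keep one RNG per module, but
derive every module's seed from a single master seed. The function name
make_module_rngs and the module names are hypothetical, and the sketch uses
only Python's standard random module.

```python
import random


def make_module_rngs(master_seed, module_names):
    """Return one random.Random per module, all derived from master_seed.

    Seeds are assigned in the order of module_names, so that order must stay
    the same for a run to be reproducible.
    """
    seeder = random.Random(master_seed)
    return {name: random.Random(seeder.getrandbits(64)) for name in module_names}


names = ["rbm_1", "rbm_2"]
rngs_a = make_module_rngs(1234, names)
rngs_b = make_module_rngs(1234, names)

# Drawing many numbers in one module does not disturb the other module's stream.
for _ in range(100):
    rngs_a["rbm_1"].random()
sample_a = rngs_a["rbm_2"].random()
sample_b = rngs_b["rbm_2"].random()
```

Changing the single master seed reseeds every module at once, while each
module still draws from its own stream, so extra sampling in one module does
not shift the numbers seen by another.
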