annotate doc/v2_planning/neural_net.txt @ 1388:0ff6c613cdf0
Many small fixes in code review proposal
author | Olivier Delalleau <delallea@iro> |
---|---|
date | Tue, 14 Dec 2010 14:53:48 -0500 |
parents | 0e12ea6ba661 |
children |
rev | line source |
---|---|
1088
e254065e7fd7
File for the new committee neural networks
Razvan Pascanu <r.pascanu@gmail.com>
parents:
diff
changeset
|
Neural Net committee
====================

Members:
 - Razvan Pascanu
 - James Bergstra
 - Xavier Glorot
 - Guillaume Desjardins

(Add your name here if you want)


Objective (Razvan)
-------------------

Come up with a description of how to write learners (how to combine the
optimizer, structure and error measure, how to talk to datasets and tasks (if
there is anything like a dataset object in your view), and so on).

 o The way I see it personally, we should pick "random" interfaces for any
   component that does not have one yet, or change an existing interface to
   suit our needs when describing how these things fit together. I would say:
   come up with pseudo-code for some tasks (that vary as much as possible),
   plus text describing all the missing details.

Link with PLearn
----------------

OD: This is basically what the OnlineLearningModule framework was doing in
PLearn (cf. PLearn/plearn_learners/online). The idea was that a module was a
"box" with so-called "ports" representing inputs / outputs. So for instance
you could think of an RBM as a module with "visible" and "hidden" ports, but
also "log_p_visible", "energy", etc. You would use such a module by calling an
fprop method where you would give values for some input ports (not necessarily
all of them), and ask for some output ports (not necessarily all of them).
Some ports could be used either as inputs or outputs (e.g. the "hidden" port
could be used as input to compute P(visible|hidden), or as output to compute
E[hidden|visible]). Optimization was achieved independently within each
module, which would be provided a gradient w.r.t. some of its ports
(considered outputs), and asked to update its internal parameters and
accordingly compute a gradient w.r.t. its input ports.

Although it worked, it had some issues:
  - The biggest problem was that as you added more ports and options to do
    different computations, the fprop method would grow and grow and become
    very difficult to write properly to handle all possible combinations of
    inputs / outputs, while remaining efficient. Hopefully this is where
    Theano can be a big help (note: a "lazy if" could be required to handle
    situations where the same port is computed in very different ways
    depending on what is given as input).
  - We had to introduce a notion of 'states', which were ports whose values
    had to be computed even if they were not requested by the user. The reason
    was that those values were required to perform the optimization (bprop
    phase) without re-doing some computations. Hopefully again Theano could
    take care of it (those states were potentially confusing to the user, who
    had to manipulate them without necessarily understanding what they were
    for).

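To make the port idea and the first issue above concrete, here is a minimal
sketch in plain Python. It is not the PLearn OnlineLearningModule API: the
class name ToyRBMModule, the dict-based fprop signature and the binary-RBM
formulas are all assumptions chosen for illustration, and only fprop is shown
(the gradient / update step is left out).

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class ToyRBMModule:
    """Hypothetical port-based module (illustration only, not PLearn's API)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        rng = random.Random(seed)
        # W[i][j] connects hidden unit i to visible unit j.
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_visible)]
                  for _ in range(n_hidden)]
        self.b_visible = [0.0] * n_visible
        self.b_hidden = [0.0] * n_hidden

    def fprop(self, inputs, outputs):
        """Compute the requested output ports from the provided input ports.

        inputs:  dict mapping port name -> list of floats
        outputs: list of port names to compute
        """
        result = {}
        for port in outputs:
            if port == "hidden" and "visible" in inputs:
                # "hidden" used as an output: E[hidden | visible]
                v = inputs["visible"]
                result[port] = [sigmoid(b + dot(row, v))
                                for row, b in zip(self.W, self.b_hidden)]
            elif port == "visible" and "hidden" in inputs:
                # "hidden" used as an input: P(visible = 1 | hidden)
                h = inputs["hidden"]
                columns = zip(*self.W)
                result[port] = [sigmoid(b + dot(col, h))
                                for col, b in zip(columns, self.b_visible)]
            elif (port == "energy" and "visible" in inputs
                  and "hidden" in inputs):
                v, h = inputs["visible"], inputs["hidden"]
                interaction = sum(h_i * dot(row, v)
                                  for row, h_i in zip(self.W, h))
                result[port] = -(interaction + dot(self.b_visible, v)
                                 + dot(self.b_hidden, h))
            else:
                # Every new port adds more input/output cases to handle here.
                raise ValueError("cannot compute port %r from inputs %r"
                                 % (port, sorted(inputs)))
        return result


module = ToyRBMModule(n_visible=3, n_hidden=2)
hidden = module.fprop({"visible": [1.0, 0.0, 1.0]}, ["hidden"])["hidden"]
visible = module.fprop({"hidden": hidden}, ["visible"])["visible"]
```

Even with three ports, fprop already needs one branch per supported
combination of given and requested ports, which is the growth problem
described in the first issue.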
Besides that, there are at least 3 design decisions that could be made
differently:
  - How to connect those modules together: in those OnlineLearningModules,
    each module had no idea of who it was connected to. A higher-level entity
    was responsible for grabbing the output of some module and forwarding it
    to its target destination. This is to be contrasted with the design of
    PLearn Variables, where each variable was explicitly constructed given its
    input variables (Theano-like), and would directly ask them to provide
    data. I am not sure what the pros and cons of these two approaches are.
  - How to perform optimization. The OnlineLearningModule way makes it easy to
    plug together pieces that are optimized very differently, because each
    module is responsible for its own optimization. However, this also means
    it is difficult to try different global optimizers (again, this is in
    contrast with PLearn variables).
  - One must think about the issue of RNG for stochastic modules. Here we had
    one single RNG per module. This makes it difficult to try different seeds
    for everyone. On the other hand, sharing a single RNG is not necessarily a
    good idea because of potentially unwanted side-effects.

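As an illustration of the RNG trade-off in the last point, the sketch below
shows one possible middle ground. It is an assumption for discussion, not
something PLearn did or something these notes propose: each module keeps its
own RNG, but all per-module seeds are derived from a single master seed. The
names StochasticModule and build_modules are hypothetical.

```python
import random


class StochasticModule:
    """Hypothetical stochastic module that owns a private RNG."""

    def __init__(self, name, seed):
        self.name = name
        self.rng = random.Random(seed)

    def sample(self, p):
        """Draw one Bernoulli sample that is 1 with probability p."""
        return 1 if self.rng.random() < p else 0


def build_modules(names, master_seed):
    """Create one module per name, each seeded from a single master seed."""
    seeder = random.Random(master_seed)
    return [StochasticModule(name, seeder.getrandbits(32)) for name in names]


# Changing master_seed changes the seed of every module at once.
rbm_1, rbm_2 = build_modules(["rbm_1", "rbm_2"], master_seed=123)
samples = [rbm_2.sample(0.5) for _ in range(5)]
```

Because each module draws from its own stream, drawing extra samples from one
module does not change what another module produces, while a single master
seed is still enough to re-seed the whole experiment.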