pylearn: comparison of doc/v2_planning/neural_net.txt @ 1092:aab9c261361c

neural_net: Added info about how PLearn was doing it

author:   Olivier Delalleau <delallea@iro>
date:     Sun, 12 Sep 2010 15:12:19 -0400
parents:  a80b296eb0df
children: 0b666177f725

for which there is no one yet, or change the interface to meet our needs.
As for our description of how these things fit together, I would say: come
up with pseudo-code for some tasks (that vary as much as possible), plus
text describing all the missing details.

Link with PLearn
----------------

OD: This is basically what the OnlineLearningModule framework was doing in
PLearn (cf. PLearn/plearn_learners/online). The idea was that a module was a
"box" with so-called "ports" representing inputs / outputs. So for instance
you could think of an RBM as a module with "visible" and "hidden" ports, but
also "log_p_visible", "energy", etc. You would use such a module by calling
an fprop method where you would give values for some input ports (not
necessarily all of them), and would ask for some output ports (not
necessarily all of them). Some ports could be used either as inputs or
outputs (e.g. the "hidden" port could be used as input to compute
P(visible|hidden), or as output to compute E[hidden|visible]). Optimization
was achieved independently within each module, which would be provided a
gradient w.r.t. some of its ports (considered outputs), and asked to update
its internal parameters and accordingly compute a gradient w.r.t. its input
ports.

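To make the ports idea concrete, here is a minimal numpy sketch of such a
module (the class and the exact fprop signature are made up for
illustration; this is not the actual OnlineLearningModule API)::

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    class RBMModule(object):
        """Toy port-based module: ports are just named values."""

        def __init__(self, n_visible, n_hidden, rng=None):
            rng = rng or np.random.default_rng(0)
            self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
            self.b_vis = np.zeros(n_visible)
            self.b_hid = np.zeros(n_hidden)

        def fprop(self, inputs, outputs):
            """Compute the requested output ports from the given input
            ports; only some combinations are supported."""
            results = {}
            if "hidden" in outputs and "visible" in inputs:
                v = inputs["visible"]        # E[hidden | visible]
                results["hidden"] = sigmoid(v @ self.W + self.b_hid)
            if "visible" in outputs and "hidden" in inputs:
                h = inputs["hidden"]         # "hidden" used as an input
                results["visible"] = sigmoid(h @ self.W.T + self.b_vis)
            if "energy" in outputs and {"visible", "hidden"} <= set(inputs):
                v, h = inputs["visible"], inputs["hidden"]
                results["energy"] = -(v @ self.b_vis + h @ self.b_hid
                                      + v @ self.W @ h)
            return results

E.g. rbm.fprop({"visible": v}, ["hidden"]) asks only for the "hidden" port.
Note how every supported port combination adds another branch to fprop;
this is the combinatorial growth discussed below.
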
Although it worked, it had some issues:
- The biggest problem was that as you added more ports and options to do
different computations, the fprop method would grow and grow and become very
difficult to write properly so as to handle all possible combinations of
inputs / outputs while remaining efficient. Hopefully this is where Theano
can be a big help (note: a "lazy if" could be required to handle situations
where the same port is computed in very different ways depending on what is
given as input -- see the sketch right after this list).
- We had to introduce a notion of 'states': ports whose values had to be
computed even when not requested by the user. The reason was that those
values were required to perform the optimization (bprop phase) without
redoing some computations. Hopefully again Theano could take care of this
(those states were potentially confusing to the user, who had to manipulate
them without necessarily understanding what they were for).

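Regarding the "lazy if" mentioned in the first point above: later versions
of Theano ship one as theano.ifelse.ifelse, which evaluates only the branch
actually taken (unlike T.switch, which computes both). A minimal sketch,
with dummy stand-in computations::

    import theano
    import theano.tensor as T
    from theano.ifelse import ifelse

    v = T.vector('visible')
    use_exact = T.iscalar('use_exact')  # flag chosen by the caller

    cheap = T.nnet.sigmoid(v)           # stand-in: cheap approximation
    costly = T.nnet.sigmoid(v * v)      # stand-in: costlier exact version

    # Only the selected branch is evaluated at run time.
    hidden = ifelse(T.gt(use_exact, 0), costly, cheap)
    f = theano.function([v, use_exact], hidden)
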
Besides that, there are at least 3 design decisions that could be made
differently:
- How to connect modules together: in those OnlineLearningModules, each
module had no idea of who it was connected to. A higher-level entity was
responsible for grabbing the output of some module and forwarding it to its
target destination. This is to be contrasted with the design of PLearn
Variables, where each variable was explicitly constructed from its input
variables (Theano-like) and would directly ask them to provide data. I am
not sure what the pros and cons of these two approaches are (see the first
sketch after this list).
- How to perform optimization. The OnlineLearningModule way is nice for
plugging together pieces that are optimized very differently, because each
module is responsible for its own optimization. However, this also means it
is difficult to easily try different global optimizers (again, this is in
contrast with PLearn variables).
- One must think about the issue of RNGs for stochastic modules. Here we had
a single RNG per module. This makes it difficult to easily try different
seeds for everything. On the other hand, sharing a single RNG is not
necessarily a good idea because of potentially unwanted side-effects (see
the second sketch after this list).
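
To make the first point concrete, here is a rough sketch of the two wiring
styles (all names are invented; modules are assumed to expose an
output_ports list and the fprop interface sketched earlier)::

    # Style 1 -- OnlineLearningModule-like: modules do not know each
    # other; a higher-level container forwards values between ports.
    class Container(object):
        def __init__(self, modules, connections):
            self.modules = modules       # assumed in topological order
            # connections: {(dst_module, dst_port): (src_module, src_port)}
            self.connections = connections

        def fprop(self, values):
            # values: {(module, port): value}, pre-filled with the
            # external inputs
            for module in self.modules:
                inputs = {dport: values[src]
                          for (dmod, dport), src in self.connections.items()
                          if dmod is module and src in values}
                for port, val in module.fprop(inputs,
                                              module.output_ports).items():
                    values[(module, port)] = val
            return values

    # Style 2 -- PLearn-Variable / Theano-like: each node is explicitly
    # built from its inputs and pulls data from them directly.
    class Variable(object):
        def __init__(self, inputs, compute):
            self.inputs = inputs         # parent Variables
            self.compute = compute       # function of the parents' values

        def value(self):
            return self.compute(*[p.value() for p in self.inputs])
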
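And for the last point, a tiny numpy illustration of the RNG tradeoff
(module names are hypothetical)::

    import numpy as np

    # One RNG per module: each module is reproducible on its own, but
    # trying a new global seed means re-seeding every module by hand.
    rbm_rng = np.random.default_rng(42)
    corruption_rng = np.random.default_rng(43)

    # One shared RNG: a single seed controls everything, but every
    # module that draws from it shifts the stream seen by the others --
    # adding, removing or reordering one module's draws silently changes
    # all downstream samples (the unwanted side-effect mentioned above).
    shared = np.random.default_rng(0)
    a = shared.standard_normal(3)   # module A's draw...
    b = shared.standard_normal(3)   # ...determines what module B gets.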