Mercurial > pylearn
annotate doc/v2_planning/arch_src/plugin_JB_comments_YB.txt @ 1268:78a09cdd449b
added some wishlist items for gd module
author | James Bergstra <bergstrj@iro.umontreal.ca> |
---|---|
date | Fri, 03 Sep 2010 12:35:49 -0400 |
parents | 4a1339682c8f |
children |
[rev 1238 (067b2f9ba122), Yoshua Bengio <bengioy@iro.umontreal.ca>: comments by YB on JB's arch_sr/plugin_JB.py]

YB. I am very worried about this proposal. It looks again like we would be
creating another language to replace one we already have, namely python,
mainly so that we could have introspection and programmable changes
into an existing control flow structure (e.g. the standard DBN code).

I feel that the negatives outweigh the advantages.

Please correct me:

Disadvantages:

* much more difficult to read
* much more difficult to debug

[rev 1247 (8dfe9d6e72f6), parent 1245, James Bergstra <bergstrj@iro.umontreal.ca>: plugin_JB replies]
JB asks: I would like to try and correct you, but I don't know where to begin --
- What do you think is more difficult to read [than what?] and why?
- What do you expect to be more difficult [than what?] to debug?


Advantages:

* easier to serialize (can't we serialize an ordinary Python class created by a normal user?)
* possible but not easier to programmatically modify existing learning algorithms
  (why not the usual constructor parameters and hooks,
  when possible, and just create another code for a new DBN variant when it can't fit?)
* am I missing something?
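
For concreteness, the "usual constructor parameters and hooks" route could look roughly like the sketch below. Everything in it is hypothetical: `LayerwiseTrainer`, its parameters, the hook names and `fake_train_step` are invented for illustration and are not pylearn code.

```python
# Hypothetical sketch: none of these names come from pylearn.
class LayerwiseTrainer:
    """Greedy layer-wise loop configured by constructor parameters and hooks."""

    def __init__(self, n_layers, n_epochs, train_step,
                 on_epoch_end=None, on_layer_end=None):
        self.n_layers = n_layers
        self.n_epochs = n_epochs
        self.train_step = train_step        # callable(layer, epoch) -> cost
        self.on_epoch_end = on_epoch_end    # optional hook(layer, epoch, cost)
        self.on_layer_end = on_layer_end    # optional hook(layer)

    def run(self):
        costs = []
        for layer in range(self.n_layers):
            for epoch in range(self.n_epochs):
                cost = self.train_step(layer, epoch)
                costs.append(cost)
                if self.on_epoch_end is not None:
                    self.on_epoch_end(layer, epoch, cost)
            if self.on_layer_end is not None:
                self.on_layer_end(layer)
        return costs


def fake_train_step(layer, epoch):
    # Stand-in for a real parameter update; the cost just shrinks over time.
    return 1.0 / (1 + layer + epoch)


log = []
trainer = LayerwiseTrainer(
    n_layers=2, n_epochs=3, train_step=fake_train_step,
    on_layer_end=lambda layer: log.append("layer %d done" % layer))
costs = trainer.run()
```

In this style, behaviour can only be changed at the points where the author of the class chose to expose a parameter or a hook; anything else means editing or copying the class, which is the "another code for a new DBN variant" case mentioned above.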

JB replies:
- Re serializability - I think any system that supports program pausing,
  resuming, and dynamic editing (a.k.a. process surgery) will have the flavour
  of my proposal. If someone has a better idea, he should suggest it.

- Re hooks & constructors - the mechanism I propose is more flexible than hooks and constructor
  parameters. Hooks and constructor parameters have their place, and would be
  used under my proposal as usual to configure the modules on which the
  flow-graph operates. But flow-graphs are more flexible. Flow-graphs
  (REPEAT, CALL, etc.) that are constructed by library calls can be directly
  modified. You can add new hooks, for example, or add a print statement
  between two statements (CALLs) that previously had no hook between them.
  - the analogous thing using the real python VM would be to dynamically
    re-program Python's compiled bytecode, which I don't think is possible.
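
A toy version of the idea may help: a program represented as data that can be edited before it is run. The `CALL`, `SEQ` and `REPEAT` classes below are written from scratch for illustration (only the names REPEAT and CALL appear in the discussion; `SEQ` and all signatures are assumptions), so this is a sketch of the concept and not the API of plugin_JB.py.

```python
# Hypothetical sketch of a flow-graph; these toy classes are NOT the ones
# defined in plugin_JB.py.
class CALL:
    """Leaf node: call a Python function with fixed arguments."""

    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args

    def run(self):
        return self.fn(*self.args)


class SEQ:
    """Run child nodes in order; the list of steps stays editable."""

    def __init__(self, *steps):
        self.steps = list(steps)

    def run(self):
        for step in self.steps:
            step.run()


class REPEAT:
    """Run the body node n times."""

    def __init__(self, n, body):
        self.n = n
        self.body = body

    def run(self):
        for _ in range(self.n):
            self.body.run()


trace = []

def update():
    trace.append("update")

def validate():
    trace.append("validate")

def report():
    trace.append("report")

# A "library" builds the program as a data structure ...
body = SEQ(CALL(update), CALL(validate))
program = REPEAT(2, body)

# ... and a user inserts a step between two CALLs that had no hook there.
body.steps.insert(1, CALL(report))

program.run()
```

After `program.run()`, `trace` holds update, report, validate twice over: the extra step was added without touching the code that built `body`.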

YB: I am not convinced that any of the stated advantages can't be achieved in more traditional ways.

[rev 1242 (316410a38f6f), parent 1238, Razvan Pascanu <r.pascanu@gmail.com>: comment on Yoshua's comment on James architecture]
RP comment: James or anybody else correct me if I'm wrong. What I think James
proposed is just a way of encapsulating different steps of the program in some
classes. These classes are serializable. They are not a programming language
per se. The way I see it, it is like dividing your program into a set of functions.
Each function is a control flow element applied to something (like a CALL to
a python function). The idea is to wrap these functions in something to
make them serializable, and also to offer the added advantage that you have a
graph that presents the order in which you should call the functions, and you
can play with that order.

That is why I was trying to convince James to re-write things (using some
syntactic sugar) to make it look less intimidating (I believe it can look
much more "traditional" than it looks right now). I think "programming
language" might also be an overloaded term, so we might be speaking about
different things. But if all that his proposal does is offer some wrapper
around python functions that makes them serializable, and generate an execution
order graph in which you can possibly do simple operations (like
substitutions and replacements), I would not call it a programming language.

I think making the program aware of where it is in its own execution
flow, and of what that flow is, can be quite useful for automating
some of the things we want.
[rev 1245 (808e38dce8d6), parent 1242, Olivier Delalleau <delallea@iro>: Replied to YB's comment on JB's system]

OD comments: I agree with Yoshua. I actually thought (from looking at the
discussions in these files from a rather high-level point of view) that the main
goal of this machinery was to help with parallelization. If that is the case,
it may prove useful in some places, but it is not something that one would
want to use everywhere. As far as serialization is concerned, I think this
should be doable without such a system (provided we all agree that we do not
necessarily require the ability to serialize / restart at any point). About
the ability to move / substitute things, you could probably achieve the same
goal with proper code factorization / conventions.
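
As an illustration of serialization at agreed points only, a plain Python loop can save its state at the end of each epoch with the standard `pickle` module. The state layout, the file name `dbn_state.pkl` and the stand-in weight update are assumptions made for this sketch.

```python
import os
import pickle
import tempfile


def train(state, n_epochs, checkpoint_path):
    """Ordinary loop that checkpoints only at epoch boundaries."""
    while state["epoch"] < n_epochs:
        # Stand-in for a real training epoch.
        state["weights"] = [w * 0.9 for w in state["weights"]]
        state["epoch"] += 1
        with open(checkpoint_path, "wb") as f:
            pickle.dump(state, f)
    return state


def load_or_init(checkpoint_path):
    """Resume from the last checkpoint if there is one, else start fresh."""
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "rb") as f:
            return pickle.load(f)
    return {"epoch": 0, "weights": [1.0, 2.0]}


path = os.path.join(tempfile.mkdtemp(), "dbn_state.pkl")

# First run stops after 2 epochs, as if the job had been interrupted.
train(load_or_init(path), n_epochs=2, checkpoint_path=path)

# A later run picks up at the saved epoch instead of starting over.
resumed = load_or_init(path)
start_epoch = resumed["epoch"]
final = train(resumed, n_epochs=5, checkpoint_path=path)
```

The cost of this simplicity is the one stated in the parenthesis above: the program can be restarted only from an epoch boundary, not from an arbitrary point.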

JB replies:
You are right that with sufficient discipline on everyone's part,
and a good design using existing python control flow (loops and functions) it is
probably possible to get many of the features I'm claiming with my proposal.

But I don't think Python offers a very helpful syntax or control flow elements
for programming parallel distributed computations, because the python
interpreter doesn't do that.

What I'm trying to design is a mechanism that can allow us to *express the entire
learning algorithm* in a program. That means
- including the grid-search,
- including the use of the cluster,
- including the pre-processing and post-processing.

To make that actually work, programs need to be more flexible - we need to be
able to pause and resume 'function calls', and to possibly restart them if we
find a problem (without having to restart the whole program). We already do
these things in ad-hoc ways by writing scripts, generating intermediate files,
etc., but I think we would empower ourselves by using a tool that lets us
actually write down the *WHOLE* algorithm, in one place rather than as a README
with a list of scripts and instructions for what to do with them (especially
because the README often never gets written).
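
One way to picture pausing, resuming and restarting individual steps is a list of named steps driven by an explicit program counter, with the counter and intermediate results kept in a picklable dict. This is only a sketch under assumptions: `STEPS`, `run`, `restart_from` and the three step functions are invented here and say nothing about how the proposal itself is implemented.

```python
import pickle


def preprocess(results):
    return [1, 2, 3]

def train(results):
    return sum(results["preprocess"])

def postprocess(results):
    return "score=%d" % results["train"]


# The whole "algorithm" written down in one place, as data.
STEPS = [("preprocess", preprocess),
         ("train", train),
         ("postprocess", postprocess)]


def run(state, max_steps=None):
    """Advance state['pc'] through STEPS; return True once finished."""
    ran = 0
    while state["pc"] < len(STEPS):
        if max_steps is not None and ran >= max_steps:
            return False                      # paused, state is resumable
        name, fn = STEPS[state["pc"]]
        state["results"][name] = fn(state["results"])
        state["pc"] += 1
        ran += 1
    return True


def restart_from(state, name):
    """Rewind to a named step, dropping its result and all later ones."""
    names = [n for n, _ in STEPS]
    state["pc"] = names.index(name)
    for n in names[state["pc"]:]:
        state["results"].pop(n, None)


state = {"pc": 0, "results": {}}
finished = run(state, max_steps=2)            # pause after two steps

saved = pickle.dumps(state)                   # could equally go to disk
resumed = pickle.loads(saved)
restart_from(resumed, "train")                # redo training, keep preprocessing
done = run(resumed)
```

Only the state dict is pickled; the step functions are looked up again from `STEPS` on resume, which sidesteps the question of serializing code.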

[rev 1250]
OD replies: I can see such a framework being useful for high-level experiment
design (the "big picture", or how to plug different components together). What
I am not convinced about is that we should also use it to write a standard
serial machine learning algorithm (e.g. DBN training with fixed
hyper-parameters).
[rev 1251 (70ca63c05672), parent 1250, Razvan Pascanu <r.pascanu@gmail.com>: comment on OD's reply]

RP replies: What do you understand by writing down a DBN? I believe the
structure and so on (selecting the optimizers) shouldn't be done using this
approach. You would start using this syntax to do early stopping, or to decide the
order of pre-training the layers. In my view you get something like
pretrain_layer1, pretrain_layer2, finetune_one_step, and then start using
James' framework. Are you thinking in the same terms?

[rev 1252]
OD replies: Actually I wasn't thinking of using it at all inside a DBN's code.
I forgot early stopping for each layer's training though, and it's true it may
be useful to take advantage of some generic mechanism there... but I wouldn't
use James' framework for it.