pylearn: doc/v2_planning/arch_src/plugin_JB_comments_YB.txt @ 1288:a165f2666643
Changeset: cifar10 - added support for "all" split
Author: James Bergstra <bergstrj@iro.umontreal.ca>
Date: Wed, 29 Sep 2010 18:35:40 -0400
Parent: 4a1339682c8f
YB. I am very worried about this proposal. It looks again like we would be creating another language to replace one we already have, namely Python, mainly so that we could have introspection and programmable changes to an existing control flow structure (e.g. the standard DBN code). I feel that the negatives outweigh the advantages. Please correct me:

Disadvantages:
* much more difficult to read
* much more difficult to debug

JB asks: I would like to try and correct you, but I don't know where to begin:
- What do you think is more difficult to read [than what?] and why?
- What do you expect to be more difficult [than what?] to debug?

Advantages:
* easier to serialize (can't we serialize an ordinary Python class created by a normal user?)
* possible, but not easier, to programmatically modify existing learning algorithms (why not use the usual constructor parameters and hooks when possible, and just write new code for a new DBN variant when it can't fit?)
* am I missing something?

JB replies:
- Re serializability: I think any system that supports program pausing, resuming, and dynamic editing (a.k.a. process surgery) will have the flavour of my proposal. If someone has a better idea, he should suggest it.
- Re hooks & constructors: the mechanism I propose is more flexible than hooks and constructor parameters. Hooks and constructor parameters have their place, and would be used under my proposal as usual to configure the modules on which the flow-graph operates. But flow-graphs are more flexible. Flow-graphs (REPEAT, CALL, etc.) that are constructed by library calls can be directly modified. You can add new hooks, for example, or add a print statement between two statements (CALLs) that previously had no hook between them.
- The analogous thing using the real Python VM would be to dynamically re-program Python's compiled bytecode, which I don't think is possible.

I am not convinced that any of the stated advantages can't be achieved in more traditional ways.
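To make JB's point about directly modifying flow-graphs concrete, here is a minimal Python sketch. CALL and REPEAT are the element names used above, but their classes, the extra SEQ node, the toy functions and the shared `state` dict are assumptions made for illustration, not the actual pylearn design. The idea shown is that a loop returned by a library is plain data, so a new CALL can be inserted between two existing CALLs without editing the library code.

```python
# Hypothetical sketch: CALL, SEQ and REPEAT are illustrative classes,
# not an existing pylearn API.

class CALL:
    """Leaf node: call fn(state) when executed."""
    def __init__(self, fn):
        self.fn = fn

    def run(self, state):
        self.fn(state)


class SEQ:
    """Run child nodes in order; the children list is ordinary data."""
    def __init__(self, *children):
        self.children = list(children)

    def run(self, state):
        for child in self.children:
            child.run(state)


class REPEAT:
    """Run a body node a fixed number of times."""
    def __init__(self, n_times, body):
        self.n_times = n_times
        self.body = body

    def run(self, state):
        for _ in range(self.n_times):
            self.body.run(state)


def sgd_step(state):
    # Toy update: shrink params by 10% (stands in for a real SGD step).
    state["params"] -= 0.1 * state["params"]


def count_step(state):
    state["steps"] += 1


def build_training_loop():
    """What a library might return: two CALLs with no hook between them."""
    return REPEAT(3, SEQ(CALL(sgd_step), CALL(count_step)))


# "Process surgery": insert a logging CALL between the two existing CALLs
# without touching build_training_loop itself.
loop = build_training_loop()
log = []
loop.body.children.insert(1, CALL(lambda s: log.append(round(s["params"], 4))))

state = {"params": 1.0, "steps": 0}
loop.run(state)
print(state["steps"], log)  # 3 [0.9, 0.81, 0.729]
```

With an ordinary Python `for` loop, adding the same print would mean editing the loop's source, which is the contrast JB draws with re-programming compiled bytecode.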
RP comment: James or anybody else, correct me if I'm wrong. What I think James proposed is just a way of encapsulating different steps of the program in some classes. These classes are serializable. They are not a programming language per se. The way I see it, it is like dividing your program into a set of functions. Each function is a control flow element applied to something (like a CALL to a Python function). The idea is to wrap these functions in something that makes them serializable, and also to offer the added advantage that you have a graph that presents the order in which you should call the functions, and you can play with that order. That is why I was trying to convince James to rewrite things (using some syntactic sugar) to make them look less intimidating (I believe it can look much more "traditional" than it looks right now).

I think "programming language" might also be an overloaded term, so we might be speaking about different things. But if all that his proposal does is offer a wrapper around Python functions that makes them serializable, and generate an execution-order graph on which you can possibly do simple operations (like substitutions and replacements), I would not call it a programming language. I think making the program aware of where it is in its own execution flow, and of what that execution flow is, can be quite useful for automating some of the things we want.

OD comments: I agree with Yoshua. I actually thought (from watching the discussions in these files from a rather high-level point of view) that the main goal of this machinery was to help with parallelization. If that is the case, it may prove useful in some places, but it is not something that one would want to use everywhere. As far as serialization is concerned, I think this should be doable without such a system (provided we all agree that we do not necessarily require the ability to serialize / restart at any point).
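RP's reading of the proposal (functions wrapped so they are serializable, an execution-order graph that supports substitutions, a program that knows where it is in its own flow), together with OD's caveat about not needing to restart at an arbitrary point, can be sketched with the standard `pickle` module. The `Program` class, its `pc` counter and the toy steps are hypothetical. This sketch only pauses at step boundaries, and it assumes every step is a module-level function, because `pickle` stores functions by qualified name and cannot pickle lambdas.

```python
import pickle

# Hypothetical sketch: Program, pc and the toy steps are illustrative only.
# Steps are module-level functions, so pickle can store them by reference.

def load_data(state):
    state["data"] = [1.0, 2.0, 3.0]

def preprocess(state):
    state["data"] = [x * 2 for x in state["data"]]

def preprocess_centered(state):
    mean = sum(state["data"]) / len(state["data"])
    state["data"] = [x - mean for x in state["data"]]

def summarize(state):
    state["total"] = sum(state["data"])


class Program:
    def __init__(self, steps):
        self.steps = list(steps)  # execution order, editable as data
        self.pc = 0               # index of the next step to run
        self.state = {}

    def run(self, stop_after=None):
        """Run steps in order; pause after `stop_after` steps if given."""
        executed = 0
        while self.pc < len(self.steps):
            if stop_after is not None and executed == stop_after:
                return False      # paused, not finished
            self.steps[self.pc](self.state)
            self.pc += 1
            executed += 1
        return True               # finished

    def substitute(self, old, new):
        """Replace every occurrence of step `old` with step `new`."""
        self.steps = [new if s is old else s for s in self.steps]


prog = Program([load_data, preprocess, summarize])
prog.run(stop_after=1)            # pause after load_data
blob = pickle.dumps(prog)         # save steps, position and state

resumed = pickle.loads(blob)      # could happen in another process
resumed.substitute(preprocess, preprocess_centered)
resumed.run()
print(resumed.pc, resumed.state["total"])  # 3 0.0
```

Whether this needs a dedicated framework or just the conventions OD mentions is exactly the open question in the thread; the sketch only shows that step-boundary checkpointing and substitution fit in ordinary Python.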
About the ability to move / substitute things, you could probably achieve the same goal with proper code factorization / conventions.

JB replies: You are right that with sufficient discipline on everyone's part, and a good design using existing Python control flow (loops and functions), it is probably possible to get many of the features I'm claiming for my proposal. But I don't think Python offers very helpful syntax or control flow elements for programming parallel distributed computations, because the Python interpreter doesn't do that. What I'm trying to design is a mechanism that can allow us to *express the entire learning algorithm* in a program. That means:
- including the grid-search,
- including the use of the cluster,
- including the pre-processing and post-processing.
To make that actually work, programs need to be more flexible: we need to be able to pause and resume 'function calls', and possibly to restart them if we find a problem (without having to restart the whole program). We already do these things in ad-hoc ways by writing scripts, generating intermediate files, etc., but I think we would empower ourselves by using a tool that lets us actually write down the *WHOLE* algorithm in one place, rather than as a README with a list of scripts and instructions for what to do with them (especially because the README often never gets written).

OD replies: I can see such a framework being useful for high-level experiment design (the "big picture", or how to plug different components together). What I am not convinced about is that we should also use it to write a standard serial machine learning algorithm (e.g. DBN training with fixed hyper-parameters).

RP replies: What do you mean by writing down a DBN? I believe the structure and so on (selecting the optimizers) shouldn't be done using this approach. You would start using this syntax to do early stopping, and to decide the order in which the layers are pre-trained.
In my view you get something like pretrain_layer1, pretrain_layer2, finetune_one_step, and then you start using James' framework. Are you thinking in the same terms?

OD replies: Actually, I wasn't thinking of using it at all inside a DBN's code. I forgot about early stopping for each layer's training, though, and it's true that it may be useful to take advantage of some generic mechanism there... but I wouldn't use James' framework for it.
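RP's decomposition into pretrain_layer1, pretrain_layer2 and finetune_one_step, and OD's idea of a generic early-stopping mechanism that does not rely on James' framework, can be sketched in plain Python. `early_stopping`, `make_toy_stage`, the patience rule and the error curves are all hypothetical; the curves are made-up numbers that only exercise the stopping logic, not results from any real model.

```python
# Hypothetical sketch: a reusable early-stopping helper applied to each
# stage of a DBN-like schedule. Stage names follow RP's comment; the
# validation curves are invented toy values.

def early_stopping(step, validation_error, patience=2, max_steps=100):
    """Call step() until validation_error() fails to improve `patience` times.

    Returns (best_error, number_of_steps_taken).
    """
    best = float("inf")
    bad_rounds = 0
    n_steps = 0
    while n_steps < max_steps and bad_rounds < patience:
        step()
        n_steps += 1
        err = validation_error()
        if err < best:
            best, bad_rounds = err, 0
        else:
            bad_rounds += 1
    return best, n_steps


def make_toy_stage(curve):
    """Return (step, validation_error) that replay a fixed error curve."""
    position = {"i": -1}

    def step():
        position["i"] += 1

    def validation_error():
        return curve[min(position["i"], len(curve) - 1)]

    return step, validation_error


# The schedule is ordinary data: reordering or inserting stages is a list edit.
schedule = [
    ("pretrain_layer1", make_toy_stage([0.9, 0.7, 0.6, 0.65, 0.66])),
    ("pretrain_layer2", make_toy_stage([0.8, 0.5, 0.55, 0.56])),
    ("finetune_one_step", make_toy_stage([0.4, 0.3, 0.2, 0.25, 0.3])),
]

results = {}
for name, (step, val) in schedule:
    results[name] = early_stopping(step, val, patience=2)

print(results)
```

Here the generic mechanism is a helper function rather than a new control-flow layer, which is closer to what OD describes; RP's alternative would express the same stages as nodes in James' flow-graph.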