view doc/v2_planning/arch_src/plugin_JB_comments_YB.txt @ 1288:a165f2666643

cifar10 - added support for "all" split
author James Bergstra <bergstrj@iro.umontreal.ca>
date Wed, 29 Sep 2010 18:35:40 -0400
parents 4a1339682c8f


YB. I am very worried about this proposal. It looks again like we would be
creating another language to replace one we already have, namely Python,
mainly so that we could have introspection and make programmatic changes
to an existing control flow structure (e.g. the standard DBN code).

I feel that the negatives outweigh the advantages.

Please correct me:

Disadvantages:

* much more difficult to read
* much more difficult to debug

JB asks: I would like to try and correct you, but I don't know where to begin --
  - What do you think is more difficult to read [than what?] and why?
  - What do you expect to be more difficult [than what?] to debug?


Advantages:

* easier to serialize (can't we serialize an ordinary Python class created by a normal user?)
* possible but not easier to programmatically modify existing learning algorithms
  (why not use the usual constructor parameters and hooks
   when possible, and just write new code for a new DBN variant when it can't fit?)
* am I missing something?

JB replies:
  - Re serializability - I think any system that supports program pausing,
    resuming, and dynamic editing (a.k.a. process surgery) will have the flavour
    of my proposal.  If someone has a better idea, he should suggest it.

  - Re hooks & constructors - the mechanism I propose is more flexible than hooks and constructor
    parameters.  Hooks and constructor parameters have their place, and would be
    used under my proposal as usual to configure the modules on which the
    flow-graph operates.  But flow-graphs are more flexible. Flow-graphs
    (REPEAT, CALL, etc.) that are constructed by library calls can be directly
    modified.  You can add new hooks, for example, or add a print statement
    between two statements (CALLs) that previously had no hook between them.
    - the analogous thing using the real Python VM would be to dynamically
      re-program Python's compiled bytecode, which I don't think is possible.
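
To make the flow-graph idea above concrete, here is a minimal sketch. CALL and
REPEAT are names taken from the discussion, but the classes below, the SEQ
node, and the helper functions are hypothetical illustrations, not the actual
API of the proposal. The sketch builds a small graph as a library call might,
then inserts a logging step between two CALLs that had no hook between them.

```python
# Hypothetical sketch: CALL and REPEAT are names from the discussion, but
# these classes (and SEQ) are illustrative, not the proposal's real API.

class CALL:
    """Leaf node: call fn(state) when executed."""
    def __init__(self, fn):
        self.fn = fn

    def run(self, state):
        self.fn(state)


class SEQ:
    """Run child nodes in order. The list can be edited before running."""
    def __init__(self, *steps):
        self.steps = list(steps)

    def run(self, state):
        for step in self.steps:
            step.run(state)


class REPEAT:
    """Run the body node n times."""
    def __init__(self, n, body):
        self.n = n
        self.body = body

    def run(self, state):
        for _ in range(self.n):
            self.body.run(state)


def update(state):
    # Stand-in for one parameter update.
    state["w"] += 1


def evaluate(state):
    # Stand-in for one evaluation pass.
    state["evals"] += 1


def build_program():
    # What a library call might return: three rounds of update + evaluate.
    return REPEAT(3, SEQ(CALL(update), CALL(evaluate)))


def add_logging(program, log):
    # Insert a new step between two CALLs that had no hook between them.
    program.body.steps.insert(1, CALL(lambda s: log.append(s["w"])))
    return program


log = []
program = add_logging(build_program(), log)
state = {"w": 0, "evals": 0}
program.run(state)
print(log)  # [1, 2, 3]
```

The library code in build_program is untouched; only the graph it returned was
edited, which is the kind of change that hooks alone cannot provide unless the
author anticipated it.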

I am not convinced that any of the stated advantages can't be achieved in more traditional ways.

RP comment: James or anybody else, correct me if I'm wrong. What I think James
proposed is just a way of encapsulating different steps of the program in some
classes. These classes are serializable. They are not a programming language
per se. The way I see it, it is like dividing your program into a set of
functions. Each function is a control flow element applied to something (like
a CALL to a Python function). The idea is to wrap these functions in something
to make them serializable, with the added advantage that you have a
graph that presents the order in which you should call the functions, and you
can play with that order.

That is why I was trying to convince James to re-write things (using some
syntactic sugar) to make it look less intimidating (I believe it can look
much more "traditional" than it looks right now). I think "programming
language" might also be an overloaded term, so we might be speaking about
different things. But if all that his proposal does is offer some wrapper
around Python functions that makes them serializable, and generate an execution
order graph on which you can possibly do simple operations (like
substitutions and replacements), I would not call it a programming language.

I think making the program aware of where it is in its own execution flow,
and of what that execution flow is, can be quite useful for automating
some of the things we want.
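
RP's reading (ordinary functions, wrapped so that the call order is data that
can be serialized and edited) can be sketched as follows. The REGISTRY dict,
the step decorator, and the toy step functions are all hypothetical names
chosen for this illustration; JSON is used here only because a list of names
is trivially serializable, not because the proposal prescribes it.

```python
import json

# Hypothetical registry mapping step names to ordinary Python functions.
REGISTRY = {}


def step(fn):
    """Register fn so that an execution order can refer to it by name."""
    REGISTRY[fn.__name__] = fn
    return fn


@step
def load_data(state):
    state["data"] = [1.0, 2.0, 3.0]


@step
def normalize(state):
    top = max(state["data"])
    state["data"] = [x / top for x in state["data"]]


@step
def center(state):
    mean = sum(state["data"]) / len(state["data"])
    state["data"] = [x - mean for x in state["data"]]


@step
def summarize(state):
    state["total"] = sum(state["data"])


def run(order, state):
    """Call the registered functions in the given order."""
    for name in order:
        REGISTRY[name](state)
    return state


# The execution order is plain data, so it can be saved and restored.
order = ["load_data", "normalize", "summarize"]
saved = json.dumps(order)
restored = json.loads(saved)

# A simple substitution: replace one step with another.
swapped = ["center" if name == "normalize" else name for name in restored]

result_a = run(restored, {})
result_b = run(swapped, {})
```

Nothing here is a new language: the steps are plain functions, and only the
order in which they are called has been lifted into an editable structure.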

OD comments: I agree with Yoshua. I actually thought (from following the
discussions in these files from a rather high-level point of view) that the main
goal of this machinery was to help with parallelization. If that is the case,
it may prove useful in some places, but it is not something that one would
want to use everywhere. As far as serialization is concerned, I think this
should be do-able without such a system (provided we all agree that we do not
necessarily require the ability to serialize / restart at any point). About
the ability to move / substitute things, you could probably achieve the same
goal with proper code factorization / conventions.

JB replies: 
  You are right that with sufficient discipline on everyone's part,
  and a good design using existing Python control flow (loops and functions), it is
  probably possible to get many of the features I'm claiming for my proposal.

  But I don't think Python offers a very helpful syntax or control flow elements
  for programming parallel distributed computations, because the Python
  interpreter doesn't do that.

  What I'm trying to design is a mechanism that can allow us to *express the entire
  learning algorithm* in a program.  That means 
  - including the grid-search,
  - including the use of the cluster, 
  - including the pre-processing and post-processing.

  To make that actually work, programs need to be more flexible - we need to be
  able to pause and resume 'function calls', and to possibly restart them if we
  find a problem (without having to restart the whole program).  We already do
  these things in ad-hoc ways by writing scripts, generating intermediate files,
  etc., but I think we would empower ourselves by using a tool that lets us
  actually write down the *WHOLE* algorithm, in one place rather than as a README
  with a list of scripts and instructions for what to do with them (especially
  because the README often never gets written).
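
The pause / resume / restart behaviour described above can be illustrated with
a small sketch. It assumes the whole algorithm is a list of steps plus an
explicit program counter kept in a checkpoint dict; the step functions, the
checkpoint layout, and the run helper are hypothetical and much simpler than
anything the proposal would need for real cluster jobs.

```python
import json


def preprocess(state):
    state["x"] = [v * 2 for v in state["raw"]]


def train(state):
    # Stand-in for training: the "model" is just a sum.
    state["model"] = sum(state["x"])


def postprocess(state):
    state["report"] = "model=%d" % state["model"]


# The whole algorithm written down in one place.
STEPS = [preprocess, train, postprocess]


def run(checkpoint, max_steps=None):
    """Run from checkpoint["pc"]; stop after max_steps steps if given."""
    done = 0
    while checkpoint["pc"] < len(STEPS):
        if max_steps is not None and done >= max_steps:
            break  # pause: the checkpoint records where to resume
        STEPS[checkpoint["pc"]](checkpoint["state"])
        checkpoint["pc"] += 1
        done += 1
    return checkpoint


checkpoint = {"pc": 0, "state": {"raw": [1, 2, 3]}}
run(checkpoint, max_steps=1)        # pause after preprocessing
saved = json.dumps(checkpoint)      # could be written to disk here

resumed = json.loads(saved)         # later, possibly in another process
run(resumed)
first_report = resumed["state"]["report"]
print(first_report)                 # model=12

# Restart from the training step after fixing a problem in its input,
# without re-running preprocessing.
resumed["state"]["x"] = [10, 20]
resumed["pc"] = 1
run(resumed)
second_report = resumed["state"]["report"]
print(second_report)                # model=30
```

The checkpoint plays the role that intermediate files and a README play in the
ad-hoc approach: it records both the data and the position in the algorithm.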

OD replies: I can see such a framework being useful for high-level experiment
design (the "big picture", or how to plug different components together). What
I am not convinced about is that we should also use it to write a standard
serial machine learning algorithm (e.g. DBN training with fixed
hyper-parameters).

RP replies: What do you understand by writing down a DBN? I believe the
structure and so on (selecting the optimizers) shouldn't be done using this
approach. You would start using this syntax to do early stopping, and to decide
the order of pre-training the layers. In my view you get something like
pretrain_layer1, pretrain_layer2, finetune_one_step and then start using
James' framework. Are you thinking in the same terms?
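
For comparison, the division RP describes can be written in plain Python. The
names pretrain_layer1, pretrain_layer2 and finetune_one_step come from the
comment above; their bodies, the toy validation errors, and the patience-based
stopping rule are assumptions made only for this sketch. The loop in
finetune_with_early_stopping and the ordering of the pre-training calls are
the parts that would be candidates for expression in the framework.

```python
def pretrain_layer1(model):
    # Stand-in for unsupervised pre-training of the first layer.
    model["layers"].append("layer1")


def pretrain_layer2(model):
    # Stand-in for unsupervised pre-training of the second layer.
    model["layers"].append("layer2")


def finetune_one_step(model):
    # Stand-in for one supervised update; returns a toy validation error.
    index = min(model["step"], len(model["errors"]) - 1)
    model["step"] += 1
    return model["errors"][index]


def finetune_with_early_stopping(model, patience=2, max_steps=100):
    """Stop once the validation error has not improved for `patience` steps."""
    best, best_step, bad = float("inf"), 0, 0
    for step in range(1, max_steps + 1):
        err = finetune_one_step(model)
        if err < best:
            best, best_step, bad = err, step, 0
        else:
            bad += 1
            if bad >= patience:
                break
    return best, best_step


model = {
    "layers": [],
    "step": 0,
    # Toy validation errors, one per fine-tuning step (made up for the sketch).
    "errors": [0.5, 0.4, 0.3, 0.35, 0.36, 0.2],
}

# Control flow that decides the order of pre-training and when to stop.
pretrain_layer1(model)
pretrain_layer2(model)
best, best_step = finetune_with_early_stopping(model)
print(best, best_step)  # 0.3 3
```

With patience=2 the loop stops at step 5 and never sees the later, lower error,
which is the kind of policy decision that lives in the control flow rather
than in the layer code.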

OD replies: Actually I wasn't thinking of using it at all inside a DBN's code.
I forgot early stopping for each layer's training though, and it's true it may
be useful to take advantage of some generic mechanism there... but I wouldn't
use James' framework for it.