view doc/v2_planning/architecture_discussion.txt @ 1517:a6e634b83d88

allow to read filetensor compressed with bz2
author Frederic Bastien <nouiz@nouiz.org>
date Wed, 09 May 2012 11:56:28 -0400
parents 93e1c7c9172b

Arnaud:

From what I recall of the meeting last Friday, we saw three
proposals for a runtime architecture for the experiments in
pylearn.

The thing I noticed was that no two of the three proposals address
the same problem.  So not only do we have to choose which one(s) we
want, but we also have to decide what we need.

The proposals and the problems they address are outlined below; please
comment if you see inaccuracies:

- PL's proposal, the hooks thing, was about enabling hooks to be
registered at predefined points in functions and giving them access to
the local variables.  This addresses nicely the problem of collecting
stats and printing progress.

- OB's proposal, the checkpoints thing, was about enabling the saving
and loading of state at predefined points in the function.  Other
actions could also be performed at these points.

- JB's proposal, the new language thing, was about expressing
algorithms with a control structure made of classes so that its state
and structure could be preserved.  It could also define new control
structures to run things in parallel, over multiple machines or not.
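
As a rough illustration of the checkpoint idea (OB's proposal in the list
above), here is a minimal Python sketch. The function name train, the state
dictionary and the pickle file are all assumptions made for the example; the
actual proposal may look quite different.

```python
import os
import pickle
import tempfile


def train(n_epochs, ckpt_path):
    """Toy training loop with one predefined checkpoint per epoch (hypothetical)."""
    if os.path.exists(ckpt_path):
        # Resume: load the state saved at the last checkpoint.
        with open(ckpt_path, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"epoch": 0, "weight": 0.0}

    while state["epoch"] < n_epochs:
        state["weight"] += 0.5  # stand-in for a real parameter update
        state["epoch"] += 1
        # Predefined checkpoint: save the state so a later run can resume here.
        with open(ckpt_path, "wb") as f:
            pickle.dump(state, f)
    return state


ckpt = os.path.join(tempfile.mkdtemp(), "experiment.pkl")
first = train(2, ckpt)    # run two epochs, then "interrupt"
resumed = train(5, ckpt)  # picks up at epoch 2 instead of starting over
```

The second call only runs epochs 3 to 5, because the state written at the
last checkpoint is loaded first.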

Razvan:

 I would add the following observations: 

 #1
---

This might be an artificially created issue, but I will write it down anyway.
We can decide later if we care about it.

  Imagine you have some function provided by the library that implements
some (complicated) pattern. Let's say deeplearning (pretraining followed
by finetuning). You instantiate this somehow:

instance = deeplearning(..)

Now you want to add some function to a given hook, checkpoint or whatever
to calculate some statistics. You can of course do that (the documentation
can tell you how those hooks are named), but what the function will get is
the locals defined in deeplearning. So you need to open up the file that
implements deeplearning and understand the code to figure out which
variable does what.
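
To make this concrete, here is a small Python sketch of a hook that receives
the locals of a library function. Everything in it is hypothetical: the
registry (register, run_hooks), the hook names, and the body of deeplearning
are invented for the example and are not PL's actual design.

```python
from collections import defaultdict

# Hypothetical hook registry: hook name -> list of callbacks.
hooks = defaultdict(list)


def register(name, callback):
    """Attach a callback to a named hook point."""
    hooks[name].append(callback)


def run_hooks(name, local_vars):
    """Call every callback registered for this hook with the caller's locals."""
    for callback in hooks[name]:
        callback(local_vars)


def deeplearning(n_pretrain, n_finetune):
    """Stand-in for a library function; its variable names are invented."""
    cost = 10.0
    for i in range(n_pretrain):
        cost -= 1.0
        run_hooks("pretrain_step", locals())
    for j in range(n_finetune):
        cost -= 0.5
        run_hooks("finetune_step", locals())
    return cost


costs = []
# The callback has to know that the library calls its local variable "cost".
register("finetune_step", lambda local_vars: costs.append(local_vars["cost"]))
final_cost = deeplearning(n_pretrain=3, n_finetune=2)
```

The lambda only works because its author read the body of deeplearning and
found the name "cost"; if the library renames that variable, the hook breaks.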

Secondly, if you need to execute a function at a place that deeplearning
did not foresee, you can only do that by hacking the file implementing
the deeplearning function, i.e. by hacking the library. One can make sure
that does not happen by overpopulating the code with hooks, but then we
need a name for each hook.

I can add that probably in most cases the logic that goes into this is 
simple enough that the issues above are insignificant, but I might be wrong.


 #2
---

I think it is much healthier to think of James' proposal as a glorified
pipeline and not as a new language. You have components that you add to
your pipeline. A CALL is such a component. You run the program by executing
the pipeline (which goes from one component to the next and calls each one).

We are dealing with a glorified pipeline because:
   - when running the pipeline you can loop over a certain segment of the
   pipeline if you need to
   - you can, at run time, switch between two possible terminations of the
   pipeline (the if command)
   - you can have two pipelines running in parallel, by running one
   component from one pipeline and then going to the other
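
The pipeline view can be sketched in a few lines of Python. The class names
(Call, Seq, Loop, If) and the shared state dictionary are assumptions made
for this example, not James' actual API. The sketch covers the first two
points above (looping over a segment and switching between two
terminations); interleaving two pipelines is left out.

```python
class Call:
    """Component that applies a function to the shared state."""

    def __init__(self, fn):
        self.fn = fn

    def run(self, state):
        self.fn(state)


class Seq:
    """Run components one after the other."""

    def __init__(self, *components):
        self.components = list(components)

    def run(self, state):
        for component in self.components:
            component.run(state)


class Loop:
    """Repeat a segment of the pipeline while cond(state) holds."""

    def __init__(self, cond, body):
        self.cond = cond
        self.body = body

    def run(self, state):
        while self.cond(state):
            self.body.run(state)


class If:
    """Choose at run time between two possible continuations."""

    def __init__(self, cond, then, otherwise):
        self.cond = cond
        self.then = then
        self.otherwise = otherwise

    def run(self, state):
        branch = self.then if self.cond(state) else self.otherwise
        branch.run(state)


def step(state):
    state["epoch"] += 1
    state["cost"] *= 0.5  # stand-in for one training step


def mark(label):
    def fn(state):
        state["status"] = label
    return fn


program = Seq(
    Loop(lambda s: s["epoch"] < 3, Call(step)),
    If(lambda s: s["cost"] < 1.0, Call(mark("converged")), Call(mark("failed"))),
)
state = {"epoch": 0, "cost": 4.0, "status": None}
program.run(state)
```

Because the program is a tree of objects rather than plain control flow, its
structure can be inspected, and the state could be saved between any two
components.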

You can also think of what James proposes as sort of the same as
Olivier's, with the following differences:
   - Olivier makes this entire mechanism invisible to the eye, while in
   James' case it is explicit
   - James has implicit checkpoints between any two components; in Olivier's
   case you can define pipelines at different points (maybe even more
   finely grained than what James' mechanism offers)
   - One can imagine, though Olivier did not explain exactly how, that
   you could have hooks in a template such that you do not actually need
   to hack that code.

James' proposal also offers a way of expressing the distributed part in
your main program. It is the same as having two pipelines between which you
switch. Just imagine that each pipeline now runs on a different machine
independently and you, as the server, just wait for them to return. This
is just one possibility.

In this proposal you can also see how you would solve the unforeseen hooks
problem, by having a special function that could alter the pipeline in some
way (for example by introducing new components).
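
A possible shape for such a function, as a Python sketch. Here the pipeline
is simplified to a list of (name, callable) pairs; make_pipeline,
insert_after and the step names are all invented for the illustration.

```python
def pretrain(state):
    state["log"].append("pretrain")


def finetune(state):
    state["log"].append("finetune")


def make_pipeline():
    """Stand-in for a pipeline shipped by the library."""
    return [("pretrain", pretrain), ("finetune", finetune)]


def insert_after(pipeline, name, new_name, component):
    """Return a copy of the pipeline with component placed right after step `name`."""
    result = []
    found = False
    for step_name, step in pipeline:
        result.append((step_name, step))
        if step_name == name:
            result.append((new_name, component))
            found = True
    if not found:
        raise KeyError(name)
    return result


def run(pipeline, state):
    for _, step in pipeline:
        step(state)
    return state


def collect_stats(state):
    state["log"].append("stats")


# Add a component at a place the library author did not plan for,
# without editing the library's own code.
pipeline = insert_after(make_pipeline(), "pretrain", "stats", collect_stats)
state = run(pipeline, {"log": []})
```

The user still needs to know the component names, but not the local
variables or the source of the library function.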

OD comments: It seemed to me that one major issue we are trying to solve with
these approaches is that of being able to interrupt an experiment, then
restart it later without starting again from scratch. OB's and JB's proposals
handle this more or less automatically (compared to PL's, which would require
more manual engineering of the save/load process). However it is not obvious
to me that they would necessarily make things much easier, because:
    - One needs to use the same "framework" in all pieces of code (the +
      syntax for OB, or a single program for JB), otherwise some manual
      engineering will also be required. Can we reasonably expect the whole
      code to adhere to this? (maybe...)
    - If you want to be smart about what you want (or rather do not want) to save,
      it may add yet another layer of complexity (I'm not sure though how hard
      it would be, so it'd be nice to have an example, e.g. if you are doing
      K-Fold CV with the training set stored in memory, but you don't want to
      save it on disk when serializing your experiment).
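
For the K-Fold case, a hand-written version of "save everything except the
training set" might look like the Python sketch below. The class
KFoldExperiment and its to_state / from_state methods are hypothetical. This
is the manual route; pickle's __getstate__ / __setstate__ hooks would be the
more automatic way to get the same effect.

```python
import pickle


class KFoldExperiment:
    """Hypothetical experiment holding a large in-memory training set."""

    def __init__(self, train_set, n_folds):
        self.train_set = train_set  # large; should not go to disk
        self.n_folds = n_folds
        self.current_fold = 0
        self.scores = []

    def to_state(self):
        # Keep only what is needed to resume; the training set is left out.
        return {
            "n_folds": self.n_folds,
            "current_fold": self.current_fold,
            "scores": list(self.scores),
        }

    @classmethod
    def from_state(cls, state, train_set):
        # The caller supplies the training set again (e.g. reloaded from its source).
        exp = cls(train_set, state["n_folds"])
        exp.current_fold = state["current_fold"]
        exp.scores = list(state["scores"])
        return exp


data = list(range(10000))
exp = KFoldExperiment(data, n_folds=5)
exp.current_fold = 2
exp.scores = [0.9, 0.8]

blob = pickle.dumps(exp.to_state())  # small: no training set inside
restored = KFoldExperiment.from_state(pickle.loads(blob), data)
```

Even in this tiny case the experiment class has to decide which fields are
state and which are data, which is the extra layer of complexity mentioned
above.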