doc/v2_planning/architecture_discussion.txt @ 1343:cf0fc12a50f7
record_version work with module that are not checkout and have __version__ defined(as numpy).
author: Frederic Bastien <nouiz@nouiz.org>
date: Tue, 26 Oct 2010 16:34:20 -0400
parents: 93e1c7c9172b
Arnaud: From what I recall of the meeting last Friday, we saw three proposals for a runtime architecture for experiments in pylearn. What I noticed was that none of the three proposals addressed the same problem. So not only do we have to choose which one(s) we want, we also have to decide what we need. The proposals and the problems they address are outlined below; please comment if you see inaccuracies:

- PL's proposal, the hooks thing, was about enabling hooks to be registered at predefined points in functions and giving them access to the local variables. This nicely addresses the problem of collecting stats and printing progress.

- OB's proposal, the checkpoints thing, was about enabling the saving and loading of state at predefined points in the function. Other actions could also be performed at these points.

- JB's proposal, the new language thing, was about expressing algorithms with a control structure made of classes so that their state and structure could be preserved. It could also define new control structures to run things in parallel, whether over multiple machines or not.

Razvan: I would add the following observations:

#1 ---

This might be an artificially created issue, but I will write it down anyhow. We can decide later if we care about it.

Imagine you have some function provided by the library that implements some (complicated) pattern, say deeplearning (pretraining followed by finetuning). You instantiate it somehow:

    instance = deeplearning(..)

Now you want to add some function to a given hook, checkpoint or whatever to calculate some statistics. You can of course do that (the documentation can tell you how those hooks are named), but what the function will get is the locals defined in deeplearning. So you need to open the file that implements deeplearning and understand the code to figure out what each variable does.
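The first issue can be seen in a minimal sketch of the hooks idea. It assumes a hook is simply a callback fired with a copy of the caller's `locals()`. Everything in the sketch is hypothetical: `HookRegistry`, the hook point names and the toy `deeplearning` function are not pylearn code, and the "training" is fake arithmetic. The point is that the user's statistics hook only works because its author knows the library's local variables are named `epoch` and `cost`.

```python
from collections import defaultdict


class HookRegistry:
    """Hypothetical registry: callbacks keyed by the name of a hook point."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def register(self, point, fn):
        self._hooks[point].append(fn)

    def fire(self, point, local_vars):
        # Each hook receives a shallow copy of the caller's local variables.
        for fn in self._hooks[point]:
            fn(dict(local_vars))


def deeplearning(n_pretrain_epochs, n_finetune_epochs, hooks):
    """Toy stand-in for a library routine; the 'training' is fake arithmetic."""
    cost = 1.0
    for epoch in range(n_pretrain_epochs):
        cost *= 0.5  # pretend one pretraining epoch halves the cost
        hooks.fire("after_pretrain_epoch", locals())
    for epoch in range(n_finetune_epochs):
        cost *= 0.9  # pretend one finetuning epoch shrinks the cost a bit
        hooks.fire("after_finetune_epoch", locals())
    return cost


# User code: writing this hook requires knowing that the library's locals
# are called `epoch` and `cost`, i.e. reading deeplearning's source.
history = []
hooks = HookRegistry()
hooks.register("after_pretrain_epoch",
               lambda env: history.append((env["epoch"], env["cost"])))
final_cost = deeplearning(3, 2, hooks)
```

If the library renames `cost` to `loss`, the hook breaks even though the hook point name did not change.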
Secondly, if you need to execute a function at a place not foreseen by deeplearning, you can only do that by hacking the file implementing the deeplearning function, i.e. by hacking the library. One can make sure that does not happen by overpopulating the code with hooks, but then we need a name for each hook.

I should add that in most cases the logic that goes into this is probably simple enough that the issues above are insignificant, but I might be wrong.

#2 ---

I think it is much healthier to think of James' proposal as a glorified pipeline rather than as a new language. You have components that you add to your pipeline; a CALL is such a component. You run the program by executing the pipeline (which goes from one component to the next and calls it).

We are dealing with a glorified pipeline because:

- when running the pipeline you can loop over a certain segment of the pipeline if you need to
- you can, at run time, switch between two possible terminations of the pipeline (the if command)
- you can have two pipelines running in parallel, by running one component from one pipeline and then going to the other

You can also think of what James proposes as roughly the same as Olivier's, with the following differences:

- Olivier makes this entire mechanism invisible to the eye, while in James' case it is explicit
- James has implicit checkpoints between any two components, while in Olivier's case you can define checkpoints at different points (maybe even more finely grained than what James' mechanism offers)
- One can imagine how (though Olivier did not explain exactly how) you could have hooks in a template such that you do not actually need to hack that code

James' proposal also offers a way of expressing the distributed part in your main program. It is the same as having two pipelines between which you switch: now imagine that each pipeline runs independently on a different machine and you, as the server, just wait for them to return. This is just one possibility.
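The "glorified pipeline" reading can be sketched in a few lines of Python. This is an illustration under assumptions, not James' actual design. The component classes `Call`, `Seq`, `Loop` and `If` and the helper `run_interleaved` are hypothetical names. Each component is a generator that yields after every `Call`, which provides the implicit switch/checkpoint points between components. `Loop` repeats a segment, `If` picks one of two terminations at run time, and `run_interleaved` runs two pipelines "in parallel" by alternating one step of each.

```python
class Call:
    """Leaf component: run one function on the shared state dict."""
    def __init__(self, fn):
        self.fn = fn

    def run(self, state):
        self.fn(state)
        yield  # implicit switch/checkpoint point after every component


class Seq:
    """Run components one after the other."""
    def __init__(self, *components):
        self.components = components

    def run(self, state):
        for component in self.components:
            yield from component.run(state)


class Loop:
    """Repeat a segment of the pipeline while cond(state) is true."""
    def __init__(self, cond, body):
        self.cond, self.body = cond, body

    def run(self, state):
        while self.cond(state):
            yield from self.body.run(state)


class If:
    """Choose between two terminations of the pipeline at run time."""
    def __init__(self, cond, then, otherwise):
        self.cond, self.then, self.otherwise = cond, then, otherwise

    def run(self, state):
        branch = self.then if self.cond(state) else self.otherwise
        yield from branch.run(state)


def run_interleaved(*jobs):
    """Round-robin over (pipeline, state) pairs: one step of each in turn."""
    active = [pipeline.run(state) for pipeline, state in jobs]
    while active:
        for gen in list(active):
            try:
                next(gen)
            except StopIteration:
                active.remove(gen)


def make_pipeline(name, target, trace):
    def step(s):
        s["i"] += 1
        trace.append(name)  # record which pipeline made progress
    return Seq(
        Call(lambda s: s.update(i=0)),
        Loop(lambda s: s["i"] < target, Call(step)),
        If(lambda s: s["i"] % 2 == 0,
           Call(lambda s: s.update(result="even")),
           Call(lambda s: s.update(result="odd"))),
    )


trace, a, b = [], {}, {}
run_interleaved((make_pipeline("a", 3, trace), a),
                (make_pipeline("b", 4, trace), b))
```

Because the pipeline is plain data plus generators, the loop, the branch and the interleaving are all explicit in the program. Replacing the round-robin loop with remote workers would give the distributed variant described above.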
In this proposal you can also see how you would solve the unforeseen-hooks problem: by having a special function that can alter the pipeline in some way (for example by introducing new components).

OD comments: It seemed to me that one major issue we are trying to solve with these approaches is being able to interrupt an experiment, then restart it later without starting again from scratch. OB's and JB's proposals handle this more or less automatically (compared to PL's, which would require more manual engineering of the save/load process). However, it is not obvious to me that they would necessarily make things much easier, because:

- One needs to use the same "framework" in all pieces of code (the + syntax for OB, or a single program for JB), otherwise some manual engineering will also be required. Can we reasonably expect the whole code to adhere to this? (maybe...)
- If you want to be smart about what you do (or rather do not) want to save, it may add yet another layer of complexity (I'm not sure how hard it would be, so it would be nice to have an example, e.g. if you are doing K-fold CV with the training set stored in memory, but you don't want to save it on disk when serializing your experiment).
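One possible answer to the K-fold CV example, assuming the training set can be reloaded cheaply from its source, uses plain `pickle` with the standard `__getstate__`/`__setstate__` methods. `KFoldExperiment` and `load_dataset` are hypothetical names. `load_dataset` just regenerates random numbers from a seed in place of reading from disk, and the per-fold "score" is a placeholder. The checkpoint stores only progress and results, and the data is rebuilt when the experiment is restored.

```python
import pickle
import random


def load_dataset(seed, n=100):
    """Hypothetical stand-in for reading the training set from disk."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


class KFoldExperiment:
    """Sketch of a resumable K-fold CV run whose checkpoints omit the data."""

    def __init__(self, seed, k):
        self.seed = seed
        self.k = k
        self.next_fold = 0              # progress: which fold to run next
        self.fold_scores = []           # results so far (small, worth saving)
        self.data = load_dataset(seed)  # large, cheap to reload: not saved

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["data"]  # do not write the training set to disk
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.data = load_dataset(self.seed)  # rebuild it on restore

    def run_fold(self):
        valid = self.data[self.next_fold::self.k]
        # Placeholder "score": mean of the validation fold.
        self.fold_scores.append(sum(valid) / len(valid))
        self.next_fold += 1

    def done(self):
        return self.next_fold >= self.k


exp = KFoldExperiment(seed=0, k=5)
exp.run_fold()
exp.run_fold()
blob = pickle.dumps(exp)      # "interrupt": checkpoint after two folds
resumed = pickle.loads(blob)  # "restart" later, possibly in a new process
while not resumed.done():
    resumed.run_fold()

# Reference: the same experiment run without interruption.
uninterrupted = KFoldExperiment(seed=0, k=5)
while not uninterrupted.done():
    uninterrupted.run_fold()
```

The trade-off is that every class holding large data needs a hand-written `__getstate__` like this one, which is exactly the extra manual layer the second point worries about.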