# HG changeset patch
# User James Bergstra
# Date 1285262419 14400
# Node ID 8dfe9d6e72f61a79f322caed7c2e7d9f3043a7d0
# Parent 14444845989a2a63d37ff8aff40646d2af79912b
plugin_JB replies

diff -r 14444845989a -r 8dfe9d6e72f6 doc/v2_planning/arch_src/plugin_JB_comments_YB.txt
--- a/doc/v2_planning/arch_src/plugin_JB_comments_YB.txt	Thu Sep 23 12:57:06 2010 -0400
+++ b/doc/v2_planning/arch_src/plugin_JB_comments_YB.txt	Thu Sep 23 13:20:19 2010 -0400
@@ -13,6 +13,11 @@
 * much more difficult to read
 * much more difficult to debug
 
+JB asks: I would like to try to correct you, but I don't know where to begin:
+ - What do you think is more difficult to read [than what?], and why?
+ - What do you expect to be more difficult [than what?] to debug?
+
+
 Advantages:
 * easier to serialize (can't we serialize an ordinary Python class created by a normal user?)
 
@@ -21,6 +26,21 @@
 when possible, and just create another code for a new DBN variant when it can't fit?)
 * am I missing something?
 
+JB replies:
+ - Re serializability: I think any system that supports program pausing,
+   resuming, and dynamic editing (a.k.a. process surgery) will have the flavour
+   of my proposal. If someone has a better idea, he should suggest it.
+
+ - Re hooks & constructors: the mechanism I propose is more flexible than hooks and constructor
+   parameters. Hooks and constructor parameters have their place, and would be
+   used under my proposal as usual to configure the modules on which the
+   flow-graph operates. But flow-graphs are more flexible. Flow-graphs
+   (REPEAT, CALL, etc.) that are constructed by library calls can be directly
+   modified. You can add new hooks, for example, or add a print statement
+   between two statements (CALLs) that previously had no hook between them.
+ - The analogous thing using the real Python VM would be to dynamically
+   re-program Python's compiled bytecode, which I don't think is possible.
+
 I am not convinced that any of the stated advantages can't be achieved in more traditional ways.
 
 RP comment: James or anybody else correct me if I'm wrong. What I think James
@@ -55,3 +75,28 @@
 necessarily require the ability to serialize / restart at any point).
 About the ability to move / substitute things, you could probably achieve the
 same goal with proper code factorization / conventions.
+
+JB replies:
+  You are right that with sufficient discipline on everyone's part,
+  and a good design using existing Python control flow (loops and functions), it is
+  probably possible to get many of the features I'm claiming with my proposal.
+
+  But I don't think Python offers very helpful syntax or control-flow elements
+  for programming parallel, distributed computations, because the Python
+  interpreter doesn't do that.
+
+  What I'm trying to design is a mechanism that can allow us to *express the entire
+  learning algorithm* in a program. That means
+  - including the grid-search,
+  - including the use of the cluster,
+  - including the pre-processing and post-processing.
+
+  To make that actually work, programs need to be more flexible: we need to be
+  able to pause and resume 'function calls', and possibly to restart them if we
+  find a problem (without having to restart the whole program). We already do
+  these things in ad-hoc ways by writing scripts, generating intermediate files,
+  etc., but I think we would empower ourselves by using a tool that lets us
+  actually write down the *WHOLE* algorithm in one place, rather than as a README
+  with a list of scripts and instructions for what to do with them (especially
+  because the README often never gets written).
+
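JB's replies rest on two properties of flow-graphs: because a program built from nodes such as REPEAT and CALL is data rather than Python control flow, a running program can be paused and serialized, and the graph itself can be edited in place (for example, adding a print between two CALLs that had no hook between them). The following is a minimal, hypothetical Python sketch of those two properties under stated assumptions, not the plugin_JB implementation: the tuple encoding of nodes, the SEQ node, the REGISTRY of named functions, and the start/step functions are all invented for illustration.

```python
import pickle

# HYPOTHETICAL encoding of a flow-graph as plain data (not the plugin_JB API):
#   ("CALL", name, args)        -> call REGISTRY[name](env, *args)
#   ("SEQ", [node, ...])        -> run the nodes in order
#   ("REPEAT", n, [node, ...])  -> run the body n times
REGISTRY = {}


def op(fn):
    """Register a function so that CALL nodes can refer to it by name."""
    REGISTRY[fn.__name__] = fn
    return fn


@op
def add(env, key, amount):
    env[key] = env.get(key, 0) + amount


@op
def log(env, msg):
    env.setdefault("log", []).append(msg)


def start(program):
    """Create an execution state: an explicit stack of frames plus variables."""
    # Each frame is [node, position in body, completed iterations].
    return {"stack": [[program, 0, 0]], "env": {}}


def step(state):
    """Run until one CALL has executed; return False once the program is done."""
    stack = state["stack"]
    while stack:
        frame = stack[-1]
        node, pos, it = frame
        kind = node[0]
        if kind == "CALL":
            stack.pop()
            REGISTRY[node[1]](state["env"], *node[2])
            return True
        body = node[1] if kind == "SEQ" else node[2]
        if kind == "REPEAT" and it >= node[1]:
            stack.pop()                      # all iterations finished
        elif pos < len(body):
            frame[1] += 1                    # advance, then descend into the child
            stack.append([body[pos], 0, 0])
        elif kind == "REPEAT":
            frame[1], frame[2] = 0, it + 1   # start the next iteration
        else:
            stack.pop()                      # SEQ finished
    return False


program = ("SEQ", [
    ("CALL", "add", ("x", 1)),
    ("REPEAT", 3, [("CALL", "add", ("x", 10))]),
    ("CALL", "add", ("x", 100)),
])

state = start(program)
step(state)                    # run only the first CALL, then "pause"
saved = pickle.dumps(state)    # the paused process is plain, picklable data

resumed = pickle.loads(saved)
root = resumed["stack"][0][0]
# Process surgery on the resumed copy: insert a logging CALL between the
# REPEAT and the final CALL, where no hook existed before.
root[1].insert(2, ("CALL", "log", ("after repeat",)))
while step(resumed):
    pass
print(resumed["env"])          # {'x': 131, 'log': ['after repeat']}
```

Keeping both the nodes and the explicit stack as plain tuples, lists and dicts is what lets pickle capture a paused run; referring to functions by name instead of by object is one assumed way to keep CALL nodes serializable. The surgery is applied to the unpickled copy, so the original program is left unchanged.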