changeset 1247:8dfe9d6e72f6

plugin_JB replies
author James Bergstra <bergstrj@iro.umontreal.ca>
date Thu, 23 Sep 2010 13:20:19 -0400
parents 14444845989a
children fda31afc0df6
files doc/v2_planning/arch_src/plugin_JB_comments_YB.txt
diffstat 1 files changed, 45 insertions(+), 0 deletions(-) [+]
--- a/doc/v2_planning/arch_src/plugin_JB_comments_YB.txt	Thu Sep 23 12:57:06 2010 -0400
+++ b/doc/v2_planning/arch_src/plugin_JB_comments_YB.txt	Thu Sep 23 13:20:19 2010 -0400
@@ -13,6 +13,11 @@
 * much more difficult to read
 * much more difficult to debug
 
+JB asks: I would like to try and correct you, but I don't know where to begin --
+  - What do you think is more difficult to read [than what?] and why?
+  - What do you expect to be more difficult [than what?] to debug?
+
+
 Advantages:
 
 * easier to serialize (can't we serialize an ordinary Python class created by a normal user?)
@@ -21,6 +26,21 @@
    when possible, and just create another code for a new DBN variant when it can't fit?)
 * am I missing something?
 
+JB replies:
+  - Re serializability - I think any system that supports program pausing,
+    resuming, and dynamic editing (a.k.a. process surgery) will have the flavour
+    of my proposal.  If someone has a better idea, he should suggest it.
+
+  - Re hooks & constructors - the mechanism I propose is more flexible than hooks and constructor
+    parameters.  Hooks and constructor parameters have their place, and would be
+    used under my proposal as usual to configure the modules on which the
+    flow-graph operates.  But flow-graphs are more flexible. Flow-graphs
+    (REPEAT, CALL, etc.) that are constructed by library calls can be directly
+    modified.  You can add new hooks, for example, or add a print statement
+    between two statements (CALLs) that previously had no hook between them.
+    - the analogous thing using the real Python VM would be to dynamically
+      re-program Python's compiled bytecode, which I don't think is possible.
+
 I am not convinced that any of the stated advantages can't be achieved in more traditional ways.
 
 RP comment: James or anybody else correct me if I'm wrong. What I think James
@@ -55,3 +75,28 @@
 necessarily require the ability to serialize / restart at any point). About
 the ability to move / substitute things, you could probably achieve the same
 goal with proper code factorization / conventions.
+
+JB replies:
+  You are right that with sufficient discipline on everyone's part,
+  and a good design using existing Python control flow (loops and functions), it is
+  probably possible to get many of the features I'm claiming with my proposal.
+
+  But I don't think Python offers very helpful syntax or control-flow elements
+  for programming parallel, distributed computations, because the Python
+  interpreter doesn't do that.
+
+  What I'm trying to design is a mechanism that can allow us to *express the entire
+  learning algorithm* in a program.  That means 
+  - including the grid-search,
+  - including the use of the cluster, 
+  - including the pre-processing and post-processing.
+
+  To make that actually work, programs need to be more flexible - we need to be
+  able to pause and resume 'function calls', and to possibly restart them if we
+  find a problem (without having to restart the whole program).  We already do
+  these things in ad-hoc ways by writing scripts, generating intermediate files,
+  etc., but I think we would empower ourselves by using a tool that lets us
+  actually write down the *WHOLE* algorithm, in one place rather than as a README
+  with a list of scripts and instructions for what to do with them (especially
+  because the README often never gets written).
+
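
The "hooks & constructors" reply in this changeset argues that a flow-graph built by a library call can be edited directly, for example by adding a step between two CALLs that had no hook between them. The following is only a minimal sketch of that idea, not the proposal's actual implementation: the names CALL and REPEAT come from the text, while SEQ, make_training_program, the `state` dictionary, and the step functions are hypothetical and introduced purely for illustration.

```python
class CALL:
    """Leaf node: invokes fn(state) when the graph is run."""
    def __init__(self, fn):
        self.fn = fn

    def run(self, state):
        self.fn(state)


class SEQ:
    """Hypothetical node: runs its steps in order. The steps live in a
    plain list, so they can be inspected and edited before running."""
    def __init__(self, *steps):
        self.steps = list(steps)

    def run(self, state):
        for step in self.steps:
            step.run(state)


class REPEAT:
    """Runs its body n times."""
    def __init__(self, n, body):
        self.n = n
        self.body = body

    def run(self, state):
        for _ in range(self.n):
            self.body.run(state)


def make_training_program(n_epochs):
    """Hypothetical 'library call' that returns a flow-graph."""
    def train_step(state):
        state["updates"] += 1

    def validate(state):
        state["validations"] += 1

    return REPEAT(n_epochs, SEQ(CALL(train_step), CALL(validate)))


program = make_training_program(3)

# The library provided no hook between train_step and validate,
# but the graph is ordinary data, so a new CALL can be spliced in.
log = []
program.body.steps.insert(1, CALL(lambda state: log.append(state["updates"])))

state = {"updates": 0, "validations": 0}
program.run(state)
```

Because the program is a data structure rather than compiled bytecode, the caller can modify it after construction without the library author having anticipated that extension point. Pausing, resuming, and serialization are not shown here.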