# HG changeset patch
# User Olivier Delalleau
# Date 1289576359 18000
# Node ID 6b9673d72a4153fcffc1f5c4849d52f8c6914aab
# Parent 7548dc1b163c239ac838cb5779e7969ed232add3
Datalearn replies / comments

diff -r 7548dc1b163c -r 6b9673d72a41 doc/v2_planning/datalearn.txt
--- a/doc/v2_planning/datalearn.txt	Thu Nov 11 22:40:01 2010 -0500
+++ b/doc/v2_planning/datalearn.txt	Fri Nov 12 10:39:19 2010 -0500
@@ -53,6 +53,16 @@
 just include those 'expressions coded in individual datasets' into the
 overall graph.

+OD replies to James: What I had in mind is that you would be forced to compile
+your own function inside the perform() method of an Op. This seemed like a
+potential problem to me because it would prevent Theano from seeing the whole
+fine-grained graph and doing optimizations across multiple dataset
+transformations (there may also be additional overhead from calling multiple
+functions). But if you are saying it is possible to include 'expressions coded
+in individual datasets' into the overall graph, then I guess this point is
+moot. Would this be achieved with an optimization that replaces the dataset
+node with its internal graph?
+
 Razvan comments: 1) Having Theano expressions inside the perform of a Theano
 Op can lead to issues. I know I had to deal with a few when implementing
 Scan which does exactly this. Well to be fair these issues mostly come into
@@ -69,7 +79,14 @@
 indices, the dataset class can reload parts of the data into the shared
 variable and so on.

-
+OD replies to Razvan's point 2: I think what you are saying is another concern
+I had, which was the fact that it may be confusing to mix in the same class
+the Variable/Op and DataSet interfaces. I would indeed prefer to keep them
+separate. However, it may be possible to come up with a system that would get
+the best of both worlds (maybe by having the Op/Variable as members of
+Dataset, and just asking the user building a theano graph to use these instead
+of the dataset directly). Note that I'm mixing up Op/Variable here, because
+it's just not clear to me yet which would go where...

 One issue with this approach is illustrated by the following example. Imagine
 we want to iterate on samples in a dataset and do something with their
@@ -143,6 +160,11 @@
   certain nodes of the graph to reduce the number of compilation while in
   approach (2) we don't need to deal with the complexity of lazy
   compilation
+
+OD comments: Well, to be fair, it means we put the burden of dealing with the
+complexity of lazy compilation on the user (it's up to him to make sure he
+compiles only one function).
+
 - approach (1) needs a replace function if you want to change the dataset.
   What you would do, is once you have a "computational graph" or pipeline or
   whatever you call it, say ``graph``, to change the input you would do
@@ -174,6 +196,14 @@
     for datapoint in new_graph:
         do_something_with(datapoint())

+
+OD comments: I don't really understand what 'graph' is in this code (it
+appears in both approaches but is used differently). What I have in mind would
+be more like the first approach you describe (#2) with 'graph' removed, and
+graph / new_graph replaced by dataset / new_dataset in the second one (#1).
+You wouldn't need to call some graph.replace method: the graphs compiled for
+iterating on 'dataset' and 'new_dataset' would be entirely separate (using two
+different compiled functions, pretty much like #2).

 - in approach (1) the initial dataset object (the one that loads the data)
   decides if you will use shared variables and indices to deal with the
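
To make the iteration pattern in the comment above more concrete, here is a
minimal sketch of approach (#2) as described there: one Theano function is
compiled per dataset, outside the loop, and switching to a new dataset simply
means compiling a second, entirely separate function (no ``graph.replace``
call). The shared-variable layout and the ``do_something_with`` placeholder
are assumptions made for illustration only, not a proposed API.

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    def do_something_with(value):
        # Placeholder for whatever is done with each sample's output.
        print(value)

    # The whole dataset lives in a shared variable; a scalar index picks a sample.
    data = numpy.random.randn(100, 5).astype(theano.config.floatX)
    shared_data = theano.shared(data, name='dataset')
    index = T.lscalar('index')
    sample = shared_data[index]
    output = T.tanh(T.dot(sample, sample))   # some per-sample transformation

    # Compiled once, outside the loop: the user carries the burden of making
    # sure only one function is compiled.
    f = theano.function([index], output)
    for i in range(data.shape[0]):
        do_something_with(f(i))

    # Switching dataset: just build and compile a second function around the
    # new shared data, entirely separate from the first one.
    new_data = numpy.random.randn(50, 5).astype(theano.config.floatX)
    new_shared = theano.shared(new_data, name='new_dataset')
    new_sample = new_shared[index]
    f_new = theano.function([index], T.tanh(T.dot(new_sample, new_sample)))
    for i in range(new_data.shape[0]):
        do_something_with(f_new(i))
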
@@ -225,7 +255,7 @@
   hyper-parameters for which you need to recompile the thenao function and
   can not be just parameters ( so we would have yet another category ?).

-Another syntactic option for iterating over datasets is
+James: Another syntactic option for iterating over datasets is

 .. code-block:: python

@@ -237,6 +267,12 @@
 numeric_iterator function can also specify what compile mode to use, any
 givens you might want to apply, etc.

+OD comments: Would there also be some kind of function cache to avoid
+compiling the same function again if we re-iterate on the same dataset with
+the same arguments? Maybe a more generic issue is: would there be a way for
+Theano to be more efficient when re-compiling the same function that was
+already compiled in the same program? (note that I am assuming here it is not
+efficient, but I may be wrong).

 What About Learners?
 --------------------
@@ -251,6 +287,15 @@
 the constructor of the learner? That seems much more flexible, compact, and
 clear than the decorator.

+OD replies: Not sure I understand your idea here. We probably want a learner
+to be able to compute its output on multiple datasets, without having to point
+to these datasets within the learner itself (which seems cumbersome to me).
+The point of the decorators is mostly to turn a single function (that outputs
+a theano variable for the output computed on a single sample) into a function
+that can compute symbolic datasets as well as numeric sample outputs. Those
+could instead be different functions in the base Learner class if the
+decorator approach is considered ugly / confusing.
+
 A Learner may be able to compute various things. For instance, a Neural
 Network may output a ``prediction`` vector (whose elements correspond to
 estimated probabilities of each class in a classification task), as well as a
@@ -330,3 +375,5 @@

 Is this close to what you are suggesting?

+OD: Yes, you guessed right, the decorator's role is to do something different
+depending on the input to the function (see my reply to James above).
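
As one possible illustration of a decorator that "does something different
depending on the input to the function", here is a minimal sketch: symbolic
input stays symbolic, numeric input triggers a one-time compilation of a
Theano function that is then cached and reused. The names ``datalearn_output``,
``Learner`` and ``prediction``, as well as the naive per-instance cache, are
made up for the example and are not the actual proposed API.

.. code-block:: python

    import functools
    import numpy
    import theano
    import theano.tensor as T

    def datalearn_output(fn):
        """Symbolic input -> symbolic output; numeric input -> compile a
        Theano function once, cache it, and call it on the data."""
        cache = {}
        @functools.wraps(fn)
        def wrapper(self, x):
            if isinstance(x, theano.Variable):
                return fn(self, x)            # stay in the symbolic world
            key = id(self)                    # naive per-instance cache
            if key not in cache:
                sym_x = T.vector('x')
                cache[key] = theano.function([sym_x], fn(self, sym_x))
            return cache[key](x)              # numeric path reuses the function
        return wrapper

    class Learner(object):
        def __init__(self, w):
            self.w = theano.shared(numpy.asarray(w, dtype=theano.config.floatX))

        @datalearn_output
        def prediction(self, sample):
            # Written once, in terms of a single symbolic sample.
            return T.nnet.sigmoid(T.dot(self.w, sample))

    learner = Learner([0.1, -0.2, 0.3])
    symbolic_out = learner.prediction(T.vector('s'))   # returns a Theano variable
    numeric_out = learner.prediction(
        numpy.ones(3, dtype=theano.config.floatX))      # returns a number

This also touches the function-cache question above: in this sketch the
compiled function is only built the first time numeric data is passed in, and
reused on subsequent numeric calls.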