--- a/doc/v2_planning/datalearn.txt	Fri Nov 12 10:39:19 2010 -0500
+++ b/doc/v2_planning/datalearn.txt	Fri Nov 12 11:11:49 2010 -0500
@@ -204,7 +204,22 @@
 You wouldn't need to call some graph.replace method: the graphs compiled for
 iterating on 'dataset' and 'new_dataset' would be entirely separate (using two
 different compiled functions, pretty much like #2).
-
+
+RP answers: Yes, you are right. What I was trying to say is that if you have
+two different datasets to which you want to apply the same pre-processing,
+you can do that in both approaches. ``graph`` represents the pre-processing
+steps in (2) and the end dataset (after pre-processing) in (1). So the idea
+is that instead of making new_graph from scratch (re-applying all the
+transforms on the original dataset) you can use replace. Or maybe the
+__call__ (that compiles the function if needed) can get a givens dictionary
+(that replaces datasets or more). I only gave this argument because I
+thought this would be an issue people would raise. They will say: well, in (2)
+the pipeline logic is separated from the data, so you can use the same
+transformation with different data easily, while in (1) you write the
+transformation rooted in a dataset, and if you want the same transformation
+for a different dataset you have to re-write everything.
+
+
  - in approach (1) the initial dataset object (the one that loads the data)
    decides if you will use shared variables and indices to deal with the
    dataset or if you will use ``theano.tensor.matrix`` and not the user( at
@@ -272,7 +287,7 @@
 the same arguments? Maybe a more generic issue is: would there be a way for
 Theano to be more efficient when re-compiling the same function that was
 already compiled in the same program? (note that I am assuming here it is not
-efficient, but I may be wrong).
+efficient, but I may be wrong).

 What About Learners?
 --------------------