changeset 1359:5db730bb0e8e

comments on datalearn
author James Bergstra <bergstrj@iro.umontreal.ca>
date Thu, 11 Nov 2010 17:53:13 -0500
parents 8cc66dac6430
children f81b3b6f9698
files doc/v2_planning/datalearn.txt
diffstat 1 files changed, 29 insertions(+), 0 deletions(-)
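The ``numeric_iterator`` option proposed in this changeset can be illustrated with a minimal sketch. It does not use Theano. ``ToyDataset`` and ``_compile_batch_getter`` are hypothetical names, a plain Python closure stands in for the single compiled function of a symbolic batch index, and the compile-mode and givens arguments are left out.

```python
# Hypothetical sketch of the proposed ``numeric_iterator`` API.
# A plain Python closure stands in for the single compiled Theano function
# described in the proposal; no Theano calls are made here.


class ToyDataset(object):
    """A hypothetical in-memory dataset holding a list of samples."""

    def __init__(self, samples):
        self.samples = list(samples)

    def __len__(self):
        return len(self.samples)

    def _compile_batch_getter(self, batchsize):
        # Built once per call to numeric_iterator, mirroring the idea of
        # compiling a single function of a batch index.
        def get_batch(batch_index):
            start = batch_index * batchsize
            return self.samples[start:start + batchsize]
        return get_batch

    def numeric_iterator(self, batchsize=1):
        get_batch = self._compile_batch_getter(batchsize)
        # Ceiling division so that a final partial minibatch is kept.
        n_batches = (len(self) + batchsize - 1) // batchsize
        for batch_index in range(n_batches):
            yield get_batch(batch_index)


dataset = ToyDataset(range(25))
for sample in dataset.numeric_iterator(batchsize=10):
    print(len(sample))
```

Running it prints 10, 10 and 5, one line per minibatch. Keeping the final partial minibatch rather than dropping it is a choice made for this sketch; the proposal does not say which behaviour it intends.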
--- a/doc/v2_planning/datalearn.txt	Thu Nov 11 17:36:39 2010 -0500
+++ b/doc/v2_planning/datalearn.txt	Thu Nov 11 17:53:13 2010 -0500
@@ -46,6 +46,13 @@
 this variable represents the data stored in the dataset. The same is done for
 individual data samples.
 
+James asks: Why would a Theano graph in which some nodes represent datasets give
+up the ability to combine Theano expressions coded in individual datasets?
+Firstly, if you want to use Theano expressions and compiled functions to
+implement the perform() method of an Op, you can do that.  Secondly, you can
+just include those 'expressions coded in individual datasets' into the overall
+graph.
+
 One issue with this approach is illustrated by the following example. Imagine
 we want to iterate on samples in a dataset and do something with their
 numeric value. We would want the code to be as close as possible to:
@@ -101,6 +108,9 @@
         for numeric_index in xrange(len(dataset)):
             do_something_with(get_sample(numeric_index))
 
+James comments: this is how I have written my last couple of projects. It is
+slightly verbose, but it is clear and efficient.
+
 Note that although the above example focused on how to iterate over a dataset,
 it can be cast into a more generic problem, where some data (either dataset or
 sample) is the result of some transformation applied to other data, which is
@@ -114,6 +124,20 @@
 those who do not want to worry about it. How to achieve this is still to be
 determined.
 
+
+Another syntactic option for iterating over datasets is:
+
+    .. code-block:: python
+
+        for sample in dataset.numeric_iterator(batchsize=10):
+            do_something_with(sample)
+
+The ``numeric_iterator`` method would create a symbolic batch index and compile
+a single function that extracts the corresponding minibatch. The arguments to
+``numeric_iterator`` can also specify which compile mode to use, any givens
+you might want to apply, etc.
+
+
 What About Learners?
 --------------------
 
@@ -122,6 +146,11 @@
 what was discussed above is how a learner takes as input a dataset and outputs
 another dataset that can be used with the dataset API.
 
+James asks:
+What's wrong with simply passing the variables corresponding to the dataset to
+the constructor of the learner?
+That seems much more flexible, compact, and clear than the decorator.
+
 A Learner may be able to compute various things. For instance, a Neural
 Network may output a ``prediction`` vector (whose elements correspond to
 estimated probabilities of each class in a classification task), as well as a