# HG changeset patch
# User James Bergstra
# Date 1289515993 18000
# Node ID 5db730bb0e8e5f9fffee4801b6a96c8635afd3f2
# Parent 8cc66dac6430150940a81cfb485f5fcfdd8cacca
comments on datalearn

diff -r 8cc66dac6430 -r 5db730bb0e8e doc/v2_planning/datalearn.txt
--- a/doc/v2_planning/datalearn.txt	Thu Nov 11 17:36:39 2010 -0500
+++ b/doc/v2_planning/datalearn.txt	Thu Nov 11 17:53:13 2010 -0500
@@ -46,6 +46,13 @@
 this variable represents the data stored in the dataset. The same is done for
 individual data samples.

+James asks: Why would a Theano graph in which some nodes represent datasets give
+up the ability to combine Theano expressions coded in individual datasets?
+Firstly, if you want to use Theano expressions and compiled functions to
+implement the perform() method of an Op, you can do that. Secondly, you can
+just include those 'expressions coded in individual datasets' into the overall
+graph.
+
 One issue with this approach is illustrated by the following example. Imagine
 we want to iterate on samples in a dataset and do something with their numeric
 value. We would want the code to be as close as possible to:
@@ -101,6 +108,9 @@
     for numeric_index in xrange(len(dataset))
         do_something_with(get_sample(numeric_index))

+James comments: this is how I have written the last couple of projects, it's
+slightly verbose but it's clear and efficient.
+
 Note that although the above example focused on how to iterate over a dataset,
 it can be cast into a more generic problem, where some data (either dataset or
 sample) is the result of some transformation applied to other data, which is
@@ -114,6 +124,20 @@
 those who do not want to worry about it. How to achieve this is still to be
 determined.

+
+Another syntactic option for iterating over datasets is
+
+.. code-block:: python
+
+    for sample in dataset.numeric_iterator(batchsize=10):
+        do_something_with(sample)
+
+The numeric_iterator would create a symbolic batch index, and compile a single function
+that extracts the corresponding minibatch. The arguments to the
+numeric_iterator function can also specify what compile mode to use, any givens
+you might want to apply, etc.
+
+
 What About Learners?
 --------------------

@@ -122,6 +146,11 @@
 what was discussed above is how a learner takes as input a dataset and outputs
 another dataset that can be used with the dataset API.

+James asks:
+What's wrong with simply passing the variables corresponding to the dataset to
+the constructor of the learner?
+That seems much more flexible, compact, and clear than the decorator.
+
 A Learner may be able to compute various things. For instance, a Neural
 Network may output a ``prediction`` vector (whose elements correspond to
 estimated probabilities of each class in a classification task), as well as a
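The `numeric_iterator` idea proposed in the patch could be sketched in plain Python. This is a hypothetical illustration, not an existing API: the names `Dataset` and `numeric_iterator` come from the discussion above, and the Theano step of compiling one function over a symbolic batch index is simulated here by building a single `fetch` closure once and reusing it for every minibatch.

```python
# Hypothetical sketch of the numeric_iterator proposal. In the real design,
# fetch() would be a compiled Theano function taking a symbolic batch index;
# here a plain closure stands in for it.

class Dataset(object):
    def __init__(self, samples):
        self._samples = list(samples)

    def __len__(self):
        return len(self._samples)

    def numeric_iterator(self, batchsize=10):
        # Build the minibatch-extraction function once (this is where the
        # proposal would compile a single Theano function, possibly with a
        # compile mode and givens passed through as extra arguments).
        def fetch(batch_index):
            start = batch_index * batchsize
            return self._samples[start:start + batchsize]

        n_batches = (len(self) + batchsize - 1) // batchsize
        for i in range(n_batches):
            yield fetch(i)


dataset = Dataset(range(25))
batches = list(dataset.numeric_iterator(batchsize=10))
print([len(b) for b in batches])  # -> [10, 10, 5]
```

The point of the design is that iteration cost is paid once at setup (one compiled function) rather than once per minibatch, while the calling code stays as close as possible to a plain `for sample in ...` loop.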