comparison doc/v2_planning/datalearn.txt @ 1359:5db730bb0e8e

comments on datalearn
author James Bergstra <bergstrj@iro.umontreal.ca>
date Thu, 11 Nov 2010 17:53:13 -0500
parents ffa2932a8cba
children 7548dc1b163c
expressions coded in individual datasets into a single graph. Currently, we
instead consider that a dataset has a member that is a Theano variable, and
this variable represents the data stored in the dataset. The same is done for
individual data samples.

James asks: Why would a Theano graph in which some nodes represent datasets
give up the ability to combine Theano expressions coded in individual
datasets? Firstly, if you want to use Theano expressions and compiled
functions to implement the perform() method of an Op, you can do that.
Secondly, you can just include those 'expressions coded in individual
datasets' into the overall graph.

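James's first point can be sketched as follows (a hypothetical Op, not part of
the document's proposal): a perform() method that delegates to a Theano
function compiled once up front.

.. code-block:: python

    import theano
    import theano.tensor as tensor

    class ScaledSum(theano.Op):
        """Hypothetical Op whose perform() calls a compiled function."""
        # (__eq__ and __hash__ omitted for brevity)
        def __init__(self, scale):
            x = tensor.dvector()
            # Compiled once here; reused by every call to perform().
            self._fn = theano.function([x], scale * x.sum())

        def make_node(self, x):
            x = tensor.as_tensor_variable(x)
            return theano.Apply(self, [x], [tensor.dscalar()])

        def perform(self, node, inputs, output_storage):
            x, = inputs
            output_storage[0][0] = self._fn(x)
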
One issue with this approach is illustrated by the following example. Imagine
we want to iterate over samples in a dataset and do something with their
numeric value. We would want the code to be as close as possible to:

.. code-block:: python

    [...]

.. code-block:: python

    symbolic_index = dataset.get_index()  # Or just theano.tensor.iscalar()
    get_sample = theano.function([symbolic_index],
                                 dataset[symbolic_index].variable)
    for numeric_index in xrange(len(dataset)):
        do_something_with(get_sample(numeric_index))

James comments: this is how I have written the last couple of projects; it's
slightly verbose, but it's clear and efficient.

Note that although the above example focused on how to iterate over a dataset,
it can be cast into a more generic problem, where some data (either dataset or
sample) is the result of some transformation applied to other data, which is
parameterized by parameters p1, p2, ..., pN (in the above example, we were

[...]

Ideally it would be nice to let the user take control of what is being
compiled, while leaving the option of a sensible default behavior for those
who do not want to worry about it. How to achieve this is still to be
determined.

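To make that generic picture concrete, here is a sketch (with a stand-in
matrix for a dataset's variable; the names are hypothetical, not an agreed
API) of data defined as a parameterized transformation of other data, with
compilation left in the user's hands:

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as tensor

    # Stand-in for a dataset's `variable` member (one sample per row).
    data_variable = tensor.dmatrix('data')

    w = theano.shared(numpy.ones(5), name='w')  # parameter p1
    b = theano.shared(0.0, name='b')            # parameter p2

    # The transformed data is itself just a Theano expression.
    transformed_variable = tensor.dot(data_variable, w) + b

    # The user keeps control over compilation (mode, givens, ...).
    get_transformed = theano.function([data_variable], transformed_variable,
                                      mode='FAST_RUN')
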
Another syntactic option for iterating over datasets is

.. code-block:: python

    for sample in dataset.numeric_iterator(batchsize=10):
        do_something_with(sample)

The numeric_iterator would create a symbolic batch index, and compile a single
function that extracts the corresponding minibatch. The arguments to the
numeric_iterator function can also specify what compile mode to use, any givens
you might want to apply, etc.
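
A minimal sketch of such a numeric_iterator (a hypothetical helper; it assumes
the dataset exposes a sliceable `variable` member and supports len()):

.. code-block:: python

    import theano
    import theano.tensor as tensor

    def numeric_iterator(dataset, batchsize=10, mode=None, givens=None):
        """Hypothetical: yield numeric minibatches from a symbolic dataset."""
        index = tensor.iscalar('batch_index')
        # One compiled function extracts the minibatch at a given index.
        get_batch = theano.function(
            [index],
            dataset.variable[index * batchsize:(index + 1) * batchsize],
            mode=mode, givens=givens)
        for i in xrange(len(dataset) // batchsize):
            yield get_batch(i)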

What About Learners?
--------------------

The discussion above only mentioned datasets, but not learners. The learning
part of a learner is not a main concern (currently). What matters most w.r.t.
what was discussed above is how a learner takes as input a dataset and outputs
another dataset that can be used with the dataset API.

James asks:
What's wrong with simply passing the variables corresponding to the dataset to
the constructor of the learner?
That seems much more flexible, compact, and clear than the decorator.

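A sketch of what James is suggesting (a hypothetical learner class, with a
stand-in for the dataset's variable):

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as tensor

    class LogisticRegression(object):
        """Hypothetical learner taking a symbolic input in its constructor."""
        def __init__(self, input_variable, n_in, n_out):
            self.W = theano.shared(numpy.zeros((n_in, n_out)), name='W')
            self.b = theano.shared(numpy.zeros(n_out), name='b')
            # The learner's outputs are ordinary Theano expressions.
            self.prediction = tensor.nnet.softmax(
                tensor.dot(input_variable, self.W) + self.b)

    # Usage: pass the dataset's variable (here a stand-in) straight in.
    data_variable = tensor.dmatrix('data')
    model = LogisticRegression(data_variable, n_in=784, n_out=10)
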
A Learner may be able to compute various things. For instance, a Neural
Network may output a ``prediction`` vector (whose elements correspond to
estimated probabilities of each class in a classification task), as well as a
``cost`` vector (whose elements correspond to the penalized NLL, the NLL alone