comparison doc/v2_planning/datalearn.txt @ 1362:6b9673d72a41

Datalearn replies / comments
author Olivier Delalleau <delallea@iro>
date Fri, 12 Nov 2010 10:39:19 -0500
parents 7548dc1b163c
children 18b2ebec6bca
up the ability to combine Theano expressions coded in individual datasets?
Firstly, if you want to use Theano expressions and compiled functions to
implement the perform() method of an Op, you can do that. Secondly, you can
just include those 'expressions coded in individual datasets' into the overall
graph.

OD replies to James: What I had in mind is that you would be forced to compile
your own function inside the perform() method of an Op. This seemed like a
potential problem to me because it would prevent Theano from seeing the whole
fine-grained graph and doing optimizations across multiple dataset
transformations (there may also be additional overhead from calling multiple
functions). But if you are saying it is possible to include 'expressions coded
in individual datasets' into the overall graph, then I guess this point is
moot. Would this be achieved with an optimization that replaces the dataset
node with its internal graph?
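
As an illustration of the situation described above, here is a minimal sketch
(all names are invented, not taken from any actual implementation) of an Op
whose perform() compiles and calls its own Theano function:

.. code-block:: python

    import theano
    import theano.tensor as T
    from theano import gof

    class DatasetTransformOp(gof.Op):
        """Made-up Op whose perform() hides a compiled inner graph."""

        def make_node(self, x):
            x = T.as_tensor_variable(x)
            return gof.Apply(self, [x], [x.type()])

        def perform(self, node, inputs, output_storage):
            # The inner graph is compiled here, invisible to the outer
            # graph, so Theano cannot optimize across this boundary and
            # pays the overhead of an extra compiled-function call.
            if not hasattr(self, '_fn'):
                inp = node.inputs[0].type()
                self._fn = theano.function([inp], T.tanh(inp))
            output_storage[0][0] = self._fn(inputs[0])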

Razvan comments: 1) Having Theano expressions inside the perform of a Theano
Op can lead to issues. I know I had to deal with a few when implementing
Scan, which does exactly this. Well, to be fair, these issues mostly come into
play when the inner graph has to interact with the outer graph and most of

[...]

that gives you shared variables, symbolic indices into those shared
variables, and also numeric indices. When looping through those numeric
indices, the dataset class can reload parts of the data into the
shared variable and so on.
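
A rough sketch of this scheme (variable names and sizes are made up): a shared
variable holds the currently loaded block of data, a compiled function indexes
into it symbolically, and plain numeric indices drive both the inner loop and
the reloading:

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    # Shared storage for the currently loaded block of data.
    data_block = theano.shared(numpy.zeros((100, 10)), name='data_block')
    index = T.lscalar('index')           # symbolic index into the block
    get_sample = theano.function([index], data_block[index])

    def iterate_samples(all_data, block_size=100):
        # Reload the data block by block, then index numerically inside it.
        for start in range(0, len(all_data), block_size):
            block = all_data[start:start + block_size]
            data_block.set_value(block)
            for i in range(len(block)):  # numeric indices
                yield get_sample(i)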

OD replies to Razvan's point 2: I think what you are saying is another concern
I had, which was that it may be confusing to mix the Variable/Op and DataSet
interfaces in the same class. I would indeed prefer to keep them separate.
However, it may be possible to come up with a system that would get the best
of both worlds (maybe by having the Op/Variable as members of the Dataset, and
just asking the user building a Theano graph to use these instead of the
dataset directly). Note that I'm mixing up Op/Variable here, because it's just
not clear to me yet which would go where...
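
A hypothetical sketch of that "best of both worlds" idea (class and attribute
names are invented): the DataSet keeps its own interface but exposes a Theano
variable that users plug into their graphs instead of the dataset object
itself:

.. code-block:: python

    import theano.tensor as T

    class DataSet(object):
        def __init__(self, name):
            # Symbolic handle users are asked to build graphs from.
            self.variable = T.matrix(name)

        def __iter__(self):
            # Numeric iteration stays on the DataSet side of the interface.
            raise NotImplementedError

    dataset = DataSet('x')
    # The graph is built from dataset.variable, not from `dataset` itself.
    normalized = dataset.variable - dataset.variable.mean(axis=0)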

One issue with this approach is illustrated by the following example. Imagine
we want to iterate on samples in a dataset and do something with their
numeric value. We would want the code to be as close as possible to:

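Presumably (this is a guess, not the original snippet; it assumes, as in the
loops quoted later, that a sample is a callable returning its numeric value)
something along the lines of:

.. code-block:: python

    for sample in dataset:
        do_something_with(sample())  # sample() returns the numeric value
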
[...]

  explicit because you have to manually compile your functions.
- approach (1) needs to use this function_storage trick shared between
  certain nodes of the graph to reduce the number of compilations, while in
  approach (2) we don't need to deal with the complexity of lazy
  compilation

OD comments: Well, to be fair, it means we put the burden of dealing with the
complexity of lazy compilation on the user (it's up to him to make sure he
compiles only one function).

- approach (1) needs a replace function if you want to change the dataset.
  Once you have a "computational graph" or pipeline or whatever you call it,
  say ``graph``, to change the input you would do
  graph.replace({init_data_X: new_data_X}). In approach (2) the init_data_X
  and new_data_X are the ``dataset``, so you would compile two different

[...]

    new_graph = graph.replace({dataset: dataset2})

    for datapoint in new_graph:
        do_something_with(datapoint())

OD comments: I don't really understand what 'graph' is in this code (it
appears in both approaches but is used differently). What I have in mind would
be closer to the first approach you describe (#2) with 'graph' removed, and
with graph / new_graph replaced by dataset / new_dataset in the second one
(#1). You wouldn't need to call some graph.replace method: the graphs compiled
for iterating on 'dataset' and 'new_dataset' would be entirely separate (using
two different compiled functions, pretty much like #2).
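
A sketch of that alternative (all names are placeholders): the same symbolic
recipe is applied to each data source, and iterating on each resulting dataset
triggers its own, independent compilation, with no replace() call involved:

.. code-block:: python

    # `preprocess` stands in for whatever pipeline of dataset
    # transformations the user has defined.
    dataset = preprocess(raw_dataset)
    new_dataset = preprocess(raw_dataset_2)

    for datapoint in dataset:      # compiles one function when first iterated
        do_something_with(datapoint())

    for datapoint in new_dataset:  # compiles a second, independent function
        do_something_with(datapoint())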

- in approach (1) the initial dataset object (the one that loads the data)
  decides if you will use shared variables and indices to deal with the
  dataset or if you will use ``theano.tensor.matrix``, and not the user (at
  least not without hacking the code). Of course whoever writes that class

[...]

types of Variables that would represent Parameters and Hyper-parameters.
And as an ending note I would say that there are hyper-parameters for which
you need to recompile the Theano function and which cannot be just
parameters (so we would have yet another category?).
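
A small sketch of that distinction (names are made up): a hyper-parameter
baked into the graph as a Python constant forces a recompilation when it
changes, whereas one held in a shared variable can be updated between calls:

.. code-block:: python

    import theano
    import theano.tensor as T

    x = T.vector('x')

    # Baked in as a Python constant: changing n_kept means rebuilding the
    # graph and recompiling (the extra category mentioned above).
    n_kept = 3
    f_fixed = theano.function([x], x[:n_kept].sum())

    # Held in a shared variable: it can be changed without recompiling.
    weight_decay = theano.shared(0.01, name='weight_decay')
    f_tunable = theano.function([x], (x ** 2).sum() * weight_decay)
    weight_decay.set_value(0.1)  # no recompilation needed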

James: Another syntactic option for iterating over datasets is

.. code-block:: python

    for sample in dataset.numeric_iterator(batchsize=10):
        do_something_with(sample)

The numeric_iterator would create a symbolic batch index, and compile a single
function that extracts the corresponding minibatch. The arguments to the
numeric_iterator function can also specify what compile mode to use, any givens
you might want to apply, etc.
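
A hypothetical sketch of what such a numeric_iterator could do internally (it
assumes the dataset stores its data in a shared variable ``self.data``; none
of these names come from an actual implementation):

.. code-block:: python

    import theano
    import theano.tensor as T

    def numeric_iterator(self, batchsize=10, mode=None, givens=None):
        # Build the symbolic batch index and compile a single function once.
        i = T.lscalar('batch_index')
        batch = self.data[i * batchsize:(i + 1) * batchsize]
        fn = theano.function([i], batch, mode=mode, givens=givens)
        # Then loop over plain numeric batch indices.
        n_batches = self.data.get_value(borrow=True).shape[0] // batchsize
        for b in range(n_batches):
            yield fn(b)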

OD comments: Would there also be some kind of function cache to avoid
compiling the same function again if we re-iterate on the same dataset with
the same arguments? Maybe a more generic issue is: would there be a way for
Theano to be more efficient when re-compiling the same function that was
already compiled earlier in the same program? (Note that I am assuming here it
is not efficient, but I may be wrong.)
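
One way the cache being asked about could look (a sketch only; _fn_cache,
_compile_batch_fn and n_batches are invented helpers): compiled functions are
stored on the dataset, keyed by the iterator arguments, so re-iterating with
the same arguments reuses the same compiled function:

.. code-block:: python

    def numeric_iterator(self, batchsize=10, mode=None):
        key = (batchsize, mode)
        if key not in self._fn_cache:   # self._fn_cache: a plain dict
            self._fn_cache[key] = self._compile_batch_fn(batchsize, mode)
        fn = self._fn_cache[key]
        for b in range(self.n_batches(batchsize)):
            yield fn(b)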

What About Learners?
--------------------

The discussion above only mentioned datasets, but not learners. The learning

[...]

James asks:
What's wrong with simply passing the variables corresponding to the dataset to
the constructor of the learner?
That seems much more flexible, compact, and clear than the decorator.

OD replies: Not sure I understand your idea here. We probably want a learner
to be able to compute its output on multiple datasets, without having to point
to these datasets within the learner itself (which seems cumbersome to me).
The point of the decorators is mostly to turn a single function (that outputs
a Theano variable for the output computed on a single sample) into a function
that can compute symbolic datasets as well as numeric sample outputs. Those
could instead be different functions in the base Learner class if the
decorator approach is considered ugly / confusing.
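
A rough sketch of such a decorator (everything here is invented for
illustration, including the duck-typed check for a dataset's ``variable``
attribute): the wrapped method is written as if it received a single symbolic
sample, and the wrapper decides what to do based on the type of its input:

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    def datalearn(fn):
        def wrapper(self, x):
            if hasattr(x, 'variable'):
                # Symbolic dataset in -> symbolic output: build the
                # expression on the dataset's sample variable (standing in
                # for whatever "output dataset" the real design would return).
                return fn(self, x.variable)
            if isinstance(x, numpy.ndarray):
                # Numeric sample in -> numeric output: compile on demand
                # (and cache the compiled function on the learner).
                if not hasattr(self, '_compiled'):
                    sample = T.vector('sample')
                    self._compiled = theano.function([sample], fn(self, sample))
                return self._compiled(x)
            # Otherwise assume x is already a Theano variable.
            return fn(self, x)
        return wrapper

    class Learner(object):
        def __init__(self, n_in, n_out):
            rng = numpy.random.RandomState(0)
            self.W = theano.shared(rng.uniform(size=(n_in, n_out)))

        @datalearn
        def prediction(self, sample):
            # Written once, for a single symbolic sample.
            return T.nnet.softmax(T.dot(sample, self.W))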

A Learner may be able to compute various things. For instance, a Neural
Network may output a ``prediction`` vector (whose elements correspond to
estimated probabilities of each class in a classification task), as well as a
``cost`` vector (whose elements correspond to the penalized NLL, the NLL alone

[...]

    for datapoint in results:
        print datapoint.prediction, datapoint.nll, ...

Is this close to what you are suggesting?

OD: Yes, you guessed right: the decorator's role is to do something different
depending on the input to the function (see my reply to James above).
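
Continuing the invented decorator sketch above, usage could then look like
this (``dataset`` and ``numpy_sample`` are assumed to follow the conventions
sketched earlier):

.. code-block:: python

    symbolic_out = learner.prediction(dataset)      # builds a symbolic result
    numeric_out = learner.prediction(numpy_sample)  # compiles, returns numbers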