comparison doc/v2_planning/datalearn.txt @ 1361:7548dc1b163c

Some questions/suggestions on datalearn
author Razvan Pascanu <r.pascanu@gmail.com>
date Thu, 11 Nov 2010 22:40:01 -0500
parents 5db730bb0e8e
children 6b9673d72a41
Firstly, if you want to use Theano expressions and compiled functions to
implement the perform() method of an Op, you can do that. Secondly, you can
just include those 'expressions coded in individual datasets' into the overall
graph.
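
A minimal sketch of the first point, assuming 2010-era Theano conventions;
the Op and the doubling operation below are made up for illustration:

.. code-block:: python

    import theano
    import theano.tensor as T

    class DoubleOp(theano.Op):
        """Toy Op whose perform() calls a Theano function compiled lazily."""

        def __eq__(self, other):
            return type(self) == type(other)

        def __hash__(self):
            return hash(type(self))

        def make_node(self, x):
            x = T.as_tensor_variable(x)
            return theano.Apply(self, [x], [x.type()])

        def perform(self, node, inputs, output_storage):
            # compile the inner expression the first time the Op is executed
            if not hasattr(self, '_fn'):
                inner_x = node.inputs[0].type()
                self._fn = theano.function([inner_x], inner_x * 2)
            output_storage[0][0] = self._fn(inputs[0])

    x = T.matrix('x')
    f = theano.function([x], DoubleOp()(x))
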
Razvan comments: 1) Having Theano expressions inside the perform() of a
Theano Op can lead to issues. I know I had to deal with a few when
implementing Scan, which does exactly this. To be fair, these issues mostly
come into play when the inner graph has to interact with the outer graph,
and most of the time they can be solved. All I'm saying is that going this
way might give developers some headaches, though I guess some headache will
be involved no matter what.
2) In my view (I'm not sure this is what Olivier was saying) the idea of
not putting the Dataset into a Variable is to keep the logic related to
loading data, dividing it into slices when running on the GPU, and so on,
out of the Theano variable. In my view this logic goes into a DataSet class
that gives you shared variables, symbolic indices into those shared
variables, and also numeric indices. When looping through those numeric
indices, the dataset class can reload parts of the data into the
shared variable and so on.
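
To make point 2) concrete, here is a rough sketch of what such a DataSet
class could look like; the class and method names are made up, and the
chunking policy is only one possible choice:

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    class DataSet(object):
        """Owns the raw data, a shared variable holding the current chunk,
        and the symbolic/numeric indices used to address it."""

        def __init__(self, data, chunk_size=1000):
            self.data = data
            self.chunk_size = chunk_size
            # shared variable that compiled functions will read from
            self.shared = theano.shared(data[:chunk_size], name='data_chunk')
            # symbolic index into the shared variable
            self.index = T.iscalar('index')

        def symbolic_sample(self):
            # expression to build graphs with (one row of the current chunk)
            return self.shared[self.index]

        def numeric_indices(self):
            # iterate over the whole dataset, reloading the shared variable
            # chunk by chunk behind the scenes
            for start in xrange(0, len(self.data), self.chunk_size):
                stop = min(start + self.chunk_size, len(self.data))
                self.shared.set_value(self.data[start:stop], borrow=True)
                for i in xrange(stop - start):
                    yield i

A compiled function would then take ``dataset.index`` as input and be called
once for every value produced by ``numeric_indices()``.
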
One issue with this approach is illustrated by the following example. Imagine
we want to iterate on samples in a dataset and do something with their
numeric value. We would want the code to be as close as possible to:

.. code-block:: python
[...]

    for numeric_index in xrange(len(dataset)):
        do_something_with(get_sample(numeric_index))

James comments: this is how I have written the last couple of projects; it's
slightly verbose, but it's clear and efficient.

<Razvan comments>: I assume that ``do_something_with`` is supposed to be some
numeric function, and that ``dataset`` in this case is the result of some
computations on an initial dataset.
I would differentiate the two approaches (1) and (2) as follows:

 - first of all, whatever you can do with (1) you can do with (2);
 - approach (1) hides the fact that you are working with symbolic graphs.
   You apply functions to datasets, and when you want to see values a
   function is compiled under the hood and those values are computed for
   you. In approach (2) the fact that you deal with a symbolic graph is
   explicit, because you have to manually compile your functions;
 - approach (1) needs to use the function_storage trick, shared between
   certain nodes of the graph, to reduce the number of compilations, while
   in approach (2) we don't need to deal with the complexity of lazy
   compilation;
 - approach (1) needs a replace function if you want to change the dataset.
   Once you have a "computational graph" (or pipeline, or whatever you call
   it), say ``graph``, you would change the input with
   ``graph.replace({init_data_X: new_data_X})``. In approach (2),
   ``init_data_X`` and ``new_data_X`` are the ``dataset``, so you would
   compile two different functions. To make the above clearer, I would
   re-write (2) as:

   .. code-block:: python

      symbolic_index = theano.tensor.iscalar()
      get_sample1 = theano.function([symbolic_index],
                                    graph(dataset[symbolic_index]).variable)
      for numeric_index in xrange(len(dataset)):
          do_something_with(get_sample1(numeric_index))

      get_sample2 = theano.function([symbolic_index],
                                    graph(new_dataset[symbolic_index]).variable)
      ## Note: the dataset was replaced with new_dataset
      for numeric_index in xrange(len(new_dataset)):
          do_something_with(get_sample2(numeric_index))

      ######### For (1) you would write:

      for datapoint in graph:
          do_something_with(datapoint())

      new_graph = graph.replace({dataset: new_dataset})

      for datapoint in new_graph:
          do_something_with(datapoint())

 - in approach (1) the initial dataset object (the one that loads the data)
   decides whether you will use shared variables and indices to deal with
   the dataset or a ``theano.tensor.matrix``, not the user (at least not
   without hacking the code). Of course, whoever writes that class can add
   a flag to switch between behaviours that make sense.
   In approach (2) one is not forced by construction to make this choice
   inside that class, though by convention I would do it.
   So if you consider the one who writes that class a developer, then in
   (2) the user can decide/deal with this, not the developer.
   Though this is a fine line -- I would say the user would actually write
   that class as well, using some template.
   That is to say, (2) looks and feels more like working with Theano
   directly (see the sketch of the two input styles below).
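
To illustrate the difference described in the last point (the variable names
below are made up):

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    data = numpy.random.rand(100, 5)

    # Style A: the data lives in a shared variable on the Theano side;
    # the compiled function only receives an index.
    shared_data = theano.shared(data, name='data')
    index = T.iscalar('index')
    f_shared = theano.function([index], shared_data[index].sum())
    f_shared(3)

    # Style B: the data is an ordinary input; a numeric array is fed in
    # at every call (e.g. a theano.tensor.matrix).
    x = T.dmatrix('x')
    f_input = theano.function([x], x.sum())
    f_input(data[3:4])
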

Bottom line, I think (1) puts more stress on the development of the library,
and hides Theano and some of the complexity for day-to-day usage.
In (2) everything is a bit more explicit, leaving the impression that you
have more control over the code, though I strongly feel that whatever can
be done in (2) can be done in (1). Traditionally I was more inclined
towards (1), but now I'm not that sure; I think both are equally interesting
and valid options.
</Razvan comments>

Note that although the above example focused on how to iterate over a dataset,
it can be cast into a more generic problem, where some data (either dataset or
sample) is the result of some transformation applied to other data, which is
parameterized by parameters p1, p2, ..., pN (in the above example, we were
[...]

Ideally it would be nice to let the user take control of what is being
compiled, while leaving the option of a sensible default behavior for
those who do not want to worry about it. How to achieve this is still to be
determined.

Razvan Comment: I thought about this a bit at the Pylearn level. In my
original train of thought you would have a distinction between ``hand-picked
parameters``, which I would call hyper-parameters, and learned parameters.
A transformation in this framework (an op, if you wish) could take as inputs
DataSet(s), DataField(s), Parameter(s) (which are the things that the
learner should adapt) and HyperParameter(s). All hyper-parameters would turn
into arguments of the compiled function (like the indices of each of the
dataset objects), and therefore they could be changed without
re-compilation. In other words, this can easily be done by having new types
of Variables that represent Parameters and Hyper-parameters.
As an ending note, I would say that there are hyper-parameters for which you
need to recompile the Theano function and which cannot simply be passed as
arguments (so we would have yet another category?).
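
A small sketch of how a hyper-parameter can be an argument of the compiled
function (the toy cost and all names below are made up):

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    data = theano.shared(numpy.random.rand(100, 5), name='data')  # dataset storage
    w = theano.shared(numpy.zeros((5, 3)), name='w')               # learned Parameter
    lr = T.dscalar('lr')                                           # HyperParameter
    index = T.iscalar('index')                                     # minibatch index

    cost = ((T.dot(data[index:index + 10], w) - 1) ** 2).sum()

    # the hyper-parameter is an input of the function, just like the index,
    # so it can change from one call to the next without recompiling
    train = theano.function([index, lr], cost,
                            updates=[(w, w - lr * T.grad(cost, w))])

    train(0, 0.1)
    train(10, 0.01)

Hyper-parameters that change the structure of the graph (say, the number of
hidden layers) would still force a recompilation; those would fall in the
extra category mentioned above.
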

Another syntactic option for iterating over datasets is

.. code-block:: python
[...]

In the code above, if one wants to obtain the numeric value of an element of
``multiple_fields_dataset``, the Theano function being compiled would be able
to optimize computations so that the simultaneous computation of
``prediction`` and ``cost`` is done efficiently.
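
The efficiency argument can be illustrated with a minimal sketch (the
variables below are made up, not the actual fields of
``multiple_fields_dataset``):

.. code-block:: python

    import theano
    import theano.tensor as T

    x = T.dvector('x')
    w = T.dmatrix('w')
    target = T.dvector('target')

    hidden = T.tanh(T.dot(w, x))          # sub-expression shared by both fields
    prediction = T.argmax(hidden)
    cost = ((hidden - target) ** 2).sum()

    # compiling both outputs in a single function lets Theano compute
    # `hidden` only once when both values are requested together
    f = theano.function([x, w, target], [prediction, cost])
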
Razvan asks: What is predict_sample for? What is predict_dataset for? What I
guess you mean is that the decorator is used to convert a function that
takes a Theano variable and outputs a Theano variable into a class/function
that takes a DataField/DataSet and outputs a DataField/DataSet. It could
also register all those different functions, so that the Dataset you get
out of the entire Learner (not out of one of the functions; this Dataset is
returned by __call__) would contain all of them as fields.
I would use it like this:

.. code-block:: python

    nnet = NeuralNetwork()
    results = nnet(dataset)
    for datapoint in results:
        print datapoint.prediction, datapoint.nll, ...

Is this close to what you are suggesting?
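
For what it is worth, a rough sketch of how such a decorator could look; the
``DataSet`` stub, its ``variable`` attribute and the ``with_field`` helper
are hypothetical names, not an existing API:

.. code-block:: python

    import theano.tensor as T

    class DataSet(object):
        # minimal stub: holds a symbolic expression plus named fields
        def __init__(self, variable, fields=None):
            self.variable = variable
            self.fields = dict(fields or {})

        def with_field(self, name, var):
            # hypothetical helper: return a new DataSet carrying the field
            new_fields = dict(self.fields)
            new_fields[name] = var
            return DataSet(self.variable, new_fields)

    def datalearn_field(name):
        """Decorator turning a function on Theano variables into a function
        on DataSets that registers its output as a named field."""
        def wrap(symbolic_fn):
            def apply_to(dataset):
                return dataset.with_field(name, symbolic_fn(dataset.variable))
            return apply_to
        return wrap

    @datalearn_field('prediction')
    def predict(x):
        return x * 2      # placeholder for the real symbolic computation

    d = DataSet(T.dmatrix('data'))
    d2 = predict(d)       # d2 now carries a symbolic 'prediction' field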