pylearn: comparison of doc/v2_planning/datalearn.txt
changeset 1359:5db730bb0e8e (vs 1358:8cc66dac6430): comments on datalearn

author:   James Bergstra <bergstrj@iro.umontreal.ca>
date:     Thu, 11 Nov 2010 17:53:13 -0500
parents:  ffa2932a8cba
children: 7548dc1b163c

[...]

expressions coded in individual datasets into a single graph. Currently, we
instead consider that a dataset has a member that is a Theano variable, and
this variable represents the data stored in the dataset. The same is done for
individual data samples.
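
A minimal sketch of this convention (the Dataset and Sample classes here are
illustrative assumptions, not a settled API):

.. code-block:: python

    import theano.tensor as T

    class Dataset(object):
        """A dataset whose stored data is represented by a Theano variable."""
        def __init__(self, variable):
            self.variable = variable  # symbolic stand-in for the data

    class Sample(object):
        """An individual sample, likewise wrapping a Theano variable."""
        def __init__(self, variable):
            self.variable = variable

    # The dataset's data is a symbolic matrix (one row per sample), and a
    # sample's data is a symbolic row of that matrix.
    dataset = Dataset(T.matrix('data'))
    sample = Sample(dataset.variable[0])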

James asks: Why would a Theano graph in which some nodes represent datasets give
up the ability to combine Theano expressions coded in individual datasets?
Firstly, if you want to use Theano expressions and compiled functions to
implement the perform() method of an Op, you can do that. Secondly, you can
just include those 'expressions coded in individual datasets' into the overall
graph.
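
A minimal sketch of that first point (DoubleOp and its doubling computation
are made up for illustration, not an existing pylearn Op):

.. code-block:: python

    import theano
    import theano.tensor as T

    class DoubleOp(theano.Op):
        """Op whose perform() delegates its numeric work to a compiled
        Theano function, as suggested above."""
        def make_node(self, x):
            x = T.as_tensor_variable(x)
            return theano.Apply(self, [x], [x.type()])

        def perform(self, node, inputs, output_storage):
            # Lazily compile a function over a variable of the same type
            # as the input, then call it to produce the numeric output.
            if not hasattr(self, '_fn'):
                inp = node.inputs[0].type()
                self._fn = theano.function([inp], 2 * inp)
            output_storage[0][0] = self._fn(inputs[0])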

One issue with this approach is illustrated by the following example. Imagine
we want to iterate over the samples in a dataset and do something with their
numeric value. We would want the code to be as close as possible to:

.. code-block:: python

[...]

    # Compile a single function that maps a numeric sample index to the
    # numeric value of the corresponding sample.
    symbolic_index = dataset.get_index()  # Or just theano.tensor.iscalar()
    get_sample = theano.function([symbolic_index],
                                 dataset[symbolic_index].variable)
    for numeric_index in xrange(len(dataset)):
        do_something_with(get_sample(numeric_index))

James comments: this is how I have written the last couple of projects; it's
slightly verbose, but it's clear and efficient.

Note that although the above example focused on how to iterate over a dataset,
it can be cast into a more generic problem, where some data (either a dataset
or a sample) is the result of some transformation applied to other data, a
transformation parameterized by parameters p1, p2, ..., pN (in the above
example, we were

[...]

Ideally it would be nice to let the user take control of what is being
compiled, while leaving the option of using a sensible default behavior for
those who do not want to worry about it. How to achieve this is still to be
determined.

Another syntactic option for iterating over datasets is

.. code-block:: python

    for sample in dataset.numeric_iterator(batchsize=10):
        do_something_with(sample)

The numeric_iterator would create a symbolic batch index and compile a single
function that extracts the corresponding minibatch. The arguments to the
numeric_iterator function can also specify what compile mode to use, any givens
you might want to apply, etc.
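
A sketch of how numeric_iterator might work under this proposal (the slicing
behavior of the dataset and its ``variable`` member are assumptions, not an
agreed-upon API):

.. code-block:: python

    import theano
    import theano.tensor as T

    def numeric_iterator(dataset, batchsize, mode=None, givens=None):
        # Symbolic batch index, and the symbolic minibatch it selects.
        index = T.iscalar('batch_index')
        minibatch = dataset[index * batchsize:(index + 1) * batchsize].variable
        # One compiled function serves every iteration of the loop;
        # mode and givens are simply forwarded to theano.function.
        get_batch = theano.function([index], minibatch,
                                    mode=mode, givens=givens)
        for i in xrange(len(dataset) // batchsize):
            yield get_batch(i)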

What About Learners?
--------------------

The discussion above only mentioned datasets, but not learners. The learning
part of a learner is not a main concern (currently). What matters most w.r.t.
what was discussed above is how a learner takes as input a dataset and outputs
another dataset that can be used with the dataset API.
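
For example (a hypothetical sketch of this dataset-in, dataset-out convention;
LinearLearner and OutputDataset are illustrative only):

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    class OutputDataset(object):
        """The learner's output, itself usable through the dataset API."""
        def __init__(self, variable):
            self.variable = variable

    class LinearLearner(object):
        def __init__(self, n_in, n_out):
            self.W = theano.shared(numpy.zeros((n_in, n_out)), name='W')
            self.b = theano.shared(numpy.zeros(n_out), name='b')

        def __call__(self, dataset):
            # Applying the learner to a dataset yields another dataset whose
            # variable is a symbolic function of the input's variable.
            return OutputDataset(T.dot(dataset.variable, self.W) + self.b)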

James asks:
What's wrong with simply passing the variables corresponding to the dataset to
the constructor of the learner?
That seems much more flexible, compact, and clear than the decorator.
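
A sketch of the constructor style James describes (LogisticRegression and its
members are illustrative, not an existing class):

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    class LogisticRegression(object):
        """Learner built directly from the dataset's symbolic variables."""
        def __init__(self, input, target, n_in, n_out):
            self.W = theano.shared(numpy.zeros((n_in, n_out)), name='W')
            self.b = theano.shared(numpy.zeros(n_out), name='b')
            self.prediction = T.nnet.softmax(T.dot(input, self.W) + self.b)
            self.nll = -T.log(self.prediction)[T.arange(target.shape[0]),
                                               target]

    # The variables corresponding to the dataset go straight in:
    x, y = T.matrix('x'), T.ivector('y')
    learner = LogisticRegression(input=x, target=y, n_in=784, n_out=10)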

A Learner may be able to compute various things. For instance, a Neural
Network may output a ``prediction`` vector (whose elements correspond to
estimated probabilities of each class in a classification task), as well as a
``cost`` vector (whose elements correspond to the penalized NLL, the NLL alone

[...]