comparison doc/v2_planning.txt @ 946:7c4504a4ce1a

additions to formulas, data access, hyper-params, scripts
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Wed, 11 Aug 2010 21:32:31 -0400
parents cafa16bfc7df
children 216f4ce969b2
Theano Symbolic Expressions for ML
----------------------------------

We could make this a submodule of pylearn: ``pylearn.nnet``.

Yoshua: I would use a different name, e.g., "pylearn.formulas", to emphasize that it is not just
about neural nets, and that this is a collection of formulas (expressions), rather than
completely self-contained classes for learners. We could have a "nnet.py" file for
neural nets, though.

There are a number of ideas floating around for how to handle classes /
modules (LeDeepNet, pylearn.shared.layers, pynnet), so let's implement as much
math as possible in global functions with no classes. There are no models in
the wish list that require more than a few vectors and matrices to parametrize.
Global functions are more reusable than classes.
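
For instance, a layer and its cost could be written as plain functions over Theano
variables. A minimal sketch of that style; the function names and signatures are
only illustrations, not a proposed API::

    # Math as global functions over Theano symbolic variables (names are
    # illustrative assumptions, not an agreed pylearn interface).
    import theano.tensor as T

    def sigmoid_layer(x, W, b):
        """Symbolic output of a sigmoid hidden layer: sigmoid(x W + b)."""
        return T.nnet.sigmoid(T.dot(x, W) + b)

    def softmax_layer(x, W, b):
        """Symbolic class probabilities of a softmax output layer."""
        return T.nnet.softmax(T.dot(x, W) + b)

    def nll(probs, targets):
        """Mean negative log-likelihood of integer `targets` under `probs`."""
        return -T.mean(T.log(probs)[T.arange(targets.shape[0]), targets])

    # Callers own the parameters (e.g. shared variables) and just compose
    # expressions:
    #   p_y = softmax_layer(sigmoid_layer(x, W1, b1), W2, b2)
    #   cost = nll(p_y, y)
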
[...]

to example (whose type and nature depends on the dataset; it could for
instance be an (image, label) pair). This interface permits iterating over
the dataset, shuffling the dataset, and splitting it into folds. For
efficiency, it is nice if the dataset interface supports looking up several
index values at once, because looking up many examples at once can sometimes
be faster than looking each one up in turn. In particular, looking up
a consecutive block of indices, or a slice, should be well supported.

Some datasets may not support random access (e.g. a random number stream), and
that's fine if an exception is raised. The user will see a NotImplementedError
or similar, and try something else. We might want a way to test whether a
dataset supports random access without having to load an example.

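A rough sketch of the indexing behaviour discussed above; the class names, the
``supports_random_access`` flag and the methods are hypothetical, not an agreed
pylearn interface::

    class MemoryDataset(object):
        """Dataset backed by an in-memory list of examples."""

        supports_random_access = True   # cheap capability check, no example loaded

        def __init__(self, examples):
            self.examples = list(examples)

        def __len__(self):
            return len(self.examples)

        def __getitem__(self, idx):
            # Accept a single index, a slice (consecutive block), or a list of
            # indices, so callers can fetch many examples in one call.
            if isinstance(idx, (int, slice)):
                return self.examples[idx]
            return [self.examples[i] for i in idx]

    class StreamDataset(object):
        """Dataset that can only be iterated over (e.g. a random number stream)."""

        supports_random_access = False

        def __getitem__(self, idx):
            # No random access: callers catch this and fall back to iteration.
            raise NotImplementedError("this dataset does not support random access")
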
A more intuitive interface for many datasets (or subsets) is to load them as
matrices or lists of examples. This format is more convenient to work with in
an ipython shell, for example. It is not good to provide only the "dataset
[...]

defined implicitly by the contents of /data/lisa/data at DIRO, but it would be
better to document in pylearn what the contents of this folder should be as
much as possible. It should be possible to rebuild this tree from information
found in pylearn.

Yoshua (about ideas proposed by Pascal Vincent a while ago):

- we may want to distinguish between datasets and tasks: a task defines
  not just the data but also things like what is the input and what is the
  target (for supervised learning), and *importantly* a set of performance metrics
  that make sense for this task (e.g. those used by papers solving a particular
  task, or reported for a particular benchmark)

- we should discuss a few "standards" that datasets and tasks may conform to,
  such as the following (a rough sketch is given after this list):

  - "input" and "target" fields inside each example, for supervised or semi-supervised learning tasks
    (with a convention for the semi-supervised case when only the input or only the target is observed)
  - "input" for unsupervised learning
  - conventions for missing-valued components inside input or target
  - how examples that are sequences are treated (e.g. the input or the target is a sequence)
  - how time-stamps are specified when appropriate (e.g., the sequences are asynchronous)
  - how error metrics are specified
    * example-level statistics (e.g. classification error)
    * dataset-level statistics (e.g. ROC curve, mean and standard error of error)
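
A rough sketch of what such a task/dataset convention could look like; the ``Task``
class, the field names and the metric are illustrative assumptions, not an agreed
interface::

    import numpy as np

    def classification_error(predictions, targets):
        """Example-level metric: fraction of misclassified examples."""
        return float(np.mean(np.asarray(predictions) != np.asarray(targets)))

    class Task(object):
        """Bundles a dataset with input/target conventions and its metrics."""

        def __init__(self, name, examples, metrics):
            self.name = name
            self.examples = examples    # each example is a dict, see below
            self.metrics = metrics      # dict: metric name -> callable

    # Each example carries "input" and "target" fields; target=None marks the
    # unlabeled part of a semi-supervised dataset.
    examples = [
        {"input": np.array([0.2, 0.7]), "target": 1},
        {"input": np.array([0.9, 0.1]), "target": None},   # unlabeled
    ]
    task = Task("toy_classification", examples,
                metrics={"classification_error": classification_error})
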


Model Selection & Hyper-Parameter Optimization
----------------------------------------------

[...]

the experiment to run and the hyper-parameter space to search. Then this
application-driver would take control of scheduling jobs and running them on
various computers... I'm imagining a potentially ugly brute of a hack that's
not necessarily something we will want to expose at a low-level for reuse.

Yoshua: We want both a library-defined driver that takes instructions about how to generate
new hyper-parameter combinations (e.g. implicitly providing a prior distribution from which
to sample them), and examples showing how to use it in typical cases.
Note that sometimes we just want to find the best configuration of hyper-parameters,
but sometimes we want to do more subtle analysis; often we want a combination of both.
In this respect it could be useful for the user to distinguish hyper-parameters about
which scientific questions are asked (e.g. depth of an architecture) from
hyper-parameters that we would like to marginalize/maximize over (e.g. learning rate).
This can influence both the sampling of configurations (we want to make sure that all
combinations of question-driving hyper-parameters are covered) and the analysis
of results (we may want to estimate ANOVAs, averages, or quantiles over
the non-question-driving hyper-parameters).

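A sketch of the kind of driver this suggests, with question-driving hyper-parameters
crossed exhaustively and the others sampled from user-supplied priors; the function
name and its arguments are assumptions, not a proposed interface::

    import itertools
    import random

    def sample_configurations(question_grid, nuisance_priors, n_samples_per_cell):
        """Yield hyper-parameter configurations as dicts.

        question_grid      -- dict: name -> list of values, crossed exhaustively
                              (e.g. {"depth": [1, 2, 3]})
        nuisance_priors    -- dict: name -> zero-argument callable returning a sample
                              (e.g. {"lr": lambda: 10 ** random.uniform(-4, -1)})
        n_samples_per_cell -- nuisance draws per grid cell, so that every combination
                              of question-driving values is covered
        """
        names = sorted(question_grid)
        for cell in itertools.product(*(question_grid[n] for n in names)):
            for _ in range(n_samples_per_cell):
                config = dict(zip(names, cell))
                config.update((name, prior()) for name, prior in nuisance_priors.items())
                yield config

    # Usage sketch (run_experiment is a user-supplied function):
    #   for config in sample_configurations({"depth": [1, 2, 3]},
    #                                       {"lr": lambda: 10 ** random.uniform(-4, -1)},
    #                                       n_samples_per_cell=5):
    #       run_experiment(config)
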
Python scripts for common ML algorithms
---------------------------------------

The script aspect of this feature request makes me think that what would be
good here is more tutorial-type scripts. And the existing tutorials could
potentially be rewritten to use some of the pylearn.nnet expressions. More
tutorials / demos would be great.

Yoshua: agreed that we could write them as tutorials, but note how the
spirit would be different from the current deep learning tutorials: we would
not mind using library code as much as possible instead of trying to flatten
out everything in the interest of pedagogical simplicity. Instead, these
tutorials should be meant to illustrate not the algorithms but *how to take
advantage of the library*. They could also be used as *BLACK BOX* implementations
by people who don't want to dig lower and just want to run experiments.

Functional Specifications
=========================

TODO:
[...]
For each thing with a functional spec (e.g. datasets library, optimization library) make a
separate file.


pylearn.formulas
----------------

Directory with functions for building layers, calculating classification
errors, cross-entropies with various distributions, free energies, etc. This
module would consist mostly of global functions, Theano Ops and Theano
optimizations.

Yoshua: I would break it down into module files (a short sketch of one of these follows
the list), e.g.:

pylearn.formulas.costs: generic / common cost functions, e.g. various cross-entropies,
    squared error, abs. error, various sparsity penalties (L1, Student)

pylearn.formulas.linear: formulas for linear classifiers, linear regression, factor analysis, PCA

pylearn.formulas.nnet: formulas for building layers of various kinds, various activation functions,
    layers which could be plugged with various costs & penalties, and stacked

pylearn.formulas.ae: formulas for auto-encoders, denoising auto-encoder variants, and corruption processes

pylearn.formulas.rbm: energies, free energies, conditional distributions, Gibbs sampling

pylearn.formulas.trees: formulas for decision trees

pylearn.formulas.boosting: formulas for boosting variants

etc.

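As a purely illustrative example of what one of these files might contain, a sketch of
a few formulas that pylearn.formulas.costs could export (the function names are
assumptions; only the Theano calls are real)::

    import theano.tensor as T

    def binary_cross_entropy(output, target):
        """Mean cross-entropy between Bernoulli probabilities and 0/1 targets."""
        return T.mean(T.nnet.binary_crossentropy(output, target))

    def squared_error(output, target):
        """Mean squared error, summed over output dimensions."""
        return T.mean(T.sum((output - target) ** 2, axis=1))

    def l1_penalty(*params):
        """L1 sparsity penalty over any number of parameter tensors."""
        return sum(T.sum(abs(p)) for p in params)
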
Indexing Convention
~~~~~~~~~~~~~~~~~~~

Something to decide on - Fortran-style or C-style indexing. Although we have