annotate doc/v2_planning.txt @ 941:939806d33183

v2_planning.txt
author James Bergstra <bergstrj@iro.umontreal.ca>
date Wed, 11 Aug 2010 08:54:13 -0400
parents
children cafa16bfc7df

Motivation
==========

Yoshua:
-------

We are missing a *Theano Machine Learning library*.

The deep learning tutorials do a good job, but they lack the following features, which I would like to see in an ML library:

- a well-organized collection of Theano symbolic expressions (formulas) for handling most of
  what is needed either in implementing existing well-known ML and deep learning algorithms or
  for creating new variants (without having to start from scratch each time); that is, the
  mathematical core,

- a well-organized collection of Python modules to help with the following:
    - several data-access models that wrap around learning algorithms for interfacing with various types of data (static vectors, images, sound, video, generic time-series, etc.)
    - generic utility code for optimization
        - stochastic gradient descent variants
        - early stopping variants
        - interfacing to generic second-order optimization methods
        - second-order methods tailored to work on minibatches
        - optimizers for sparse coefficients / parameters
    - generic code for model selection and hyper-parameter optimization (including the use and coordination of multiple jobs running on different machines, e.g. using jobman)
    - generic code for performance estimation and experimental statistics
    - visualization tools (using existing Python libraries) and examples for all of the above
    - learning algorithm conventions and meta-learning algorithms (bagging, boosting, mixtures of experts, etc.) which use them

[Note that many of us already use some version of all of the above, but each of us tends to reinvent the wheel, and newbies don't benefit from a shared knowledge base.]

- a well-documented set of Python scripts using the above library to show how to run the most
  common ML algorithms (possibly with examples showing how to run multiple experiments with
  many different models and collect comparative statistical results). This is particularly
  important for pure users to adopt Theano in their ML application work.

Ideally, there would be one person in charge of this project, making sure a coherent and
easy-to-read design is developed, along with many helping hands (to implement the various
helper modules, formulae, and learning algorithms).


James:
-------

I am interested in the design and implementation of the "well-organized collection of Theano
symbolic expressions..."

I would like to explore algorithms for hyper-parameter optimization, following up on some
"high-throughput" work. I'm most interested in the "generic code for model selection and
hyper-parameter optimization..." and "generic code for performance estimation...".

I have some experience with the data-access requirements, and some lessons I'd like to share
on that, but no time to work on that aspect of things.

I will continue to contribute to the "well-documented set of Python scripts using the above to
showcase common ML algorithms...". I have an Olshausen & Field-style sparse coding script that
could be polished up. I am also implementing the mcRBM, and I'll be able to add that when it's
done.



Suggestions for how to tackle various desiderata
================================================



Functional Specifications
=========================

Put the functional specifications into separate text files so that this one does not become a
monster. For each component with a functional spec (e.g. datasets library, optimization
library), make a separate file.
73