Motivation
==========

Yoshua:
-------

We are missing a *Theano Machine Learning library*.

The deep learning tutorials do a good job, but they lack the following features, which I would
like to see in an ML library:

- a well-organized collection of Theano symbolic expressions (formulas) for handling most of
  what is needed either in implementing existing well-known ML and deep learning algorithms or
  in creating new variants (without having to start from scratch each time); that is the
  mathematical core (a first sketch follows this list),

- a well-organized collection of Python modules to help with the following:
    - several data-access models that wrap around learning algorithms for interfacing with
      various types of data (static vectors, images, sound, video, generic time-series, etc.)
    - generic utility code for optimization (a second sketch follows this list):
        - stochastic gradient descent variants
        - early-stopping variants
        - interfacing to generic 2nd-order optimization methods
        - 2nd-order methods tailored to work on minibatches
        - optimizers for sparse coefficients / parameters
    - generic code for model selection and hyper-parameter optimization (including the use and
      coordination of multiple jobs running on different machines, e.g. using jobman)
    - generic code for performance estimation and experimental statistics
    - visualization tools (using existing Python libraries) and examples for all of the above
    - learning-algorithm conventions and meta-learning algorithms (bagging, boosting, mixtures
      of experts, etc.) which use them

[Note that many of us already use some instance of all of the above, but each of us tends to
reinvent the wheel, and newbies don't benefit from a shared knowledge base.]

- a well-documented set of Python scripts using the above library to show how to run the most
  common ML algorithms (possibly with examples showing how to run multiple experiments with
  many different models and collect comparative statistics). This is particularly important
  for getting pure users to adopt Theano in ML application work.

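A first sketch of what an entry in the formulas collection might look like: a plain function
from symbolic inputs to a symbolic cost, so that T.grad can derive learning rules from it.
The name nll_softmax and its signature are made up here for illustration; only the Theano
calls themselves are real::

    import numpy
    import theano
    import theano.tensor as T

    def nll_softmax(x, y, W, b):
        # Mean negative log-likelihood of a softmax classifier; the same
        # expression could serve as the output-layer cost of a deep network.
        p_y_given_x = T.nnet.softmax(T.dot(x, W) + b)
        return -T.mean(T.log(p_y_given_x)[T.arange(y.shape[0]), y])

    # Typical use: build the cost symbolically, then let Theano derive the
    # gradients for whatever optimizer is in play.
    x, y = T.matrix('x'), T.ivector('y')
    W = theano.shared(numpy.zeros((784, 10)), name='W')
    b = theano.shared(numpy.zeros(10), name='b')
    cost = nll_softmax(x, y, W, b)
    gW, gb = T.grad(cost, [W, b])
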
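A second sketch, for the optimization utilities: the usual early-stopped SGD loop written
once in plain Python, instead of every script re-deriving it. train_epoch and valid_error
are assumed callables, and the whole interface is a placeholder rather than a proposal::

    import copy

    def fit_with_early_stopping(params, train_epoch, valid_error,
                                patience=10, max_epochs=1000):
        # Run one epoch of SGD updates at a time; stop after `patience`
        # epochs without improvement in validation error, and return the
        # best parameter snapshot seen so far.
        best_err, best_params, bad_epochs = float('inf'), None, 0
        for _ in range(max_epochs):
            train_epoch()
            err = valid_error()
            if err < best_err:
                best_err, bad_epochs = err, 0
                best_params = copy.deepcopy(params)
            else:
                bad_epochs += 1
                if bad_epochs >= patience:
                    break
        return best_params, best_err
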
Ideally, there would be one person in charge of this project, making sure a coherent and
easy-to-read design is developed, along with many helping hands (to implement the various
helper modules, formulas, and learning algorithms).


James:
-------

I am interested in the design and implementation of the "well-organized collection of Theano
symbolic expressions..."

I would like to explore algorithms for hyper-parameter optimization, following up on some
"high-throughput" work. I'm most interested in the "generic code for model selection and
hyper-parameter optimization..." and "generic code for performance estimation...".
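
To make the kind of driver this implies concrete, here is a minimal sketch using random
search purely as an example policy; sample_config and run_trial are assumed callables, and
a real version would dispatch trials through something like jobman::

    import random

    def random_search(sample_config, run_trial, n_trials=50, seed=0):
        # Draw hyper-parameter configurations at random, evaluate each one
        # (each trial could run as a separate job), and keep the best by
        # validation error.
        rng = random.Random(seed)
        best_err, best_config = float('inf'), None
        for _ in range(n_trials):
            config = sample_config(rng)
            err = run_trial(config)
            if err < best_err:
                best_err, best_config = err, config
        return best_config, best_err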

I have some experience with the data-access requirements, and some lessons I'd like to share
on that, but no time to work on that aspect of things.

I will continue to contribute to the "well-documented set of Python scripts using the above to
showcase common ML algorithms...". I have an Olshausen & Field-style sparse-coding script that
could be polished up. I am also implementing the mcRBM, and I'll be able to add that when it's
done.



Suggestions for how to tackle various desiderata
================================================



Functional Specifications
=========================

Put these into different text files so that this one does not become a monster.
For each thing with a functional spec (e.g. datasets library, optimization library), make a
separate file.