changeset 941:939806d33183

v2_planning.txt
author James Bergstra <bergstrj@iro.umontreal.ca>
date Wed, 11 Aug 2010 08:54:13 -0400
parents a75bf0aca18f
children 1529c84e460f
files doc/v2_planning.txt
diffstat 1 files changed, 73 insertions(+), 0 deletions(-)
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/v2_planning.txt	Wed Aug 11 08:54:13 2010 -0400
@@ -0,0 +1,73 @@
+
+Motivation
+==========
+
+Yoshua:
+-------
+
+We are missing a *Theano Machine Learning library*.
+
+The deep learning tutorials do a good job, but they lack the following features, which I would like to see in an ML library:
+
+ - a well-organized collection of Theano symbolic expressions (formulas) for handling most of
+   what is needed either to implement existing well-known ML and deep learning algorithms or
+   to create new variants (without having to start from scratch each time); in other words,
+   the mathematical core (see the sketch after this list),
+
+ - a well-organized collection of python modules to help with the following:
+      - several data-access models that wrap around learning algorithms for interfacing
+        with various types of data (static vectors, images, sound, video, generic
+        time-series, etc.)
+      - generic utility code for optimization
+             - stochastic gradient descent variants
+             - early stopping variants
+             - interfacing to generic 2nd order optimization methods
+             - 2nd order methods tailored to work on minibatches
+             - optimizers for sparse coefficients / parameters
+      - generic code for model selection and hyper-parameter optimization (including the
+        use and coordination of multiple jobs running on different machines, e.g. using
+        jobman)
+      - generic code for performance estimation and experimental statistics
+      - visualization tools (using existing python libraries) and examples for all of the
+        above
+      - learning algorithm conventions and meta-learning algorithms (bagging, boosting,
+        mixtures of experts, etc.) which use them
+
+   [Note that many of us already use some instance of all the above, but each one tends to
+    reinvent the wheel, and newcomers don't benefit from a shared knowledge base.]
+
+ - a well-documented set of python scripts using the above library to show how to run the most
+   common ML algorithms (possibly with examples showing how to run multiple experiments with
+   many different models and collect statistical comparative results). This is particularly
+   important if pure users are to adopt Theano for ML application work.
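+
+As a rough illustration of the first two points, a "formula" from the mathematical core and
+a generic optimization helper might look like the following minimal sketch (the function
+names and organization here are hypothetical, assuming the current Theano API)::
+
+    import theano
+    import theano.tensor as T
+
+    def logistic_regression(x, y, w, b):
+        """Given a symbolic input x, integer labels y, and parameters w and b,
+        return symbolic expressions for the class probabilities, the negative
+        log-likelihood, and the error rate."""
+        p_y_given_x = T.nnet.softmax(T.dot(x, w) + b)
+        nll = -T.mean(T.log(p_y_given_x)[T.arange(y.shape[0]), y])
+        errors = T.mean(T.neq(T.argmax(p_y_given_x, axis=1), y))
+        return p_y_given_x, nll, errors
+
+    def sgd_updates(cost, params, lr):
+        """Return (shared_variable, update_expression) pairs implementing one
+        step of plain stochastic gradient descent on cost."""
+        return [(p, p - lr * T.grad(cost, p)) for p in params]
+
+A caller would build x = T.matrix('x') and y = T.ivector('y'), allocate w and b as
+theano.shared variables, and pass the nll and the update pairs to theano.function to
+compile a training step.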
+
+Ideally, there would be one person in charge of this project, making sure a coherent and
+easy-to-read design is developed, along with many helping hands (to implement the various
+helper modules, formulae, and learning algorithms).
+
+
+James:
+-------
+
+I am interested in the design and implementation of the "well-organized collection of Theano
+symbolic expressions..."
+
+I would like to explore algorithms for hyper-parameter optimization, following up on some
+"high-throughput" work.  I'm most interested in the "generic code for model selection and
+hyper-parameter optimization..." and "generic code for performance estimation...".  
+
+I have some experience with the data-access requirements, and some lessons I'd like to share
+on that, but no time to work on that aspect of things.
+
+I will continue to contribute to the "well-documented set of python scripts using the above to
+showcase common ML algorithms...".  I have an Olshausen & Field-style sparse coding script that
+could be polished up.  I am also implementing the mcRBM and I'll be able to add that when it's
+done.
+
+
+
+Suggestions for how to tackle various desiderata
+================================================
+
+
+
+Functional Specifications
+=========================
+
+Put these into separate text files so that this one does not become a monster.
+For each component with a functional spec (e.g. the datasets library, the optimization
+library), make a separate file.
+