# HG changeset patch # User James Bergstra # Date 1281547011 14400 # Node ID 1529c84e460f57c3044d0808ea5c632b74621324 # Parent 0181459b53a129701a0ec29c4be910595e884824# Parent 939806d33183cff2ed91247f5c6277b7784132ae merge diff -r 0181459b53a1 -r 1529c84e460f doc/v2_planning.txt --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/doc/v2_planning.txt Wed Aug 11 13:16:51 2010 -0400 @@ -0,0 +1,73 @@ + +Motivation +========== + +Yoshua: +------- + +We are missing a *Theano Machine Learning library*. + +The deep learning tutorials do a good job but they lack the following features, which I would like to see in a ML library: + + - a well-organized collection of Theano symbolic expressions (formulas) for handling most of + what is needed either in implementing existing well-known ML and deep learning algorithms or + for creating new variants (without having to start from scratch each time), that is the + mathematical core, + + - a well-organized collection of python modules to help with the following: + - several data-access models that wrap around learning algorithms for interfacing with various types of data (static vectors, images, sound, video, generic time-series, etc.) + - generic utility code for optimization + - stochastic gradient descent variants + - early stopping variants + - interfacing to generic 2nd order optimization methods + - 2nd order methods tailored to work on minibatches + - optimizers for sparse coefficients / parameters + - generic code for model selection and hyper-parameter optimization (including the use and coordination of multiple jobs running on different machines, e.g. using jobman) + - generic code for performance estimation and experimental statistics + - visualization tools (using existing python libraries) and examples for all of the above + - learning algorithm conventions and meta-learning algorithms (bagging, boosting, mixtures of experts, etc.) which use them + + [Note that many of us already use some instance of all the above, but each one tends to reinvent the wheel and newbies don't benefit from a knowledge base.] + + - a well-documented set of python scripts using the above library to show how to run the most + common ML algorithms (possibly with examples showing how to run multiple experiments with + many different models and collect statistical comparative results). This is particularly + important for pure users to adopt Theano in the ML application work. + +Ideally, there would be one person in charge of this project, making sure a coherent and +easy-to-read design is developed, along with many helping hands (to implement the various +helper modules, formulae, and learning algorithms). + + +James: +------- + +I am interested in the design and implementation of the "well-organized collection of Theano +symbolic expressions..." + +I would like to explore algorithms for hyper-parameter optimization, following up on some +"high-throughput" work. I'm most interested in the "generic code for model selection and +hyper-parameter optimization..." and "generic code for performance estimation...". + +I have some experiences with the data-access requirements, and some lessons I'd like to share +on that, but no time to work on that aspect of things. + +I will continue to contribute to the "well-documented set of python scripts using the above to +showcase common ML algorithms...". I have an Olshausen&Field-style sparse coding script that +could be polished up. I am also implementing the mcRBM and I'll be able to add that when it's +done. + + + +Suggestions for how to tackle various desiderata +================================================ + + + +Functional Specifications +========================= + +Put these into different text files so that this one does not become a monster. +For each thing with a functional spec (e.g. datasets library, optimization library) make a +separate file. +