Mercurial > pylearn
view doc/v2_planning.txt @ 944:1529c84e460f
merge
author | James Bergstra <bergstrj@iro.umontreal.ca> |
---|---|
date | Wed, 11 Aug 2010 13:16:51 -0400 |
parents | 939806d33183 |
children | cafa16bfc7df |
Motivation
==========

Yoshua:
-------

We are missing a *Theano Machine Learning library*.

The deep learning tutorials do a good job, but they lack the following features, which I would
like to see in an ML library:

- a well-organized collection of Theano symbolic expressions (formulas) for handling most of
  what is needed, either to implement existing well-known ML and deep learning algorithms or
  to create new variants (without having to start from scratch each time); that is, the
  mathematical core,

- a well-organized collection of Python modules to help with the following:

  - several data-access models that wrap around learning algorithms for interfacing with
    various types of data (static vectors, images, sound, video, generic time-series, etc.)
  - generic utility code for optimization:

    - stochastic gradient descent variants
    - early stopping variants
    - interfacing to generic 2nd order optimization methods
    - 2nd order methods tailored to work on minibatches
    - optimizers for sparse coefficients / parameters

  - generic code for model selection and hyper-parameter optimization (including the use and
    coordination of multiple jobs running on different machines, e.g. using jobman)
  - generic code for performance estimation and experimental statistics
  - visualization tools (using existing Python libraries) and examples for all of the above
  - learning algorithm conventions and meta-learning algorithms (bagging, boosting, mixtures
    of experts, etc.) which use them

  [Note that many of us already use some instance of all the above, but each one tends to
  reinvent the wheel, and newbies don't benefit from a knowledge base.]

- a well-documented set of Python scripts using the above library to show how to run the most
  common ML algorithms (possibly with examples showing how to run multiple experiments with
  many different models and collect statistical comparative results). This is particularly
  important for getting pure users to adopt Theano in their ML application work.
Ideally, there would be one person in charge of this project, making sure a coherent and
easy-to-read design is developed, along with many helping hands (to implement the various
helper modules, formulae, and learning algorithms).

James:
-------

I am interested in the design and implementation of the "well-organized collection of Theano
symbolic expressions...".

I would like to explore algorithms for hyper-parameter optimization, following up on some
"high-throughput" work. I'm most interested in the "generic code for model selection and
hyper-parameter optimization..." and the "generic code for performance estimation...".

I have some experience with the data-access requirements, and some lessons I'd like to share
on that, but no time to work on that aspect of things.

I will continue to contribute to the "well-documented set of Python scripts using the above to
showcase common ML algorithms...". I have an Olshausen&Field-style sparse coding script that
could be polished up. I am also implementing the mcRBM, and I'll be able to add that when it's
done.

Suggestions for how to tackle various desiderata
================================================

Functional Specifications
=========================

Put these into different text files so that this one does not become a monster. For each
thing with a functional spec (e.g. datasets library, optimization library), make a separate
file.