comparison doc/v2_planning/requirements.txt @ 1093:a65598681620

v2planning - initial commit of use_cases, requirements
author James Bergstra <bergstrj@iro.umontreal.ca>
date Sun, 12 Sep 2010 21:45:22 -0400
parents
children 2bbc294fa5ac
============
Requirements
============


Application Requirements
========================

Terminology and Abbreviations:
------------------------------

MLA - machine learning algorithm

learning problem - a machine learning application, typically characterized by a
dataset (possibly with dataset folds), one or more functions to be learned from
the data, and one or more metrics to evaluate those functions. Learning problems
are the benchmarks for empirical model comparison.

n. of - number of

SGD - stochastic gradient descent

Users:
------

- New master's and PhD students in the lab should be able to quickly move into
  'production' mode without having to reinvent the wheel.

- Students in the two ML classes should be able to play with the library to
  explore new ML variants. This means some APIs (e.g. the Experiment level)
  must be very well documented and conceptually simple.

- Researchers outside the lab (who might study and experiment with our
  algorithms)

- Partners outside the lab (e.g. Bell, Ubisoft) with closed-source commercial
  projects.

Uses:
-----

R1. reproduce previous work (our own and others')

R2. explore MLA variants by swapping components (e.g. optimization algorithm,
dataset, hyper-parameters).

R3. analyze experimental results (e.g. plotting training curves, finding best
models, marginalizing across hyper-parameter choices)

R4. disseminate (or serve as a platform for disseminating) our own published algorithms

R5. provide implementations of common MLA components (e.g. classifiers, datasets,
optimization algorithms, meta-learning algorithms)

R6. drive large-scale parallelizable computations (e.g. grid search, bagging,
random search); a small sketch of such a sweep appears after this list

R7. provide implementations of standard pre-processing algorithms (e.g. PCA,
stemming, Mel-scale spectrograms, GIST features, etc.)

R8. provide high performance suitable for large-scale experiments

R9. be able to use the most efficient algorithms in special case combinations of
learning algorithm components (e.g. when there is a fast k-fold validation
algorithm for a particular model family, the library should not require users
to rewrite their standard k-fold validation script to use it)

R10. support experiments on a variety of datasets (e.g. movies, images, text,
sound, reinforcement learning?)

R11. support efficient computations on datasets larger than RAM and GPU memory

R12. support infinite datasets (i.e. generated on the fly)
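
As an illustration of the kind of computation R6 refers to, here is a minimal
sketch of a parallel hyper-parameter sweep using only the Python standard
library; it is not the proposed library API, and the objective function is a
made-up stand-in for training and scoring a model::

    import itertools
    from multiprocessing import Pool

    def train_and_score(params):
        """Stand-in for training a model and returning a validation score."""
        lr, l2 = params
        return {'lr': lr, 'l2': l2, 'score': -(lr - 0.1) ** 2 - l2}

    if __name__ == '__main__':
        # Cartesian product of hyper-parameter values [R6: grid search].
        grid = list(itertools.product([0.01, 0.1, 1.0],    # learning rates
                                      [0.0, 1e-4, 1e-2]))  # L2 penalties
        pool = Pool()                      # one worker process per CPU core
        results = pool.map(train_and_score, grid)
        pool.close()
        pool.join()
        print(max(results, key=lambda r: r['score']))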


Basic Design Approach
=====================

An ability to drive parallel computations is essential in addressing [R6,R8].

The basic design approach for the library is to implement:

- a few virtual machines (VMs), some of which can run programs that can be
  parallelized across processors, hosts, and networks;
- MLAs in a Symbolic Expression language (similar to Theano), as required by
  [R5,R7,R8].

MLAs are typically specified as Symbolic programs that are compiled to VM
instructions, but some MLAs may be implemented directly in those instructions.
Symbolic programs are naturally modularized by sub-expressions [R2] and can be
optimized automatically (as in Theano) to address [R9].
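
A minimal Python sketch of this idea, using hypothetical names (Apply,
optimize) rather than any existing Theano or library API: an MLA is written as
a graph of swappable sub-expressions [R2], and a graph rewrite substitutes a
specialized implementation when it recognizes a special-case combination
[R9]::

    class Apply(object):
        """A node in a symbolic expression graph: an op applied to inputs."""
        def __init__(self, op, inputs):
            self.op = op          # e.g. 'kfold', 'train_linear_svm', 'pca'
            self.inputs = inputs  # child Apply nodes or constants

    def optimize(node):
        """Recursively rewrite a graph, substituting faster special cases."""
        node.inputs = [optimize(i) if isinstance(i, Apply) else i
                       for i in node.inputs]
        # Hypothetical rewrite: generic k-fold validation wrapped around a
        # linear SVM becomes a single fused instruction, if one is available.
        if (node.op == 'kfold' and isinstance(node.inputs[0], Apply)
                and node.inputs[0].op == 'train_linear_svm'):
            return Apply('kfold_linear_svm_fast', node.inputs[0].inputs)
        return node

    # The user writes the generic expression; the optimizer picks the fast path.
    expr = Apply('kfold', [Apply('train_linear_svm', ['dataset', 'hyperparams'])])
    print(optimize(expr).op)  # -> 'kfold_linear_svm_fast'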

A VM that caches instruction return values serves as:

- a reliable record of what jobs were run [R1];
- a database of intermediate results that can be analyzed after the
  model-training jobs have completed [R3];
- a clean API to several possible storage and execution backends.
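
A minimal sketch of such a caching VM; the names (CachingVM, call) are
hypothetical rather than a real API, and the storage backend is any dict-like
mapping, so an in-memory dict, an on-disk store, or a database could be
swapped in without changing calling code::

    import hashlib
    import pickle

    class CachingVM(object):
        """Run instructions, caching each return value under a key derived
        from the instruction name and its arguments.  The cache doubles as
        a record of which jobs were run [R1] and as a store of intermediate
        results for later analysis [R3]."""

        def __init__(self, storage):
            self.storage = storage  # any dict-like backend

        def call(self, fn, *args):
            key = hashlib.sha1(pickle.dumps((fn.__name__, args))).hexdigest()
            if key not in self.storage:
                self.storage[key] = fn(*args)
            return self.storage[key]

    # Usage: a result is computed once, then served from the cache.
    vm = CachingVM(storage={})
    total = vm.call(sum, (1, 2, 3))
    again = vm.call(sum, (1, 2, 3))  # cache hit: sum() is not re-run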