Mercurial > pylearn
annotate doc/v2_planning/requirements.txt @ 1093:a65598681620
v2planning - initial commit of use_cases, requirements
author | James Bergstra <bergstrj@iro.umontreal.ca> |
---|---|
date | Sun, 12 Sep 2010 21:45:22 -0400 |
parents | |
children | 2bbc294fa5ac |
============
Requirements
============


Application Requirements
========================

Terminology and Abbreviations:
------------------------------

MLA - machine learning algorithm

learning problem - a machine learning application typically characterized by a
dataset (possibly dataset folds), one or more functions to be learned from the
data, and one or more metrics to evaluate those functions. Learning problems
are the benchmarks for empirical model comparison.

n. of - number of

SGD - stochastic gradient descent

Users:
------

- New master's and PhD students in the lab should be able to quickly move into
  'production' mode without having to reinvent the wheel.

- Students in the two ML classes should be able to play with the library to
  explore new ML variants. This means some APIs (e.g. Experiment level) must be
  really well documented and conceptually simple.

- Researchers outside the lab (who might study and experiment with our
  algorithms)

- Partners outside the lab (e.g. Bell, Ubisoft) with closed-source commercial
  projects.

Uses:
-----

R1. reproduce previous work (our own and others')

R2. explore MLA variants by swapping components (e.g. optimization algo, dataset,
    hyper-parameters)

R3. analyze experimental results (e.g. plotting training curves, finding best
    models, marginalizing across hyper-parameter choices)

R4. disseminate (or serve as a platform for disseminating) our own published
    algorithms

R5. provide implementations of common MLA components (e.g. classifiers, datasets,
    optimization algorithms, meta-learning algorithms)

R6. drive large-scale parallelizable computations (e.g. grid search, bagging,
    random search)

R7. provide implementations of standard pre-processing algorithms (e.g. PCA,
    stemming, Mel-scale spectrograms, GIST features)

R8. provide high performance suitable for large-scale experiments

R9. be able to use the most efficient algorithms in special-case combinations of
    learning algorithm components (e.g. when there is a fast k-fold validation
    algorithm for a particular model family, the library should not require users
    to rewrite their standard k-fold validation script to use it)

R10. support experiments on a variety of datasets (e.g. movies, images, text,
    sound, reinforcement learning?)

R11. support efficient computations on datasets larger than RAM and GPU memory

R12. support infinite datasets (i.e. generated on the fly)

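One way to read R9 is as a dispatch rule: the user's validation call stays the
same, and the library picks a specialised routine when the model family offers
one. The sketch below is only an illustration under that assumption. The names
`k_fold_validate`, `fast_k_fold`, `MeanRegressor` and `FastMeanRegressor` are
hypothetical and not part of any existing API; a mean predictor is used because
its held-out mean can be obtained from a running total instead of refitting.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


class MeanRegressor:
    """Hypothetical model family: always predicts the training mean."""

    def fit(self, ys):
        self.mean = sum(ys) / len(ys)
        return self

    def error(self, ys):
        # Mean squared error of the constant prediction on `ys`.
        return sum((y - self.mean) ** 2 for y in ys) / len(ys)


class FastMeanRegressor(MeanRegressor):
    """Same model, plus a specialised k-fold routine (assumed hook name)."""

    def fast_k_fold(self, ys, k):
        total, n = sum(ys), len(ys)
        errors = []
        for fold in k_fold_indices(n, k):
            held = [ys[i] for i in fold]
            # Training mean without refitting: subtract the held-out sum.
            self.mean = (total - sum(held)) / (n - len(held))
            errors.append(self.error(held))
        return errors


def k_fold_validate(make_model, ys, k):
    """The 'standard script': uses the fast path when the model provides one."""
    fast = getattr(make_model(), "fast_k_fold", None)
    if fast is not None:
        return fast(ys, k)
    errors = []
    for fold in k_fold_indices(len(ys), k):
        held_out = set(fold)
        train = [y for i, y in enumerate(ys) if i not in held_out]
        model = make_model().fit(train)
        errors.append(model.error([ys[i] for i in fold]))
    return errors


ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
slow = k_fold_validate(MeanRegressor, ys, 3)
fast = k_fold_validate(FastMeanRegressor, ys, 3)
```

Both calls go through the same `k_fold_validate` entry point; only the model
class differs, which is the property R9 asks for.
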
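R11 and R12 both suggest that datasets should be consumable as streams rather
than as in-memory arrays. A minimal sketch of that idea, assuming a plain
Python generator as the dataset interface (the function names and the target
line y = 3x + 1 are made up for illustration):

```python
import itertools
import random


def infinite_examples(seed=0):
    """Endless stream of (x, y) pairs generated on the fly."""
    rng = random.Random(seed)
    while True:
        x = rng.uniform(-1.0, 1.0)
        yield x, 3.0 * x + 1.0  # hypothetical target function


def minibatches(examples, batch_size):
    """Group any (possibly infinite) iterable into lists of batch_size items."""
    it = iter(examples)
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        yield batch


def sgd_fit_line(batches, n_steps, lr=0.1):
    """Fit y = w*x + b by minibatch SGD, reading only n_steps batches."""
    w, b = 0.0, 0.0
    for batch in itertools.islice(batches, n_steps):
        grad_w = grad_b = 0.0
        for x, y in batch:
            err = (w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / len(batch)
        b -= lr * grad_b / len(batch)
    return w, b


w, b = sgd_fit_line(minibatches(infinite_examples(), 32), n_steps=2000)
```

Only one minibatch is held in memory at a time, so the same loop would also
apply to a finite dataset that is read from disk in chunks.
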
Basic Design Approach
=====================

An ability to drive parallel computations is essential in addressing [R6,R8].

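As a rough illustration of the kind of computation R6 has in mind, here is a
sketch of a grid search whose points are evaluated concurrently. It uses a
thread pool from the standard library purely as a stand-in; dispatching to
processes, hosts, or a cluster is the actual goal and is not shown. The
objective `validation_error` and its parameters `lr` and `n_hidden` are
hypothetical placeholders for a real training job.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def grid(**axes):
    """Yield one dict per point of the Cartesian product of the given axes."""
    names = sorted(axes)
    for values in itertools.product(*(axes[name] for name in names)):
        yield dict(zip(names, values))


def validation_error(params):
    # Hypothetical stand-in for "train a model, return its validation error".
    return (params["lr"] - 0.1) ** 2 + (params["n_hidden"] - 50) ** 2 / 1e4


def grid_search(objective, points, max_workers=4):
    """Evaluate every point concurrently and return (best_point, best_score)."""
    points = list(points)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(objective, points))
    best = min(range(len(points)), key=scores.__getitem__)
    return points[best], scores[best]


best_point, best_score = grid_search(
    validation_error, grid(lr=[0.01, 0.1, 1.0], n_hidden=[10, 50, 100])
)
```

Because each grid point is independent, the driver only needs a `map`-like
primitive, which is what makes the backend swappable.
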
The basic design approach for the library is to implement
- a few virtual machines (VMs), some of which can run programs that can be
  parallelized across processors, hosts, and networks.
- MLAs in a Symbolic Expression language (similar to Theano) as required by
  [R5,R7,R8]

MLAs are typically specified by Symbolic programs that are compiled to VM
instructions, but some MLAs may be implemented in VM instructions directly.
Symbolic programs are naturally modularized by sub-expressions [R2] and can be
optimized automatically (as in Theano) to address [R9].

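To make the sub-expression argument concrete, here is a toy symbolic language,
not Theano and not the proposed design. It assumes expressions are immutable
trees, so that a component can be swapped by substitution [R2] and a rewrite
pass can simplify the result automatically [R9]. All class and function names
are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class Mul:
    left: object
    right: object


def evaluate(expr, env):
    """Interpret an expression tree given variable bindings in `env`."""
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, Const):
        return expr.value
    left, right = evaluate(expr.left, env), evaluate(expr.right, env)
    return left + right if isinstance(expr, Add) else left * right


def substitute(expr, old, new):
    """Return a copy of `expr` with every occurrence of `old` replaced."""
    if expr == old:
        return new
    if isinstance(expr, (Add, Mul)):
        return type(expr)(substitute(expr.left, old, new),
                          substitute(expr.right, old, new))
    return expr


def simplify(expr):
    """Tiny optimization pass: remove additions of 0 and products with 0 or 1."""
    if not isinstance(expr, (Add, Mul)):
        return expr
    left, right = simplify(expr.left), simplify(expr.right)
    if isinstance(expr, Mul):
        if Const(0.0) in (left, right):
            return Const(0.0)
        if left == Const(1.0):
            return right
        if right == Const(1.0):
            return left
    else:
        if left == Const(0.0):
            return right
        if right == Const(0.0):
            return left
    return type(expr)(left, right)


x, w, b = Var("x"), Var("w"), Var("b")
model = Add(Mul(w, x), b)                     # w*x + b
no_bias = substitute(model, b, Const(0.0))    # swap the bias component out
optimized = simplify(no_bias)                 # reduces to w*x
```

The variant is built without touching the code that defined `model`, and the
simplification is applied by the system rather than by the user.
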
A VM that caches instruction return values serves as
- a reliable record of what jobs were run [R1]
- a database of intermediate results that can be analyzed after the
  model-training jobs have completed [R3]
- a clean API to several possible storage and execution backends.

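The three roles listed above can be illustrated with a very small caching VM.
This is a sketch under several assumptions: instructions are plain Python
callables, their arguments are JSON-serializable, and the storage backend is
anything dict-like. `CachingVM`, `register`, `call` and `results` are
hypothetical names, not a proposed API.

```python
import hashlib
import json


class CachingVM:
    """Runs named instructions and caches each return value by (name, args)."""

    def __init__(self, store=None):
        # `store` can be any dict-like backend (in-memory here by default).
        self.store = {} if store is None else store
        self.instructions = {}
        self.log = []  # record of every job requested, in order

    def register(self, name, fn):
        self.instructions[name] = fn

    def call(self, name, **kwargs):
        # Key depends on the instruction name and its arguments only,
        # so keyword order does not matter.
        payload = json.dumps([name, kwargs], sort_keys=True).encode("utf-8")
        key = hashlib.sha1(payload).hexdigest()
        hit = key in self.store
        if not hit:
            result = self.instructions[name](**kwargs)
            self.store[key] = {"name": name, "args": kwargs, "result": result}
        self.log.append((name, kwargs, "hit" if hit else "miss"))
        return self.store[key]["result"]

    def results(self, name):
        """Query cached results of one instruction, e.g. for later analysis."""
        return [(rec["args"], rec["result"])
                for rec in self.store.values() if rec["name"] == name]


calls = []


def train(lr, n_epochs):
    # Hypothetical stand-in for an expensive model-training job.
    calls.append((lr, n_epochs))
    return round(1.0 / (1.0 + lr * n_epochs), 6)


vm = CachingVM()
vm.register("train", train)
first = vm.call("train", lr=0.1, n_epochs=10)
second = vm.call("train", n_epochs=10, lr=0.1)  # served from the cache
```

Here `vm.log` plays the role of the job record [R1], `vm.results("train")` the
role of the results database [R3], and the `store` argument the place where a
different storage backend could be plugged in.
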