Mercurial > pylearn
annotate doc/v2_planning/requirements.txt @ 1211:e7ac87720fee
v2planning plugin_JB - added PRINT and POPEN to demonstrate parallel async. control flows
author | James Bergstra <bergstrj@iro.umontreal.ca> |
---|---|
date | Wed, 22 Sep 2010 00:23:07 -0400 |
parents | 5525cf3faaa2 |
children | 31b72defb680 |
.. _requirements:

============
Requirements
============


Application Requirements
========================

Terminology and Abbreviations:
------------------------------

MLA - machine learning algorithm

learning problem - a machine learning application typically characterized by a
dataset (possibly with dataset folds), one or more functions to be learned from
the data, and one or more metrics to evaluate those functions. Learning problems
are the benchmarks for empirical model comparison.

n. of - number of

SGD - stochastic gradient descent

Users:
------

- New master's and PhD students in the lab should be able to quickly move into
  'production' mode without having to reinvent the wheel.

- Students in the two ML classes, who should be able to play with the library
  to explore new ML variants. This means some APIs (e.g. Experiment level) must
  be really well documented and conceptually simple.

- Researchers outside the lab (who might study and experiment with our
  algorithms)

- Partners outside the lab (e.g. Bell, Ubisoft) with closed-source commercial
  projects.

Uses:
-----

R1. reproduce previous work (our own and others')

R2. explore MLA variants by swapping components (e.g. optimization algorithm,
dataset, hyper-parameters)

R3. analyze experimental results (e.g. plotting training curves, finding best
models, marginalizing across hyper-parameter choices)

R4. disseminate (or serve as a platform for disseminating) our own published
algorithms

R5. provide implementations of common MLA components (e.g. classifiers, datasets,
optimization algorithms, meta-learning algorithms)

R6. drive large-scale parallelizable computations (e.g. grid search, bagging,
random search)

R7. provide implementations of standard pre-processing algorithms (e.g. PCA,
stemming, Mel-scale spectrograms, GIST features)

R8. provide high performance suitable for large-scale experiments

R9. be able to use the most efficient algorithms in special case combinations of
learning algorithm components (e.g. when there is a fast k-fold validation
algorithm for a particular model family, the library should not require users
to rewrite their standard k-fold validation script to use it)

R10. support experiments on a variety of datasets (e.g. movies, images, text,
sound, reinforcement learning?)

R11. support efficient computations on datasets larger than RAM and GPU memory

R12. support infinite datasets (i.e. generated on the fly)

R13. apply trained models "in production".
  - e.g. say you try many combinations of preprocessing, models and associated
    hyper-parameters, and want to easily be able to recover the full "processing
    pipeline" that performs best, and use it on real/test data later.

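As a rough illustration of how R2, R6 and R13 could fit together, the sketch
below enumerates a small grid of pipeline configurations, scores each one, and
keeps the winning configuration so the same pipeline can be rebuilt later. It is
plain Python, not a proposed library API: ``PREPROCESSORS``, ``make_pipeline``,
``accuracy``, ``grid_search`` and the toy threshold classifier are all
hypothetical names invented for this example.

```python
import itertools

# Hypothetical, swappable preprocessing components (R2).
PREPROCESSORS = {
    "identity": lambda x: x,
    "square": lambda x: x * x,
}


def make_pipeline(config):
    """Rebuild the full processing pipeline from a plain-dict config (R13)."""
    preprocess = PREPROCESSORS[config["preprocessor"]]
    threshold = config["threshold"]

    def predict(x):
        # Toy model: class 1 when the preprocessed input exceeds the threshold.
        return int(preprocess(x) > threshold)

    return predict


def accuracy(predict, examples):
    """Fraction of (input, label) pairs the pipeline classifies correctly."""
    return sum(predict(x) == y for x, y in examples) / len(examples)


def grid_search(grid, examples):
    """Score every combination in the grid; each job is independent (R6)."""
    keys = sorted(grid)
    results = []
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        results.append((accuracy(make_pipeline(config), examples), config))
    # Keep the best-scoring config; ties go to the earliest combination.
    best_score, best_config = max(results, key=lambda r: r[0])
    return best_score, best_config, results


# Labels are 1 when |x| > 1.5, so only the "square" preprocessor separates them.
validation = [(-3, 1), (-2, 1), (-1, 0), (0, 0), (1, 0), (2, 1), (3, 1)]
grid = {"preprocessor": ["identity", "square"], "threshold": [0.5, 2.5]}

best_score, best_config, results = grid_search(grid, validation)
production_model = make_pipeline(best_config)  # same pipeline, reused later
```

Because each configuration is a plain dict, the winning entry can be stored
with the results and passed back to ``make_pipeline`` whenever the model is
needed on real data.
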
OD comments: Note that R9 and R13 may conflict with each other. Some
optimizations performed by R9 may modify the input "symbolic graph" in such a
way that extracting the required components for "production purpose" (R13)
could be made more difficult (or even impossible). Imagine for instance that
the graph is modified to take advantage of the fact that k-fold validation can
be performed efficiently internally by some specific algorithm. Then it may
not be obvious anymore how to remove the k-fold split in the saved model you
want to use in production.


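One possible reading of R9 is sketched below under invented names: the generic
k-fold routine asks the model whether it offers a specialized implementation
and falls back to the plain loop otherwise, so the user's script stays the same.
``split_folds``, ``MeanModel``, ``FastMeanModel``, ``k_fold_errors`` and the
``fast_k_fold_errors`` hook are hypothetical and only serve the illustration.

```python
def split_folds(values, k):
    """Partition values into k interleaved folds (fold i gets items i, i+k, ...)."""
    return [values[i::k] for i in range(k)]


class MeanModel:
    """Toy model: predicts the mean of its training targets."""

    def fit(self, targets):
        self.mean = sum(targets) / len(targets)
        return self

    def squared_error(self, targets):
        return sum((t - self.mean) ** 2 for t in targets)


class FastMeanModel(MeanModel):
    """Same model, plus a shortcut that reuses one global sum for every fold."""

    def fast_k_fold_errors(self, folds):
        # Assumes at least two folds, so the training set is never empty.
        total = sum(sum(f) for f in folds)
        count = sum(len(f) for f in folds)
        errors = []
        for held_out in folds:
            # Training mean = (global sum - held-out sum) / remaining count.
            train_mean = (total - sum(held_out)) / (count - len(held_out))
            errors.append(sum((t - train_mean) ** 2 for t in held_out))
        return errors


def k_fold_errors(model_class, values, k):
    """User-facing entry point: same call whichever model family is passed in."""
    folds = split_folds(values, k)
    model = model_class()
    if hasattr(model, "fast_k_fold_errors"):
        return model.fast_k_fold_errors(folds)  # specialized path (R9)
    errors = []
    for i, held_out in enumerate(folds):
        train = [t for j, f in enumerate(folds) if j != i for t in f]
        errors.append(model_class().fit(train).squared_error(held_out))
    return errors


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
generic = k_fold_errors(MeanModel, data, k=3)
fast = k_fold_errors(FastMeanModel, data, k=3)
```

The fast path never builds a per-fold model, which is the kind of shortcut the
comment above worries about: the model to use in production (R13) still has to
come from a separate ``fit`` on the full data.
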
Requirements for component architecture
=======================================


R14. Serializability of experiments. (essentially in pursuit of R6)

Jobs that are running a learning algorithm with our components (datasets,
models, algorithms) must be able to serialize the experiment's state to a string
(typically written to disk) and be able to restart it from such a string. There
must be a mechanism to tell a job to serialize the experiment as soon as
possible, and a latency of up to 10 seconds should be acceptable. It must also
be possible to deserialize the experiment for introspection (inspect the state
of individual components), not just for continuing the experiment. The
experiment can assume that resources on disk that were present when the
experiment started will be present when the experiment resumes. The experiment
cannot assume that resources written by the experiment will still be there (e.g.
in /tmp or cwd). Implementations should make an effort to keep the serialized
representation compact wherever state can be recomputed or reloaded from disk
at deserialization time.

This requirement is aimed at enabling process migration and job control as well
as post-hoc analysis of experiment results.

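A minimal sketch of what R14 asks for, assuming a toy experiment whose dataset
can be regenerated from a seed (standing in for a file that was on disk when
the experiment started). The ``Experiment`` class, its ``save_requested`` flag,
``to_string``, ``from_string`` and ``run`` are hypothetical names, and JSON is
only one possible string format.

```python
import json
import random


class Experiment:
    """Toy SGD-style experiment whose state round-trips through a string."""

    def __init__(self, seed, n_examples):
        self.seed = seed
        self.n_examples = n_examples
        self.step_count = 0
        self.weight = 0.0
        self.save_requested = False  # job control sets this to ask for a checkpoint
        self._data = self._load_data()

    def _load_data(self):
        # Stand-in for a dataset that was on disk when the experiment started;
        # it is regenerated from the seed rather than stored in the checkpoint.
        rng = random.Random(self.seed)
        return [rng.uniform(-1.0, 1.0) for _ in range(self.n_examples)]

    def step(self):
        # One small update moving the weight toward the current example.
        x = self._data[self.step_count % self.n_examples]
        self.weight += 0.1 * (x - self.weight)
        self.step_count += 1

    def to_string(self):
        # Compact representation: only what cannot be recomputed or reloaded.
        return json.dumps({
            "seed": self.seed,
            "n_examples": self.n_examples,
            "step_count": self.step_count,
            "weight": self.weight,
        })

    @classmethod
    def from_string(cls, text):
        state = json.loads(text)
        experiment = cls(state["seed"], state["n_examples"])  # reloads the data
        experiment.step_count = state["step_count"]
        experiment.weight = state["weight"]
        return experiment


def run(experiment, n_steps):
    """Run to completion, or return a checkpoint string once one is requested."""
    while experiment.step_count < n_steps:
        if experiment.save_requested:
            return experiment.to_string()
        experiment.step()
    return None


# Uninterrupted reference run.
reference = Experiment(seed=0, n_examples=8)
run(reference, n_steps=20)

# Same experiment, interrupted after 5 steps and resumed from the string.
interrupted = Experiment(seed=0, n_examples=8)
run(interrupted, n_steps=5)
interrupted.save_requested = True
checkpoint = run(interrupted, n_steps=20)

inspected = json.loads(checkpoint)            # introspection without resuming
resumed = Experiment.from_string(checkpoint)  # continuation
run(resumed, n_steps=20)
```

The flag is checked between steps, so the delay before a checkpoint is bounded
by the duration of one step. In a real job the flag might be set from a signal
handler, for instance, but that part is left out here.
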
OD asks: When you say "The experiment cannot assume that resources written by
the experiment will still be there", do you mean we should be able to recover
the exact same output after interrupting an experiment, wiping its expdir, and
restarting it? This would mean that any output saved on disk by the experiment
also has to be serialized within the experiment, which may lead to very big
serialization files (and possibly memory issues?)
A less constraining interpretation of your statement (which I like better) is
that we allow "previous" output to be lost: we only ask that the experiment
should be able to produce the "new" outputs after a wipe+restart.