annotate doc/v2_planning/requirements.txt @ 1187:7d34edde029d

added serializability requiremnt
author James Bergstra <bergstrj@iro.umontreal.ca>
date Fri, 17 Sep 2010 17:53:35 -0400
parents 1f5465622394
children ab80ba052d32
rev   line source
1093
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
1 ============
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
2 Requirements
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
3 ============
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
4
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
5
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
6 Application Requirements
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
7 ========================
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
8
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
9 Terminology and Abbreviations:
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
10 ------------------------------
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
11
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
12 MLA - machine learning algorithm
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
13
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
14 learning problem - a machine learning application typically characterized by a
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
15 dataset (possibly dataset folds), one or more functions to be learned from the
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
16 data, and one or more metrics to evaluate those functions. Learning problems
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
17 are the benchmarks for empirical model comparison.
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
18
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
19 n. of - number of
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
20
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
21 SGD - stochastic gradient descent
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
22
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
23 Users:
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
24 ------
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
25
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
26 - New masters and PhD students in the lab should be able to quickly move into
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
27 'production' mode without having to reinvent the wheel.
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
28
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
29 - Students in the two ML classes should be able to play with the library to explore new
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
30 ML variants. This means some APIs (e.g. Experiment level) must be really well
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
31 documented and conceptually simple.
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
32
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
33 - Researchers outside the lab (who might study and experiment with our
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
34 algorithms)
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
35
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
36 - Partners outside the lab (e.g. Bell, Ubisoft) with closed-source commercial
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
37 projects.
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
38
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
39 Uses:
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
40 -----
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
41
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
42 R1. reproduce previous work (our own and others')
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
43
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
44 R2. explore MLA variants by swapping components (e.g. optimization algo, dataset,
1096
2bbc294fa5ac requirements: Added a use case
Olivier Delalleau <delallea@iro>
parents: 1093
diff changeset
45 hyper-parameters)
1093
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
46
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
47 R3. analyze experimental results (e.g. plotting training curves, finding best
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
48 models, marginalizing across hyper-parameter choices)
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
49
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
50 R4. disseminate (or serve as platform for disseminating) our own published algorithms
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
51
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
52 R5. provide implementations of common MLA components (e.g. classifiers, datasets,
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
53 optimization algorithms, meta-learning algorithms)
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
54
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
55 R6. drive large-scale parallelizable computations (e.g. grid search, bagging,
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
56 random search)
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
57
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
58 R7. provide implementations of standard pre-processing algorithms (e.g. PCA,
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
59 stemming, Mel-scale spectrograms, GIST features, etc.)
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
60
1096
2bbc294fa5ac requirements: Added a use case
Olivier Delalleau <delallea@iro>
parents: 1093
diff changeset
61 R8. provide high performance suitable for large-scale experiments
1093
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
62
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
63 R9. be able to use the most efficient algorithms in special-case combinations of
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
64 learning algorithm components (e.g. when there is a fast k-fold validation
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
65 algorithm for a particular model family, the library should not require users
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
66 to rewrite their standard k-fold validation script to use it)
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
67
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
68 R10. support experiments on a variety of datasets (e.g. movies, images, text,
1096
2bbc294fa5ac requirements: Added a use case
Olivier Delalleau <delallea@iro>
parents: 1093
diff changeset
69 sound, reinforcement learning?)
1093
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
70
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
71 R11. support efficient computations on datasets larger than RAM and GPU memory
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
72
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
73 R12. support infinite datasets (i.e. generated on the fly)
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
74
1098
4eda3f52ebef v2planning - revs to requirements, added architecture
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1096
diff changeset
75 R13. apply trained models "in production".
4eda3f52ebef v2planning - revs to requirements, added architecture
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1096
diff changeset
76 - e.g. say you try many combinations of preprocessing, models and associated
4eda3f52ebef v2planning - revs to requirements, added architecture
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1096
diff changeset
77 hyper-parameters, and want to easily be able to recover the full "processing
4eda3f52ebef v2planning - revs to requirements, added architecture
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1096
diff changeset
78 pipeline" that performs best, and use it on real/test data later.
1093
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
79
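As an illustration of R12, an infinite dataset can be modelled as a generator
that a learner consumes a bounded number of examples from. This is only a
minimal sketch: the names infinite_dataset and sgd_fit, and the toy regression
task, are assumptions made for the example and are not part of any planned API.

```python
import itertools
import random


def infinite_dataset(seed=0):
    """Yield (x, y) examples forever (hypothetical task: y = 2*x + 1 plus noise)."""
    rng = random.Random(seed)
    while True:
        x = rng.uniform(-1.0, 1.0)
        yield x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.01)


def sgd_fit(examples, n_steps, lr=0.1):
    """Fit y ~ w*x + b by SGD on the first n_steps examples of a stream."""
    w, b = 0.0, 0.0
    # islice bounds the work, so the stream itself never has to end.
    for x, y in itertools.islice(examples, n_steps):
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return w, b


w, b = sgd_fit(infinite_dataset(seed=0), n_steps=5000)
```

Because the learner only asks for the next example, the same loop would also
work for a dataset streamed from disk, which is one way R11 and R12 could share
an interface.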
1121
1f5465622394 requirements: Added comment about potentially conflicting requirements
Olivier Delalleau <delallea@iro>
parents: 1098
diff changeset
80 OD comments: Note that R9 and R13 may conflict with each other. Some
1f5465622394 requirements: Added comment about potentially conflicting requirements
Olivier Delalleau <delallea@iro>
parents: 1098
diff changeset
81 optimizations performed by R9 may modify the input "symbolic graph" in such a
1f5465622394 requirements: Added comment about potentially conflicting requirements
Olivier Delalleau <delallea@iro>
parents: 1098
diff changeset
82 way that extracting the required components for "production purpose" (R13)
1f5465622394 requirements: Added comment about potentially conflicting requirements
Olivier Delalleau <delallea@iro>
parents: 1098
diff changeset
83 could be made more difficult (or even impossible). Imagine for instance that
1f5465622394 requirements: Added comment about potentially conflicting requirements
Olivier Delalleau <delallea@iro>
parents: 1098
diff changeset
84 the graph is modified to take advantage of the fact that k-fold validation can
1f5465622394 requirements: Added comment about potentially conflicting requirements
Olivier Delalleau <delallea@iro>
parents: 1098
diff changeset
85 be performed efficiently internally by some specific algorithm. Then it may
1f5465622394 requirements: Added comment about potentially conflicting requirements
Olivier Delalleau <delallea@iro>
parents: 1098
diff changeset
86 not be obvious anymore how to remove the k-fold split in the saved model you
1f5465622394 requirements: Added comment about potentially conflicting requirements
Olivier Delalleau <delallea@iro>
parents: 1098
diff changeset
87 want to use in production.
1f5465622394 requirements: Added comment about potentially conflicting requirements
Olivier Delalleau <delallea@iro>
parents: 1098
diff changeset
88
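One way to picture R9 is a single k-fold entry point that uses a model-specific
fast path when the model offers one, so the user's script does not change. The
sketch below is hypothetical: fast_kfold, MeanModel and FastMeanModel are
illustrative names, and a plain method lookup stands in for whatever
graph-level optimization the library would really use.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size (assumes k <= n)."""
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(k)]


def generic_kfold(model_factory, xs, ys, k):
    """Standard k-fold: retrain from scratch on each training split."""
    scores = []
    for fold in kfold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = model_factory()
        model.fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(model.score([xs[i] for i in fold], [ys[i] for i in fold]))
    return scores


def kfold(model_factory, xs, ys, k):
    """Entry point used by scripts: prefer a model-specific fast path if present."""
    fast = getattr(model_factory(), "fast_kfold", None)
    if callable(fast):
        return fast(xs, ys, k)
    return generic_kfold(model_factory, xs, ys, k)


class MeanModel:
    """Predicts the training mean; score is negative mean squared error."""

    def fit(self, xs, ys):
        self.mean = sum(ys) / len(ys)

    def score(self, xs, ys):
        return -sum((y - self.mean) ** 2 for y in ys) / len(ys)


class FastMeanModel(MeanModel):
    def fast_kfold(self, xs, ys, k):
        """Derive each training-split mean from one global sum instead of refitting."""
        total, n = sum(ys), len(ys)
        scores = []
        for fold in kfold_indices(n, k):
            fold_ys = [ys[i] for i in fold]
            mean = (total - sum(fold_ys)) / (n - len(fold_ys))
            scores.append(-sum((y - mean) ** 2 for y in fold_ys) / len(fold_ys))
        return scores


xs = list(range(10))
ys = [float(i % 3) for i in range(10)]
slow_scores = kfold(MeanModel, xs, ys, 5)
fast_scores = kfold(FastMeanModel, xs, ys, 5)
```

The fast path returns only scores and never builds a per-fold model object,
which is a small instance of the conflict with R13 described above: the
artifact one would want to reuse in production is no longer produced.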
1187
7d34edde029d added serializability requiremnt
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1121
diff changeset
89
7d34edde029d added serializability requiremnt
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1121
diff changeset
90 Requirements for component architecture
7d34edde029d added serializability requiremnt
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1121
diff changeset
91 =======================================
7d34edde029d added serializability requiremnt
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1121
diff changeset
92
7d34edde029d added serializability requiremnt
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1121
diff changeset
93
7d34edde029d added serializability requiremnt
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1121
diff changeset
94 R14. Serializability of experiments. (essentially in pursuit of R6)
7d34edde029d added serializability requiremnt
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1121
diff changeset
95
7d34edde029d added serializability requiremnt
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1121
diff changeset
96 Jobs that are running a learning algorithm with our components (datasets,
7d34edde029d added serializability requiremnt
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1121
diff changeset
97 models, algorithms) must be able to serialize the experiment's state to a string
7d34edde029d added serializability requiremnt
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1121
diff changeset
98 (typically written to disk) and be able to restart it from such a string. There
7d34edde029d added serializability requiremnt
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1121
diff changeset
99 must be a mechanism to tell a job to serialize the experiment as soon as
7d34edde029d added serializability requiremnt
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1121
diff changeset
100 possible, and a latency of up to 10 seconds should be acceptable. It must also
7d34edde029d added serializability requiremnt
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1121
diff changeset
101 be possible to deserialize the experiment for introspection (inspect the state
7d34edde029d added serializability requiremnt
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1121
diff changeset
102 of individual components), not just for continuing the experiment. The
7d34edde029d added serializability requiremnt
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1121
diff changeset
103 experiment can assume that resources on disk that were present when the
7d34edde029d added serializability requiremnt
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1121
diff changeset
104 experiment started will be present when the experiment resumes. The experiment
7d34edde029d added serializability requiremnt
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1121
diff changeset
105 cannot assume that resources written by the experiment will still be there (e.g.
7d34edde029d added serializability requiremnt
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1121
diff changeset
106 in /tmp or cwd). Implementations should make an effort to make the serialized
7d34edde029d added serializability requiremnt
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1121
diff changeset
107 representation compact, omitting whatever can be recomputed or reloaded from disk
7d34edde029d added serializability requiremnt
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1121
diff changeset
108 at deserialization time.
7d34edde029d added serializability requiremnt
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1121
diff changeset
109
7d34edde029d added serializability requiremnt
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1121
diff changeset
110 This requirement is aimed at enabling process migration and job control as well
7d34edde029d added serializability requiremnt
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1121
diff changeset
111 as post-hoc analysis of experiment results.
7d34edde029d added serializability requiremnt
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 1121
diff changeset
112
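The serialize-and-resume cycle in R14 can be sketched as follows. This is a toy
under stated assumptions, not a proposed design: the Experiment class, its
fields and the JSON encoding are all hypothetical. The checkpoint flag is
polled once per update, so the latency of a save request is bounded by the
duration of one update, which would have to stay under the 10 seconds allowed
above. In a real job the flag could be set by a signal handler.

```python
import json


class Experiment:
    """Toy experiment whose whole state is a step counter and one parameter."""

    def __init__(self, n_steps, lr=0.5, step=0, w=0.0):
        self.n_steps = n_steps
        self.lr = lr
        self.step = step
        self.w = w
        self.checkpoint_requested = False

    def request_checkpoint(self):
        """Ask the running loop to serialize at the next opportunity."""
        self.checkpoint_requested = True

    def dumps(self):
        """Serialize the state to a string; only what cannot be recomputed."""
        return json.dumps({"n_steps": self.n_steps, "lr": self.lr,
                           "step": self.step, "w": self.w})

    @classmethod
    def loads(cls, text):
        """Rebuild an experiment, for resuming or just for introspection."""
        return cls(**json.loads(text))

    def run(self, save, max_steps=None):
        """Advance until done (or max_steps updates); save when requested."""
        done = 0
        while self.step < self.n_steps and (max_steps is None or done < max_steps):
            self.w += self.lr * (1.0 - self.w)  # one short unit of work
            self.step += 1
            done += 1
            if self.checkpoint_requested:
                self.checkpoint_requested = False
                save(self.dumps())


# Uninterrupted reference run.
reference = Experiment(n_steps=10)
reference.run(save=lambda text: None)

# Interrupted run: checkpoint after step 5, then resume from the string.
saved = []
exp = Experiment(n_steps=10)
exp.run(save=saved.append, max_steps=4)
exp.request_checkpoint()
exp.run(save=saved.append, max_steps=1)
resumed = Experiment.loads(saved[0])
step_at_resume = resumed.step
resumed.run(save=saved.append)
```

The same string can be loaded without calling run at all, which covers the
introspection case; in this sketch the save callback decides where the string
goes, so the experiment does not rely on files it wrote itself.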