annotate doc/v2_planning/requirements.txt @ 1096:2bbc294fa5ac

requirements: Added a use case
author Olivier Delalleau <delallea@iro>
date Mon, 13 Sep 2010 09:38:26 -0400
parents a65598681620
children 4eda3f52ebef
============
Requirements
============


Application Requirements
========================

Terminology and Abbreviations:
------------------------------

MLA - machine learning algorithm

learning problem - a machine learning application typically characterized by a
dataset (possibly split into folds), one or more functions to be learned from
the data, and one or more metrics to evaluate those functions. Learning
problems are the benchmarks for empirical model comparison.

n. of - number of

SGD - stochastic gradient descent
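
The "learning problem" entry above can be made concrete with a small container
type. This is a minimal sketch, not part of the library: `LearningProblem`,
`mean_squared_error` and `constant_mean_learner` are hypothetical names,
datasets are assumed to be lists of `(input, target)` pairs, and each fold is
assumed to be a `(train, test)` split of the same data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Assumption: a dataset is a list of (input, target) pairs.
Dataset = List[Tuple[float, float]]
Function = Callable[[float], float]
# A metric scores a learned function on a test set.
Metric = Callable[[Function, Dataset], float]


@dataclass
class LearningProblem:
    """Hypothetical container: dataset folds plus evaluation metrics."""
    folds: List[Tuple[Dataset, Dataset]]  # (train, test) splits
    metrics: Dict[str, Metric] = field(default_factory=dict)

    def evaluate(self, learner: Callable[[Dataset], Function]) -> Dict[str, List[float]]:
        """Learn a function on each training split, score it on the test split."""
        scores: Dict[str, List[float]] = {name: [] for name in self.metrics}
        for train, test in self.folds:
            f = learner(train)
            for name, metric in self.metrics.items():
                scores[name].append(metric(f, test))
        return scores


def mean_squared_error(f: Function, test: Dataset) -> float:
    return sum((f(x) - y) ** 2 for x, y in test) / len(test)


def constant_mean_learner(train: Dataset) -> Function:
    # Trivial baseline: always predict the mean training target.
    mean = sum(y for _, y in train) / len(train)
    return lambda x: mean


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
problem = LearningProblem(
    folds=[(data[:2], data[2:]), (data[2:], data[:2])],  # two-fold split
    metrics={"mse": mean_squared_error},
)
print(problem.evaluate(constant_mean_learner))  # {'mse': [17.0, 17.0]}
```

Keeping folds and metrics together means two MLAs compared on the same
`LearningProblem` are, by construction, scored on the same splits.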

Users:
------

- New Master's and PhD students in the lab should be able to move quickly into
  'production' mode without having to reinvent the wheel.

- Students in the two ML classes should be able to play with the library to
  explore new ML variants. This means some APIs (e.g. the Experiment level)
  must be very well documented and conceptually simple.

- Researchers outside the lab, who might study and experiment with our
  algorithms.

- Partners outside the lab (e.g. Bell, Ubisoft) with closed-source commercial
  projects.

Uses:
-----

R1. reproduce previous work (our own and others')

R2. explore MLA variants by swapping components (e.g. optimization algorithm,
    dataset, hyper-parameters)

R3. analyze experimental results (e.g. plotting training curves, finding the
    best models, marginalizing across hyper-parameter choices)

R4. disseminate (or serve as a platform for disseminating) our own published
    algorithms

R5. provide implementations of common MLA components (e.g. classifiers,
    datasets, optimization algorithms, meta-learning algorithms)

R6. drive large-scale parallelizable computations (e.g. grid search, bagging,
    random search)

R7. provide implementations of standard pre-processing algorithms (e.g. PCA,
    stemming, Mel-scale spectrograms, GIST features, etc.)

R8. provide high performance suitable for large-scale experiments

R9. be able to use the most efficient algorithms for special-case combinations
    of learning algorithm components (e.g. when there is a fast k-fold
    validation algorithm for a particular model family, the library should not
    require users to rewrite their standard k-fold validation script to use it)

R10. support experiments on a variety of datasets (e.g. movies, images, text,
     sound, reinforcement learning?)

R11. support efficient computations on datasets larger than RAM and GPU memory

R12. support infinite datasets (i.e. generated on the fly)

R13. from a given experimental evaluation setup, be able to save a model that
     can be used "in production" (e.g. say you try many combinations of
     preprocessing, models and associated hyper-parameters, and want to easily
     recover the full "processing pipeline" that performs best, to be used on
     future "real" test data)
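
One way to read R13 is that the winning configuration (fitted preprocessing
plus model) is saved as a single unit that can be rebuilt later. The sketch
below assumes a JSON description of each component rather than pickling
objects; `Standardize`, `LinearModel`, `Pipeline`, `REGISTRY` and `build` are
hypothetical illustrations, not library components.

```python
import json
import os
import tempfile
from typing import Dict, List, Sequence


class Standardize:
    """Preprocessing step: subtract a mean, divide by a standard deviation."""

    def __init__(self, mean: float = 0.0, std: float = 1.0):
        self.mean, self.std = mean, std

    @classmethod
    def fit(cls, xs: Sequence[float]) -> "Standardize":
        mean = sum(xs) / len(xs)
        std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
        return cls(mean, std or 1.0)  # guard against constant inputs

    def transform(self, xs: Sequence[float]) -> List[float]:
        return [(x - self.mean) / self.std for x in xs]

    def to_dict(self) -> Dict:
        return {"type": "Standardize", "mean": self.mean, "std": self.std}


class LinearModel:
    """Model step whose parameters were chosen by an earlier search."""

    def __init__(self, slope: float, intercept: float):
        self.slope, self.intercept = slope, intercept

    def predict(self, xs: Sequence[float]) -> List[float]:
        return [self.slope * x + self.intercept for x in xs]

    def to_dict(self) -> Dict:
        return {"type": "LinearModel", "slope": self.slope,
                "intercept": self.intercept}


# Maps the saved "type" field back to a component class.
REGISTRY = {"Standardize": Standardize, "LinearModel": LinearModel}


def build(spec: Dict):
    params = {k: v for k, v in spec.items() if k != "type"}
    return REGISTRY[spec["type"]](**params)


class Pipeline:
    """Fitted preprocessing steps followed by a model, saved as one unit."""

    def __init__(self, steps: List, model):
        self.steps, self.model = steps, model

    def predict(self, xs: Sequence[float]) -> List[float]:
        for step in self.steps:
            xs = step.transform(xs)
        return self.model.predict(xs)

    def save(self, path: str) -> None:
        spec = {"steps": [s.to_dict() for s in self.steps],
                "model": self.model.to_dict()}
        with open(path, "w") as f:
            json.dump(spec, f)

    @staticmethod
    def load(path: str) -> "Pipeline":
        with open(path) as f:
            spec = json.load(f)
        return Pipeline([build(s) for s in spec["steps"]], build(spec["model"]))


# Usage: save the best pipeline found by a search, then reload it for new data.
best = Pipeline([Standardize.fit([1.0, 2.0, 3.0])], LinearModel(2.0, 0.5))
path = os.path.join(tempfile.mkdtemp(), "best_pipeline.json")
best.save(path)
restored = Pipeline.load(path)
print(restored.predict([2.0]))  # [0.5]
```

Saving a plain description instead of a pickled object keeps the file readable
and independent of where the classes live; whether the library should work
this way is an open design choice.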

Basic Design Approach
=====================

An ability to drive parallel computations is essential in addressing [R6,R8].
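
As an illustration of the independent, parallelizable trials meant by R6, the
sketch below enumerates grid-search and random-search configurations and
evaluates them concurrently. It assumes Python's standard `concurrent.futures`
and uses threads only to stay self-contained (a real driver would more likely
dispatch to processes or cluster jobs); `train_and_score` is a hypothetical
stand-in for a training run.

```python
import itertools
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def grid_search_configs(space: Dict[str, List]) -> List[Dict]:
    """Every combination of the listed hyper-parameter values."""
    names = sorted(space)
    return [dict(zip(names, values))
            for values in itertools.product(*(space[n] for n in names))]


def random_search_configs(space: Dict[str, List], n_trials: int,
                          seed: int = 0) -> List[Dict]:
    """n_trials configurations, each value drawn uniformly from its list."""
    rng = random.Random(seed)
    names = sorted(space)
    return [{n: rng.choice(space[n]) for n in names} for _ in range(n_trials)]


def train_and_score(config: Dict) -> float:
    # Hypothetical stand-in for training a model; lower is better.
    return (config["lr"] - 0.01) ** 2 + config["n_hidden"] / 1000.0


def run_all(configs: List[Dict], max_workers: int = 4) -> List[float]:
    # Trials are independent, so any worker may run any of them;
    # map() still returns scores in the same order as `configs`.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(train_and_score, configs))


space = {"lr": [0.1, 0.01, 0.001], "n_hidden": [50, 100]}
grid = grid_search_configs(space)
scores = run_all(grid)
best_score, best_config = min(zip(scores, grid), key=lambda p: p[0])
print(best_config, best_score)  # {'lr': 0.01, 'n_hidden': 50} 0.05
```

Because each trial only needs its own configuration, the same list of configs
could be handed to any executor without changing the search code.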

The basic design approach for the library is to implement
- a few virtual machines (VMs), some of which can run programs that can be
  parallelized across processors, hosts, and networks.
- MLAs in a Symbolic Expression language (similar to Theano) as required by
  [R5,R7,R8].

MLAs are typically specified by Symbolic programs that are compiled to the
instructions of these VMs, but some MLAs may be implemented in these
instructions directly. Symbolic programs are naturally modularized by
sub-expressions [R2] and can be optimized automatically (as in Theano) to
address [R9].
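
The compile-then-run idea above can be illustrated with a toy expression
language. This is a minimal sketch under stated assumptions, not Theano's or
the library's API: expressions are immutable trees of hypothetical `Var` and
`Apply` nodes; `optimize` applies one rewrite rule that swaps a generic
pattern, `dot(x, x)`, for a specialized instruction, `sumsq(x)` (a hypothetical
fused op, in the spirit of R9); and `compile_expr` linearizes the graph into
instructions for a small sequential VM, emitting each shared sub-expression
only once.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Apply:
    op: str
    args: Tuple["Expr", ...]


Expr = Union[Var, Apply]


def optimize(e: Expr) -> Expr:
    """Bottom-up rewrite: replace a generic pattern by a specialized op."""
    if isinstance(e, Var):
        return e
    args = tuple(optimize(a) for a in e.args)
    if e.op == "dot" and args[0] == args[1]:
        return Apply("sumsq", (args[0],))  # dot(x, x) -> sumsq(x)
    return Apply(e.op, args)


def compile_expr(e: Expr) -> Tuple[List[tuple], int]:
    """Linearize the expression into (op, *input_slots) instructions.
    Equal sub-expressions hash equal, so each one is emitted only once."""
    program: List[tuple] = []
    slot_of: Dict[Expr, int] = {}

    def emit(node: Expr) -> int:
        if node in slot_of:
            return slot_of[node]
        if isinstance(node, Var):
            instr = ("load", node.name)
        else:
            instr = (node.op,) + tuple(emit(a) for a in node.args)
        program.append(instr)
        slot_of[node] = len(program) - 1
        return slot_of[node]

    return program, emit(e)


OPS = {
    "add": lambda a, b: [x + y for x, y in zip(a, b)],
    "dot": lambda a, b: sum(x * y for x, y in zip(a, b)),
    "sumsq": lambda a: sum(x * x for x in a),
    "sqrt": math.sqrt,
}


def run(program: List[tuple], out: int, inputs: Dict[str, list]):
    """Tiny sequential VM: each instruction writes its result to a new slot."""
    slots = []
    for op, *args in program:
        if op == "load":
            slots.append(inputs[args[0]])
        else:
            slots.append(OPS[op](*(slots[i] for i in args)))
    return slots[out]


x, y = Var("x"), Var("y")
s = Apply("add", (x, y))
norm = Apply("sqrt", (Apply("dot", (s, s)),))  # Euclidean norm of x + y
program, out = compile_expr(optimize(norm))
print(program)  # [('load', 'x'), ('load', 'y'), ('add', 0, 1), ('sumsq', 2), ('sqrt', 3)]
print(run(program, out, {"x": [1.0, 2.0], "y": [2.0, 2.0]}))  # 5.0
```

The user writes `dot(s, s)` once; the optimizer, not the user, decides to use
the specialized instruction, which is the behaviour R9 asks for.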

A VM that caches instruction return values serves as
- a reliable record of what jobs were run [R1]
- a database of intermediate results that can be analyzed after the
  model-training jobs have completed [R3]
- a clean API to several possible storage and execution backends.
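
A minimal sketch of such a caching VM, assuming that instruction arguments are
JSON-serializable so they can form a cache key; `CachingVM`, `call` and the
`train` stand-in are hypothetical names. The store is any dict-like object,
which is one possible reading of "a clean API to several possible storage ...
backends"; the log records which jobs ran [R1], and the stored values can be
queried afterwards [R3].

```python
import json
from typing import Any, Callable, Dict, List, MutableMapping, Optional, Tuple


class CachingVM:
    """Toy VM that memoizes instruction results in a pluggable store."""

    def __init__(self, ops: Dict[str, Callable[..., Any]],
                 store: Optional[MutableMapping[str, Any]] = None):
        self.ops = ops
        # Any dict-like object with string keys can act as the storage backend,
        # e.g. a plain dict or a shelve.Shelf for on-disk persistence.
        self.store = {} if store is None else store
        # Record of every instruction issued and whether it ran or hit the cache.
        self.log: List[Tuple[str, str]] = []

    def call(self, op: str, *args: Any) -> Any:
        # Assumes JSON-serializable arguments; the key names the instruction.
        key = json.dumps([op, list(args)])
        if key in self.store:
            self.log.append((key, "cached"))
        else:
            self.store[key] = self.ops[op](*args)
            self.log.append((key, "computed"))
        return self.store[key]


calls = []


def train(lr: float, n_epochs: int) -> Dict[str, float]:
    # Hypothetical stand-in for an expensive model-training job.
    calls.append((lr, n_epochs))
    return {"lr": lr, "valid_error": round(abs(lr - 0.01) + 1.0 / n_epochs, 4)}


vm = CachingVM({"train": train})
for lr in [0.1, 0.01, 0.1]:  # the third call repeats the first one
    vm.call("train", lr, 10)

# After the jobs finish, analysis [R3] reads stored results instead of retraining.
best = min(vm.store.values(), key=lambda r: r["valid_error"])
print(len(calls), [status for _, status in vm.log])  # 2 ['computed', 'computed', 'cached']
print(best)  # {'lr': 0.01, 'valid_error': 0.1}
```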