annotate doc/v2_planning/use_cases.txt @ 1419:cff305ad9f60
TensorFnDataset - added x_ attribute that caches the dataset function return
value, but does not get pickled.
author | James Bergstra <bergstrj@iro.umontreal.ca> |
---|---|
date | Fri, 04 Feb 2011 16:05:22 -0500 |
parents | 0e12ea6ba661 |
children |
Use Cases (Functional Requirements)
===================================

These use cases exhibit pseudo-code for some of the sorts of tasks listed in the
requirements (requirements.txt).


Evaluate a classifier on MNIST
-------------------------------

The evaluation of a classifier on MNIST requires iterating over examples in some
set (e.g. validation, test) and comparing the model's prediction with the
correct answer. The score of the classifier is the number of correct
predictions divided by the total number of predictions.
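
As a plain-Python illustration of that score (not the symbolic version this
document plans), the sketch below assumes that examples arrive as
`(input, label)` pairs and that the classifier is an ordinary callable. The
names `toy_classifier` and `toy_examples` are made up for the illustration.

```python
def classification_accuracy(function, examples):
    """Return the fraction of (input, label) pairs that `function` labels correctly."""
    n_correct = 0
    n_total = 0
    for x, label in examples:
        n_total += 1
        if function(x) == label:
            n_correct += 1
    if n_total == 0:
        raise ValueError("cannot score a classifier on an empty example set")
    return n_correct / n_total


def toy_classifier(x):
    """Hypothetical stand-in for a real model: threshold a scalar input."""
    return int(x > 0.5)


# Four hypothetical labelled examples; the third one is misclassified.
toy_examples = [(0.1, 0), (0.9, 1), (0.7, 0), (0.2, 0)]
accuracy = classification_accuracy(toy_classifier, toy_examples)
```

Because the function only iterates over `examples`, any iterable of pairs would
do here, which is one possible shape for the iterator protocol.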

To perform this calculation, the user should specify:

- the classifier (e.g. a function operating on weights loaded from disk)
- the dataset (e.g. MNIST)
- the subset of examples on which to evaluate (e.g. test set)

For example:

.. code-block:: python

    vm.call(classification_accuracy(
        function=classifier,
        examples=MNIST.validation_iterator))


The user types little beyond the description of the fields necessary for the
computation; there is no boilerplate. The `MNIST.validation_iterator` must
respect a protocol that remains to be worked out.

The `vm.call` is a compilation & execution step, as opposed to the
symbolic-graph building performed by the `classification_accuracy` call.
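
The split between graph building and execution can be illustrated with a toy
sketch. Everything in it is an assumption made for illustration: the `Call`
node class, the `call` evaluator, and the helper functions are hypothetical and
are not the planned `vm` API.

```python
class Call:
    """A deferred function call: records what to compute without computing it."""

    def __init__(self, fn, **kwargs):
        self.fn = fn
        self.kwargs = kwargs


def call(node):
    """Execution step: recursively evaluate a graph of Call nodes."""
    if isinstance(node, Call):
        kwargs = {name: call(arg) for name, arg in node.kwargs.items()}
        return node.fn(**kwargs)
    return node  # plain Python values evaluate to themselves


evaluated = []  # records which functions actually ran, and in what order


def load_examples():
    evaluated.append("load_examples")
    return [(0, 0), (1, 1), (2, 1)]


def accuracy(function, examples):
    evaluated.append("accuracy")
    return sum(function(x) == y for x, y in examples) / len(examples)


def parity(x):
    return x % 2


# Graph building: nothing is executed yet.
graph = Call(accuracy, function=parity, examples=Call(load_examples))
ran_during_build = list(evaluated)

# Execution: the whole graph is evaluated in one step.
score = call(graph)
```

In this sketch the constructor only stores its arguments, so building the graph
is cheap and all the work happens inside `call`.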


Train a linear classifier on MNIST
----------------------------------

The training of a linear classifier requires specification of:

- problem dimensions (e.g. number of inputs, number of classes)
- parameter initialization method
- regularization
- dataset
- schedule for obtaining training examples (e.g. batch, online, minibatch,
  weighted examples)
- algorithm for adapting parameters (e.g. SGD, conjugate gradient)
- a stopping criterion (may be in terms of validation examples)
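
To make the "schedule for obtaining training examples" item concrete, here is a
minimal sketch of a minibatch schedule as a Python generator. The signature
(`source`, `batchsize`, `loop`) is an assumption, and the source is assumed to
be a sliceable sequence. Online and batch learning fall out as the two extreme
batch sizes.

```python
from itertools import islice


def minibatches(source, batchsize, loop=False):
    """Yield consecutive slices of `source`; start over from the top if `loop`."""
    if batchsize <= 0:
        raise ValueError("batchsize must be positive")
    while True:
        for start in range(0, len(source), batchsize):
            yield source[start:start + batchsize]
        if not loop or len(source) == 0:
            return


data = list(range(5))  # hypothetical stand-in for a training set

online = list(minibatches(data, batchsize=1))          # one example at a time
batch = list(minibatches(data, batchsize=len(data)))   # the whole set at once
# With loop=True the stream never ends, so take only the first four batches.
mini = list(islice(minibatches(data, batchsize=2, loop=True), 4))
```

Weighted examples are not covered by this sketch; they would need the stream to
carry a weight alongside each example.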

Often the dataset determines the problem dimensions.

Often the training examples and validation examples come from the same set (e.g.
a large matrix of all examples) but this is not necessarily the case.

There are many ways that the training could be configured, but here is one:

.. code-block:: python

    vm.call(
        halflife_stopper(
            # OD: is n_hidden supposed to be n_classes instead?
            initial_model=random_linear_classifier(MNIST.n_inputs, MNIST.n_hidden, r_seed=234432),
            burnin=100,
            score_fn=vm_lambda(('learner_obj',),
                classification_accuracy(
                    examples=MNIST.validation_dataset,
                    function=as_classifier('learner_obj'))),

            step_fn=vm_lambda(('learner_obj',),
                sgd_step_fn(
                    parameters=vm_getattr('learner_obj', 'params'),
                    cost_and_updates=classif_nll('learner_obj',
                        example_stream=minibatches(
                            source=MNIST.training_dataset,
                            batchsize=100,
                            loop=True)),
                    momentum=0.9,
                    anneal_at_iter=50,
                    n_iter=100))))  # step_fn goes through lots of examples (e.g. an epoch)

Although I expect this specific code might have to change quite a bit in a final
version, I want to draw attention to a few aspects of it:

- we build a symbolic expression graph that contains the whole program, not just
  the learning algorithm

- the configuration language allows for callable objects (e.g. functions,
  curried functions) to be arguments

- there is a lambda function-constructor (vm_lambda) we can use in this language

- APIs and protocols are at work in establishing conventions for
  parameter-passing so that sub-expressions (e.g. datasets, optimization
  algorithms, etc.) can be swapped.

- there are no APIs for things which are not passed as arguments (i.e. the logic
  of the whole program is not exposed via some uber-API).
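
One way to picture a lambda constructor inside such an expression language is
the toy interpreter below. It is only a sketch under assumptions: the `Var`,
`Apply` and `Lambda` node classes, the `evaluate` function and the `repeat`
driver are all hypothetical, and the real `vm_lambda` may work quite
differently.

```python
class Var:
    """A named placeholder inside an expression."""

    def __init__(self, name):
        self.name = name


class Apply:
    """A deferred call of a Python function on sub-expressions."""

    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args


class Lambda:
    """A function constructor: argument names plus a body expression."""

    def __init__(self, argnames, body):
        self.argnames = argnames
        self.body = body


def evaluate(node, env):
    """Evaluate an expression, looking variables up in the dict `env`."""
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, Apply):
        return node.fn(*(evaluate(arg, env) for arg in node.args))
    if isinstance(node, Lambda):
        def closure(*values):
            inner = dict(env)
            inner.update(zip(node.argnames, values))
            return evaluate(node.body, inner)
        return closure
    return node  # constants evaluate to themselves


def repeat(step_fn, start, n_iter):
    """A driver that receives a callable as an argument and applies it n_iter times."""
    value = start
    for _ in range(n_iter):
        value = step_fn(value)
    return value


# step(x) = x * rate, where `rate` stays free until evaluation time.
step = Lambda(('x',), Apply(lambda x, r: x * r, Var('x'), Var('rate')))
program = Apply(repeat, step, 1.0, 3)

halved = evaluate(program, {'rate': 0.5})
doubled = evaluate(program, {'rate': 2.0})
```

Swapping `step` for another `Lambda` with the same argument list changes the
behaviour of `program` without touching `repeat`, which is the kind of
sub-expression swapping the conventions above are meant to allow.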

OD comments: I didn't have time to look closely at the details, but overall I
like the general feel of it. At least I'd expect us to need something like
that to be able to handle the multiple use cases we want to support. I must
say I'm a bit worried though that it could become scary pretty fast to the
newcomer, with 'lambda functions' and 'virtual machines'.
Anyway, one point I would like to comment on is the line that creates the
linear classifier. I hope that, as much as possible, we can avoid the need to
specify dataset dimensions / number of classes in algorithm constructors. I
regularly had issues in PLearn with the fact that we had, for instance, to give
the number of inputs when creating a neural network. I much prefer when this
kind of thing can be figured out at runtime:

- Any parameter you can get rid of is a significant gain in
  user-friendliness.
- It's not always easy to know in advance e.g. the dimension of your input
  dataset. Imagine for instance this dataset is obtained in a first step
  by going through a PCA whose number of output dimensions is set so as to
  keep 90% of the variance.
- It seems to me that it better fits the idea of a symbolic graph: my intuition
  (that may be very different from what you actually have in mind) is to
  see an experiment as a symbolic graph, which you instantiate when you
  provide the input data. One advantage of this point of view is that it makes
  it natural to re-use the same block components on various datasets /
  splits, something we often want to do.
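
A minimal sketch of what "figured out at runtime" could look like: a classifier
whose constructor takes no dimensions and which allocates its parameters when
it first sees data. The class and method names are hypothetical, and the sketch
assumes labels are integers `0..K-1`.

```python
class LazyLinearClassifier:
    """Linear classifier whose parameter shapes are inferred from the data."""

    def __init__(self):
        # No n_inputs / n_classes here: both are unknown until data arrives.
        self.weights = None
        self.bias = None

    def infer_dims(self, inputs, labels):
        """Allocate zero-initialized parameters sized from the first dataset seen."""
        n_inputs = len(inputs[0])
        n_classes = max(labels) + 1  # assumes integer labels 0..K-1
        if self.weights is None:
            self.weights = [[0.0] * n_inputs for _ in range(n_classes)]
            self.bias = [0.0] * n_classes
        return n_inputs, n_classes

    def predict(self, x):
        """Return the index of the class with the highest linear score."""
        scores = [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
                  for row, b in zip(self.weights, self.bias)]
        return scores.index(max(scores))


model = LazyLinearClassifier()
dims = model.infer_dims(inputs=[[0.0, 1.0, 2.0], [1.0, 0.0, 0.5]],
                        labels=[0, 3])
```

The same object could then be pointed at the output of a PCA step without
anyone having to know in advance how many components were kept.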

K-fold cross validation of a classifier
---------------------------------------

.. code-block:: python

    splits = kfold_cross_validate(
        # OD: What would these parameters mean?
        indexlist=range(1000),
        train=8,
        valid=1,
        test=1,
        )

    trained_models = [
        halflife_early_stopper(
            initial_model=alloc_model('param1', 'param2'),
            burnin=100,
            score_fn=vm_lambda(('learner_obj',),
                classification_error(
                    function=as_classifier('learner_obj'),
                    dataset=MNIST.subset(validation_set))),
            step_fn=vm_lambda(('learner_obj',),
                sgd_step_fn(
                    parameters=vm_getattr('learner_obj', 'params'),
                    cost_and_updates=classif_nll('learner_obj',
                        example_stream=minibatches(
                            source=MNIST.subset(train_set),
                            batchsize=100,
                            loop=True)),
                    n_iter=100)))
        for (train_set, validation_set, test_set) in splits]

    vm.call(trained_models, param1=1, param2=2)
    vm.call(trained_models, param1=3, param2=4)

I want to draw attention to the fact that the call method treats the expression
tree as one big lambda expression, with potentially free variables that must be
assigned - here the 'param1' and 'param2' arguments to `alloc_model`. There is
no need to have separate compile and run steps like in Theano because these
functions are expected to be long-running, and called once.
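
The meaning of the `train`, `valid` and `test` arguments to
`kfold_cross_validate` is left open above. One possible reading, sketched here
purely as an assumption, is that they give the number of folds assigned to each
role, so that `train=8, valid=1, test=1` cuts the index list into ten equal
folds and rotates the roles through them.

```python
def kfold_cross_validate(indexlist, train, valid, test):
    """Return one (train, valid, test) index split per rotation of the folds.

    Assumed reading: the three counts are numbers of folds per role.
    """
    indexlist = list(indexlist)
    n_folds = train + valid + test
    if len(indexlist) % n_folds:
        raise ValueError("this sketch assumes the indices divide evenly into folds")
    size = len(indexlist) // n_folds
    folds = [indexlist[i * size:(i + 1) * size] for i in range(n_folds)]

    splits = []
    for shift in range(n_folds):
        rotated = folds[shift:] + folds[:shift]
        train_set = [i for fold in rotated[:train] for i in fold]
        valid_set = [i for fold in rotated[train:train + valid] for i in fold]
        test_set = [i for fold in rotated[train + valid:] for i in fold]
        splits.append((train_set, valid_set, test_set))
    return splits


splits = kfold_cross_validate(indexlist=range(1000), train=8, valid=1, test=1)
```

Under this reading every index lands in the test set of exactly one split,
which is the usual property wanted from K-fold evaluation.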


Analyze the results of the K-fold cross validation
--------------------------------------------------

It often happens that a user doesn't know what statistics to compute *before*
running a bunch of learning jobs, but only afterward. This can be done by
extending the symbolic program, and calling the extended function.

.. code-block:: python

    vm.call(
        [pylearn.min(pylearn_getattr(model, 'weights')) for model in trained_models],
        param1=1, param2=2)

If this is run after the previous calls:

.. code-block:: python

    vm.call(trained_models, param1=1, param2=2)
    vm.call(trained_models, param1=3, param2=4)

then it should run very quickly, because the `vm` can cache the return values of
the trained_models when param1=1 and param2=2.
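
The caching behaviour can be sketched with a small memoizing wrapper. This is
an illustration under assumptions, not the planned `vm`: the `CachingVM` class
is hypothetical, expressions are plain Python functions, and results are keyed
on the function plus its parameter bindings.

```python
class CachingVM:
    """Memoizes results per (expression, parameter binding) pair."""

    def __init__(self):
        self._cache = {}
        self.n_evaluations = 0  # counts real evaluations, not cache hits

    def call(self, expr, **params):
        key = (expr, tuple(sorted(params.items())))
        if key not in self._cache:
            self.n_evaluations += 1
            self._cache[key] = expr(**params)
        return self._cache[key]


vm = CachingVM()


def train_models(param1, param2):
    """Hypothetical placeholder for a long-running training job."""
    return [param1 * fold + param2 for fold in range(3)]


def min_weight(param1, param2):
    """The 'extended program': a statistic computed on top of the trained models."""
    return min(vm.call(train_models, param1=param1, param2=param2))


first = vm.call(train_models, param1=1, param2=2)
second = vm.call(train_models, param1=3, param2=4)
# The extension re-uses the cached result for param1=1, param2=2.
smallest = vm.call(min_weight, param1=1, param2=2)
```

Only the new statistic is evaluated in the last call; the training result it
depends on comes from the cache.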