annotate doc/v2_planning/use_cases.txt @ 1135:a1957faecc9b
revised plugin interface and implementation
author | Olivier Breuleux <breuleuo@iro.umontreal.ca> |
---|---|
date | Thu, 16 Sep 2010 02:58:24 -0400 |
parents | 21d25bed2ce9 |
children | 0e12ea6ba661 |

Use Cases (Functional Requirements)
===================================

These use cases give pseudo-code for some of the kinds of tasks listed in the
requirements (requirements.txt).


Evaluate a classifier on MNIST
------------------------------

The evaluation of a classifier on MNIST requires iterating over examples in some
set (e.g. validation, test) and comparing the model's prediction with the
correct answer.  The score of the classifier is the number of correct
predictions divided by the total number of predictions.

To perform this calculation, the user should specify:

- the classifier (e.g. a function operating on weights loaded from disk)
- the dataset (e.g. MNIST)
- the subset of examples on which to evaluate (e.g. test set)

For example:

    vm.call(classification_accuracy(
        function = classifier,
        examples = MNIST.validation_iterator))


The user types very little beyond the description of the fields necessary for
the computation: there is no boilerplate.  The `MNIST.validation_iterator` must
respect a protocol that remains to be worked out.

The `vm.call` is a compilation & execution step, as opposed to the
symbolic-graph building performed by the `classification_accuracy` call.
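
To make the distinction between graph building and execution concrete, here is
a minimal sketch in plain Python. It is not the planned pylearn API: the `Call`
and `VM` classes, the toy `classifier` and the four validation examples are all
assumptions made up for illustration. The only point is that calling
`classification_accuracy` returns a description of the work, and nothing is
computed until `call` is invoked.

```python
class Call:
    """Node of a symbolic graph: a function plus arguments not evaluated yet."""

    def __init__(self, fn, **kwargs):
        self.fn = fn
        self.kwargs = kwargs


def classification_accuracy(function, examples):
    """Build (but do not run) the accuracy computation described above."""

    def accuracy(function, examples):
        correct = [function(x) == y for x, y in examples]
        # number of correct predictions / total number of predictions
        return sum(correct) / len(correct)

    return Call(accuracy, function=function, examples=examples)


class VM:
    """Toy executor: walks the graph and only then calls the functions."""

    def call(self, expr):
        if not isinstance(expr, Call):
            return expr  # plain Python values are passed through unchanged
        kwargs = {name: self.call(arg) for name, arg in expr.kwargs.items()}
        return expr.fn(**kwargs)


# Hypothetical stand-ins for a classifier and a labelled validation set.
def classifier(x):
    return x >= 0


validation_examples = [(-2, False), (-1, False), (1, True), (3, False)]

# Graph building: no prediction is made on this line.
graph = classification_accuracy(function=classifier,
                                examples=validation_examples)

# Execution: the predictions and the score are computed here.
accuracy = VM().call(graph)
print(accuracy)
```

Because the graph is an ordinary object, it can be inspected, extended or
handed to a different executor before anything is run.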



Train a linear classifier on MNIST
----------------------------------

The training of a linear classifier requires specification of:

- problem dimensions (e.g. number of inputs, number of classes)
- parameter initialization method
- regularization
- dataset
- schedule for obtaining training examples (e.g. batch, online, minibatch,
  weighted examples)
- algorithm for adapting parameters (e.g. SGD, conjugate gradient)
- a stopping criterion (may be in terms of validation examples)

Often the dataset determines the problem dimensions.

Often the training examples and validation examples come from the same set (e.g.
a large matrix of all examples) but this is not necessarily the case.

There are many ways that the training could be configured, but here is one:


    vm.call(
        halflife_stopper(
            # OD: is n_hidden supposed to be n_classes instead?
            initial_model=random_linear_classifier(MNIST.n_inputs, MNIST.n_hidden, r_seed=234432),
            burnin=100,
            score_fn = vm_lambda(('learner_obj',),
                classification_accuracy(
                    examples=MNIST.validation_dataset,
                    function=as_classifier('learner_obj'))),

            step_fn = vm_lambda(('learner_obj',),
                sgd_step_fn(
                    parameters = vm_getattr('learner_obj', 'params'),
                    cost_and_updates=classif_nll('learner_obj',
                        example_stream=minibatches(
                            source=MNIST.training_dataset,
                            batchsize=100,
                            loop=True)),
                    momentum=0.9,
                    anneal_at_iter=50,
                    n_iter=100))))  # step_fn goes through lots of examples (e.g. an epoch)

Although I expect this specific code might have to change quite a bit in a final
version, I want to draw attention to a few aspects of it:

- we build a symbolic expression graph that contains the whole program, not just
  the learning algorithm

- the configuration language allows for callable objects (e.g. functions,
  curried functions) to be arguments

- there is a lambda function-constructor (vm_lambda) we can use in this language

- APIs and protocols are at work in establishing conventions for
  parameter-passing so that sub-expressions (e.g. datasets, optimization
  algorithms, etc.) can be swapped.

- there are no APIs for things which are not passed as arguments (i.e. the logic
  of the whole program is not exposed via some uber-API).
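
As a rough illustration of callables being passed as arguments, the sketch below
writes an early-stopping loop in plain Python that knows nothing about the model
except what `score_fn` and `step_fn` tell it. This is not `halflife_stopper`:
its stopping rule is not defined in this document, so the rule used here (take
`burnin` steps, then stop after `patience` steps without improvement) is an
assumption, as are the names `early_stopper`, `patience`, `max_steps` and the
one-number toy "model".

```python
from functools import partial


def early_stopper(initial_model, score_fn, step_fn, burnin, patience,
                  max_steps=10000):
    """Run step_fn until score_fn stops improving; return the best model seen.

    Assumed rule: always take `burnin` steps, then stop after `patience`
    consecutive steps without improvement (or after `max_steps` in total).
    """
    model = initial_model
    best_model, best_score = model, score_fn(model)
    since_best = 0
    n_steps = 0
    while n_steps < max_steps and (n_steps < burnin or since_best < patience):
        model = step_fn(model)
        n_steps += 1
        score = score_fn(model)
        if score > best_score:
            best_model, best_score, since_best = model, score, 0
        else:
            since_best += 1
    return best_model, best_score


# Toy problem: the "model" is a single number and the best value is 3.0.
def negative_squared_error(model, target):
    return -(model - target) ** 2


def move_right(model, step_size):
    return model + step_size


best_model, best_score = early_stopper(
    initial_model=0.0,
    # Curried functions: the extra arguments are fixed ahead of time.
    score_fn=partial(negative_squared_error, target=3.0),
    step_fn=partial(move_right, step_size=0.5),
    burnin=2,
    patience=3,
)
print(best_model, best_score)
```

Swapping the dataset or the optimization algorithm then amounts to passing a
different `score_fn` or `step_fn`; the loop itself does not change.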

OD comments: I didn't have time to look closely at the details, but overall I
like the general feel of it. At least I'd expect us to need something like
that to be able to handle the multiple use cases we want to support. I must
say I'm a bit worried though that it could become scary pretty fast to the
newcomer, with 'lambda functions' and 'virtual machines'.
Anyway, one point I would like to comment on is the line that creates the
linear classifier. I hope that, as much as possible, we can avoid the need to
specify dataset dimensions / number of classes in algorithm constructors. I
regularly had issues in PLearn with the fact that we had, for instance, to give
the number of inputs when creating a neural network. I much prefer when this
kind of thing can be figured out at runtime:

- Any parameter you can get rid of is a significant gain in
  user-friendliness.
- It's not always easy to know in advance, e.g., the dimension of your input
  dataset. Imagine for instance this dataset is obtained in a first step
  by going through a PCA whose number of output dimensions is set so as to
  keep 90% of the variance.
- It seems to me that it fits the idea of a symbolic graph better: my
  intuition (that may be very different from what you actually have in mind)
  is to see an experiment as a symbolic graph, which you instantiate when you
  provide the input data. One advantage of this point of view is it makes
  it natural to re-use the same block components on various datasets /
  splits, something we often want to do.
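
A small sketch of what "figured out at runtime" could look like, in plain
Python: the constructor takes no dimensions, and the weight matrix is allocated
only once data is provided. The class name `LazyLinearClassifier`, the
`fit_shape` method and the assumption that labels are integers `0..K-1` are all
hypothetical; nothing here is an existing pylearn or PLearn interface.

```python
import random


class LazyLinearClassifier:
    """Linear classifier whose weight shape is fixed when data is first seen."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.weights = None  # allocated lazily: one row of weights per class

    def fit_shape(self, examples):
        """Read n_inputs and n_classes off the data, not the constructor.

        Assumes `examples` is a non-empty list of (inputs, label) pairs with
        integer labels numbered from 0.
        """
        n_inputs = len(examples[0][0])
        n_classes = max(label for _, label in examples) + 1
        self.weights = [
            [self.rng.uniform(-0.01, 0.01) for _ in range(n_inputs)]
            for _ in range(n_classes)
        ]
        return self

    def predict(self, x):
        scores = [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]
        return scores.index(max(scores))


def shape(model):
    return (len(model.weights), len(model.weights[0]))


# The same component is re-used on two datasets with different dimensions.
four_inputs_three_classes = [
    ([0.0, 1.0, 0.5, 0.2], 0),
    ([1.0, 0.0, 0.3, 0.9], 2),
    ([0.5, 0.5, 0.1, 0.4], 1),
]
two_inputs_two_classes = [([1.0, 2.0], 1), ([0.0, 1.0], 0)]

model_a = LazyLinearClassifier(seed=234432).fit_shape(four_inputs_three_classes)
model_b = LazyLinearClassifier(seed=234432).fit_shape(two_inputs_two_classes)
print(shape(model_a), shape(model_b))
```

With this design a preprocessing step such as the PCA mentioned above can change
the input dimension without any constructor argument having to be updated.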

K-fold cross validation of a classifier
---------------------------------------

    splits = kfold_cross_validate(
        # OD: What would these parameters mean?
        indexlist = range(1000),
        train = 8,
        valid = 1,
        test = 1,
    )

    trained_models = [
        halflife_early_stopper(
            initial_model=alloc_model('param1', 'param2'),
            burnin=100,
            score_fn = vm_lambda(('learner_obj',),
                classification_error(
                    function=as_classifier('learner_obj'),
                    dataset=MNIST.subset(validation_set))),
            step_fn = vm_lambda(('learner_obj',),
                sgd_step_fn(
                    parameters = vm_getattr('learner_obj', 'params'),
                    cost_and_updates=classif_nll('learner_obj',
                        example_stream=minibatches(
                            source=MNIST.subset(train_set),
                            batchsize=100,
                            loop=True)),
                    n_iter=100)))
        for (train_set, validation_set, test_set) in splits]

    vm.call(trained_models, param1=1, param2=2)
    vm.call(trained_models, param1=3, param2=4)

I want to draw attention to the fact that the call method treats the expression
tree as one big lambda expression, with potentially free variables that must be
assigned: here, the 'param1' and 'param2' arguments to `alloc_model`.  There is
no need to have separate compile and run steps like in Theano, because these
functions are expected to be long-running and called only once.
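
The "one big lambda expression" view can be sketched in a few lines of plain
Python. The `Var` and `Call` classes and the `free_variables` and `call`
functions below are hypothetical helpers, not the planned interface, and
`alloc_model` and `train` are placeholders that only build dictionaries. The
sketch shows two things: the free variables of an expression tree can be
collected by walking it, and calling the tree means assigning all of them.

```python
class Var:
    """A free variable, identified by name and assigned only at call time."""

    def __init__(self, name):
        self.name = name


class Call:
    """A function applied to arguments that may themselves be expressions."""

    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args


def free_variables(expr):
    """Names of all variables that must be assigned before expr can run."""
    if isinstance(expr, Var):
        return {expr.name}
    if isinstance(expr, Call):
        return set().union(*(free_variables(a) for a in expr.args))
    if isinstance(expr, (list, tuple)):
        return set().union(*(free_variables(e) for e in expr))
    return set()


def _evaluate(expr, bindings):
    if isinstance(expr, Var):
        return bindings[expr.name]
    if isinstance(expr, Call):
        return expr.fn(*[_evaluate(a, bindings) for a in expr.args])
    if isinstance(expr, (list, tuple)):
        return [_evaluate(e, bindings) for e in expr]
    return expr


def call(expr, **bindings):
    """Treat the whole tree as a lambda over its free variables."""
    missing = free_variables(expr) - set(bindings)
    if missing:
        raise TypeError("unassigned free variables: %s" % sorted(missing))
    return _evaluate(expr, bindings)


# Placeholder building blocks: a "model" is just a dict of its parameters.
def alloc_model(param1, param2):
    return {"param1": param1, "param2": param2}


def train(model, fold):
    return dict(model, fold=fold)


trained_models = [
    Call(train, Call(alloc_model, Var("param1"), Var("param2")), fold)
    for fold in range(3)
]

print(sorted(free_variables(trained_models)))
first = call(trained_models, param1=1, param2=2)
second = call(trained_models, param1=3, param2=4)
print(first[0], second[0])
```

The same expression list is evaluated twice with different assignments, which
mirrors the two `vm.call` lines above.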


Analyze the results of the K-fold cross validation
--------------------------------------------------

It often happens that a user doesn't know what statistics to compute *before*
running a bunch of learning jobs, but only afterward.  This can be done by
extending the symbolic program, and calling the extended function.

    vm.call(
        [pylearn.min(pylearn_getattr(model, 'weights')) for model in trained_models],
        param1=1, param2=2)

If this is run after the previous calls:

    vm.call(trained_models, param1=1, param2=2)
    vm.call(trained_models, param1=3, param2=4)

then it should run very quickly, because the `vm` can cache the return values of
`trained_models` for param1=1 and param2=2.
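
One way such a cache could work is sketched below in plain Python. It is only
an assumption about a possible mechanism: the `CachingVM` class, its `executed`
log, and the choice of keying the cache on the identity of each expression node
plus the assigned values are made up for illustration, and the `train` and
`min_weight` functions are trivial placeholders for real training and analysis.

```python
class Var:
    def __init__(self, name):
        self.name = name


class Call:
    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args


class CachingVM:
    """Evaluates expression trees and remembers the result of every node."""

    def __init__(self):
        self._cache = {}    # (node id, assignments) -> (node, value)
        self.executed = []  # names of the functions that were really run

    def call(self, expr, **bindings):
        # Assumes the assigned values are hashable.
        return self._eval(expr, bindings, tuple(sorted(bindings.items())))

    def _eval(self, expr, bindings, bindings_key):
        if isinstance(expr, Var):
            return bindings[expr.name]
        if isinstance(expr, list):
            return [self._eval(e, bindings, bindings_key) for e in expr]
        if not isinstance(expr, Call):
            return expr
        key = (id(expr), bindings_key)
        if key not in self._cache:
            args = [self._eval(a, bindings, bindings_key) for a in expr.args]
            self.executed.append(expr.fn.__name__)
            # Keep the node next to its value so its id cannot be reused.
            self._cache[key] = (expr, expr.fn(*args))
        return self._cache[key][1]


# Placeholder for a long-running training job.
def train(param1, param2, fold):
    return {"weights": [param1 + fold, param2 + fold]}


def min_weight(model):
    return min(model["weights"])


trained_models = [Call(train, Var("param1"), Var("param2"), fold)
                  for fold in range(3)]

vm = CachingVM()
vm.call(trained_models, param1=1, param2=2)
vm.call(trained_models, param1=3, param2=4)
runs_before = vm.executed.count("train")

# Extend the program afterwards: only the new statistic has to be computed.
stats = vm.call([Call(min_weight, m) for m in trained_models],
                param1=1, param2=2)
runs_after = vm.executed.count("train")
print(stats, runs_before, runs_after)
```

The extended call re-uses the three models trained with param1=1 and param2=2,
so `train` is not executed again; a persistent cache would need a key that
survives across processes, which this in-memory sketch does not attempt.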