annotate doc/v2_planning/use_cases.txt @ 1419:cff305ad9f60

TensorFnDataset - added x_ attribute that caches the dataset function return value, but does not get pickled.
author James Bergstra <bergstrj@iro.umontreal.ca>
date Fri, 04 Feb 2011 16:05:22 -0500

Use Cases (Functional Requirements)
===================================

These use cases exhibit pseudo-code for some of the sorts of tasks listed in the
requirements (requirements.txt).


Evaluate a classifier on MNIST
-------------------------------

The evaluation of a classifier on MNIST requires iterating over examples in some
set (e.g. validation, test) and comparing the model's prediction with the
correct answer. The score of the classifier is the number of correct
predictions divided by the total number of predictions.

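For reference, the score described above can be written as a plain, eager Python function. This is only a sketch. It assumes the classifier is any callable that maps an input to a predicted label, and that the examples are `(input, label)` pairs. In the design discussed here, `classification_accuracy` would instead build a symbolic graph for `vm.call` to evaluate later. The name `accuracy` is hypothetical.

.. code-block:: python

    def accuracy(function, examples):
        """Fraction of (x, y) pairs for which function(x) == y.

        Eager sketch only: `function` is assumed to be a callable classifier
        and `examples` an iterable of (input, label) pairs.
        """
        n_correct = 0
        n_total = 0
        for x, y in examples:
            n_correct += int(function(x) == y)
            n_total += 1
        if n_total == 0:
            raise ValueError("cannot compute accuracy on an empty set")
        return n_correct / n_total


    # Usage: a toy "classifier" that predicts the parity of an integer.
    examples = [(0, 0), (1, 1), (2, 0), (3, 0)]
    print(accuracy(lambda x: x % 2, examples))  # 0.75
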
To perform this calculation, the user should specify:

- the classifier (e.g. a function operating on weights loaded from disk)
- the dataset (e.g. MNIST)
- the subset of examples on which to evaluate (e.g. test set)

For example:

.. code-block:: python

    vm.call(classification_accuracy(
        function=classifier,
        examples=MNIST.validation_iterator))


The user types very little beyond a description of the fields necessary for
the computation; there is no boilerplate. The `MNIST.validation_iterator` must
respect a protocol that remains to be worked out.

The `vm.call` is a compilation & execution step, as opposed to the
symbolic-graph building performed by the `classification_accuracy` call.

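The split between graph building and execution can be shown with a toy interpreter. Everything here (`Expr`, `symbolic`, `evaluate`, `VM`) is hypothetical and only illustrates the idea. Calling a decorated function records a node without computing anything, and `vm.call` then walks the resulting graph. A real VM could optimize or compile the graph first, but this sketch simply interprets it.

.. code-block:: python

    class Expr:
        """A node in a symbolic expression graph: a function plus its
        (possibly symbolic) positional and keyword arguments."""

        def __init__(self, fn, *args, **kwargs):
            self.fn = fn
            self.args = args
            self.kwargs = kwargs


    def symbolic(fn):
        """Wrap `fn` so that calling it builds an Expr instead of running."""
        def build(*args, **kwargs):
            return Expr(fn, *args, **kwargs)
        return build


    def evaluate(node):
        """Recursively evaluate an expression graph (the "execution" step)."""
        if isinstance(node, Expr):
            args = [evaluate(a) for a in node.args]
            kwargs = {k: evaluate(v) for k, v in node.kwargs.items()}
            return node.fn(*args, **kwargs)
        return node  # constants evaluate to themselves


    class VM:
        def call(self, expr):
            # Interpret the graph directly; no optimization pass in this sketch.
            return evaluate(expr)


    @symbolic
    def classification_accuracy(function, examples):
        pairs = list(examples)
        return sum(function(x) == y for x, y in pairs) / len(pairs)


    vm = VM()
    graph = classification_accuracy(function=lambda x: x % 2,
                                    examples=[(0, 0), (1, 1), (2, 0), (3, 0)])
    print(isinstance(graph, Expr))  # True: nothing has been computed yet
    print(vm.call(graph))           # 0.75
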

Train a linear classifier on MNIST
----------------------------------

The training of a linear classifier requires specification of:

- problem dimensions (e.g. number of inputs, number of classes)
- parameter initialization method
- regularization
- dataset
- schedule for obtaining training examples (e.g. batch, online, minibatch,
  weighted examples)
- algorithm for adapting parameters (e.g. SGD, conjugate gradient)
- a stopping criterion (may be in terms of validation examples)

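A "schedule for obtaining training examples" can be as simple as a generator of minibatches. The sketch below is one guess at what the `minibatches(source, batchsize, loop)` call in the training example might do. Two details are assumptions: what `loop` means (start again from the beginning when the source is exhausted) and how a short final batch is handled.

.. code-block:: python

    import itertools


    def minibatches(source, batchsize, loop=False):
        """Yield successive lists of up to `batchsize` examples from `source`.

        Assumed semantics: a final, shorter batch is yielded when the source
        does not divide evenly. With loop=True the source is re-read from the
        start indefinitely, so the consumer decides when to stop; this needs a
        re-iterable source (e.g. a list or range), not a one-shot iterator.
        """
        while True:
            it = iter(source)
            yielded = False
            while True:
                batch = list(itertools.islice(it, batchsize))
                if not batch:
                    break
                yielded = True
                yield batch
            # Stop after one pass, or if a pass produced nothing (avoids spinning).
            if not loop or not yielded:
                return


    print(list(minibatches(range(5), batchsize=2)))
    # [[0, 1], [2, 3], [4]]
    stream = minibatches(range(3), batchsize=2, loop=True)
    print([next(stream) for _ in range(3)])
    # [[0, 1], [2], [0, 1]]
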
Often the dataset determines the problem dimensions.

Often the training examples and validation examples come from the same set (e.g.
a large matrix of all examples), but this is not necessarily the case.

There are many ways that the training could be configured, but here is one:

.. code-block:: python

    vm.call(
        halflife_stopper(
            # OD: is n_hidden supposed to be n_classes instead?
            initial_model=random_linear_classifier(MNIST.n_inputs, MNIST.n_hidden, r_seed=234432),
            burnin=100,
            score_fn=vm_lambda(('learner_obj',),
                classification_accuracy(
                    examples=MNIST.validation_dataset,
                    function=as_classifier('learner_obj'))),

            step_fn=vm_lambda(('learner_obj',),
                sgd_step_fn(
                    parameters=vm_getattr('learner_obj', 'params'),
                    cost_and_updates=classif_nll('learner_obj',
                        example_stream=minibatches(
                            source=MNIST.training_dataset,
                            batchsize=100,
                            loop=True)),
                    momentum=0.9,
                    anneal_at_iter=50,
                    n_iter=100))))  # step_fn goes through lots of examples (e.g. an epoch)

Although I expect this specific code might have to change quite a bit in a final
version, I want to draw attention to a few aspects of it:

- we build a symbolic expression graph that contains the whole program, not just
  the learning algorithm

- the configuration language allows callable objects (e.g. functions,
  curried functions) to be passed as arguments

- there is a lambda function-constructor (`vm_lambda`) that we can use in this
  language

- APIs and protocols establish conventions for parameter-passing, so that
  sub-expressions (e.g. datasets, optimization algorithms) can be swapped

- there are no APIs for things which are not passed as arguments (i.e. the logic
  of the whole program is not exposed via some uber-API).

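To make the `vm_lambda` / `vm_getattr` idea concrete, here is a toy version. A lambda body is an expression tree, and any string equal to one of the declared argument names is treated as a reference to that argument, mirroring how `'learner_obj'` is used in the example above. The names `Expr`, `evaluate` and `Learner` are hypothetical. The string-as-variable convention is an assumption read off the example, not a settled design.

.. code-block:: python

    class Expr:
        """Deferred call: fn(*args), evaluated inside an environment."""

        def __init__(self, fn, *args):
            self.fn = fn
            self.args = args


    def evaluate(node, env):
        """Evaluate `node`; strings naming a bound variable are looked up in env."""
        if isinstance(node, Expr):
            return node.fn(*[evaluate(a, env) for a in node.args])
        if isinstance(node, str) and node in env:
            return env[node]
        return node  # any other value is a constant


    def vm_lambda(argnames, body):
        """Build a callable whose positional arguments are bound to `argnames`."""
        def fn(*values):
            if len(values) != len(argnames):
                raise TypeError("expected %d arguments" % len(argnames))
            return evaluate(body, dict(zip(argnames, values)))
        return fn


    def vm_getattr(obj, name):
        """Symbolic attribute access, resolved only when the lambda is called."""
        return Expr(getattr, obj, name)


    class Learner:  # hypothetical stand-in for a learner object
        def __init__(self, params):
            self.params = params


    get_params = vm_lambda(('learner_obj',), vm_getattr('learner_obj', 'params'))
    n_params = vm_lambda(('learner_obj',),
                         Expr(len, vm_getattr('learner_obj', 'params')))
    print(get_params(Learner([1.0, 2.0])))  # [1.0, 2.0]
    print(n_params(Learner([1, 2, 3])))     # 3

One drawback of bare strings as variables shows up here. An attribute name that happens to equal an argument name would be substituted too, which argues for a dedicated variable type.
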
OD comments: I didn't have time to look closely at the details, but overall I
like the general feel of it. At least, I'd expect us to need something like
this to handle the multiple use cases we want to support. I must say, though,
that I'm a bit worried it could quickly become intimidating to newcomers, with
'lambda functions' and 'virtual machines'.

Anyway, one point I would like to comment on is the line that creates the
linear classifier. I hope that, as much as possible, we can avoid the need to
specify dataset dimensions / number of classes in algorithm constructors. In
PLearn I regularly ran into problems because we had to give, for instance, the
number of inputs when creating a neural network. I much prefer when this kind
of thing can be figured out at runtime:

- Any parameter you can get rid of is a significant gain in
  user-friendliness.
- It's not always easy to know in advance e.g. the dimension of your input
  dataset. Imagine, for instance, that this dataset is obtained in a first step
  by going through a PCA whose number of output dimensions is set so as to
  keep 90% of the variance.
- It seems to me to fit the idea of a symbolic graph better: my intuition
  (which may be very different from what you actually have in mind) is to
  see an experiment as a symbolic graph, which you instantiate when you
  provide the input data. One advantage of this point of view is that it
  makes it natural to re-use the same building blocks on various datasets /
  splits, something we often want to do.

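Runtime inference of dimensions can be sketched with a classifier that allocates its weights only when it is first bound to data. `LazyLinearClassifier` and its `bind` method are hypothetical names. Reading the number of classes off the largest label is an assumption, and a fragile one if a split happens to miss the highest class.

.. code-block:: python

    import random


    class LazyLinearClassifier:
        """Linear classifier whose weight shape is inferred from the data.

        Hypothetical sketch: the constructor receives neither the input
        dimension nor the number of classes; both are read off the first
        dataset the model is bound to.
        """

        def __init__(self, seed=234432):
            self.rng = random.Random(seed)
            self.weights = None  # n_classes rows of n_inputs values, allocated in bind()

        def bind(self, examples):
            """Allocate small random parameters from (input_vector, label) pairs."""
            examples = list(examples)
            n_inputs = len(examples[0][0])
            n_classes = max(label for _, label in examples) + 1
            self.weights = [[self.rng.uniform(-0.01, 0.01) for _ in range(n_inputs)]
                            for _ in range(n_classes)]
            return self

        def predict(self, x):
            """Return the class whose weight row has the largest dot product with x."""
            scores = [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]
            return max(range(len(scores)), key=scores.__getitem__)


    # The same component adapts to inputs of any size, e.g. the output of a PCA
    # step whose dimension is only known at runtime.
    data = [([1.0, 0.0, 0.5, 0.1], 0), ([0.0, 1.0, 0.2, 0.3], 2)]
    model = LazyLinearClassifier().bind(data)
    print(len(model.weights), len(model.weights[0]))  # 3 4
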
K-fold cross validation of a classifier
---------------------------------------

.. code-block:: python

    splits = kfold_cross_validate(
        # OD: What would these parameters mean?
        indexlist=range(1000),
        train=8,
        valid=1,
        test=1,
    )

    trained_models = [
        halflife_early_stopper(
            initial_model=alloc_model('param1', 'param2'),
            burnin=100,
            score_fn=vm_lambda(('learner_obj',),
                classification_error(
                    function=as_classifier('learner_obj'),
                    dataset=MNIST.subset(validation_set))),
            step_fn=vm_lambda(('learner_obj',),
                sgd_step_fn(
                    parameters=vm_getattr('learner_obj', 'params'),
                    cost_and_updates=classif_nll('learner_obj',
                        example_stream=minibatches(
                            source=MNIST.subset(train_set),
                            batchsize=100,
                            loop=True)),
                    n_iter=100)))
        for (train_set, validation_set, test_set) in splits]

    vm.call(trained_models, param1=1, param2=2)
    vm.call(trained_models, param1=3, param2=4)

I want to draw attention to the fact that the call method treats the expression
tree as one big lambda expression, with potentially free variables that must be
assigned: here, the 'param1' and 'param2' arguments to `alloc_model`. There is
no need for separate compile and run steps as in Theano, because these
functions are expected to be long-running and called once.

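OD's question about the `kfold_cross_validate` parameters is still open. One possible reading, offered only as an assumption, works as follows. `indexlist` is cut into `train + valid + test` contiguous folds, and the roles rotate cyclically across the folds. With the 8/1/1 setting this gives 10 (train, validation, test) splits, and each fold serves as the test set exactly once.

.. code-block:: python

    def kfold_cross_validate(indexlist, train, valid, test):
        """Return a list of (train_idx, valid_idx, test_idx) triples.

        Assumed interpretation: split s uses, cyclically, `train` folds starting
        at fold s for training, the next `valid` folds for validation and the
        next `test` folds for testing.
        """
        indexlist = list(indexlist)
        k = train + valid + test
        n = len(indexlist)
        # Fold boundaries: the first n % k folds get one extra element.
        bounds = [i * (n // k) + min(i, n % k) for i in range(k + 1)]
        folds = [indexlist[bounds[i]:bounds[i + 1]] for i in range(k)]

        def gather(start, count):
            """Concatenate `count` consecutive folds, wrapping around at k."""
            out = []
            for j in range(start, start + count):
                out.extend(folds[j % k])
            return out

        return [(gather(s, train),
                 gather(s + train, valid),
                 gather(s + train + valid, test))
                for s in range(k)]


    splits = kfold_cross_validate(indexlist=range(1000), train=8, valid=1, test=1)
    print(len(splits))                        # 10
    print([len(part) for part in splits[0]])  # [800, 100, 100]
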
Analyze the results of the K-fold cross validation
--------------------------------------------------

It often happens that a user doesn't know what statistics to compute *before*
running a bunch of learning jobs, but only afterward. Such statistics can be
computed by extending the symbolic program and calling the extended function:

.. code-block:: python

    vm.call(
        [pylearn.min(pylearn_getattr(model, 'weights')) for model in trained_models],
        param1=1, param2=2)

If this is run after the previous calls:

.. code-block:: python

    vm.call(trained_models, param1=1, param2=2)
    vm.call(trained_models, param1=3, param2=4)

Then it should run very quickly, because the `vm` can cache the return values of
`trained_models` when param1=1 and param2=2.

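The caching behaviour described above can be sketched by memoizing every node of the graph on the values of the free variables. In this hypothetical `CachingVM`, the cache key is the node together with *all* keyword bindings. That is a conservative choice; a real VM could key only on the variables a node actually depends on. `Var`, `Expr` and `train_model` are illustrative names, with `train_model` standing in for an expensive training job.

.. code-block:: python

    class Var:
        """A free variable, assigned by keyword when the graph is called."""

        def __init__(self, name):
            self.name = name


    class Expr:
        """Deferred call fn(*args)."""

        def __init__(self, fn, *args):
            self.fn = fn
            self.args = args


    class CachingVM:
        """Evaluates expression graphs, memoizing each node per set of bindings."""

        def __init__(self):
            self.cache = {}
            self.n_evaluations = 0  # counts actual function calls, for illustration

        def call(self, node, **bindings):
            key = tuple(sorted(bindings.items()))  # binding values must be hashable
            return self._eval(node, bindings, key)

        def _eval(self, node, bindings, key):
            if isinstance(node, Var):
                return bindings[node.name]
            if isinstance(node, list):
                return [self._eval(n, bindings, key) for n in node]
            if not isinstance(node, Expr):
                return node  # constant
            cache_key = (node, key)  # nodes hash by identity
            if cache_key not in self.cache:
                args = [self._eval(a, bindings, key) for a in node.args]
                self.n_evaluations += 1
                self.cache[cache_key] = node.fn(*args)
            return self.cache[cache_key]


    def train_model(p1, p2, fold):
        return {'weights': [p1 * fold, p2 * fold]}


    trained_models = [Expr(train_model, Var('param1'), Var('param2'), f) for f in [1, 2, 3]]
    vm = CachingVM()
    vm.call(trained_models, param1=1, param2=2)
    vm.call(trained_models, param1=3, param2=4)
    print(vm.n_evaluations)  # 6

    # Extend the graph after the fact: the minimum weight of each model.
    stats = [Expr(lambda m: min(m['weights']), model) for model in trained_models]
    print(vm.call(stats, param1=1, param2=2))  # [1, 2, 3]
    print(vm.n_evaluations)  # 9: only the three new min() nodes ran
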