annotate doc/v2_planning/use_cases.txt @ 1117:c1943feada10

Proposal for theano dataset wrapper. The details still have to be worked out.
author Arnaud Bergeron <abergeron@gmail.com>
date Tue, 14 Sep 2010 15:22:48 -0400
parents 21d25bed2ce9
children 0e12ea6ba661
rev   line source
1093
a65598681620 v2planning - initial commit of use_cases, requirements
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
Use Cases (Functional Requirements)
===================================

These use cases exhibit pseudo-code for some of the sorts of tasks listed in the
requirements (requirements.txt)


Evaluate a classifier on MNIST
-------------------------------

The evaluation of a classifier on MNIST requires iterating over examples in some
set (e.g. validation, test) and comparing the model's prediction with the
correct answer.  The score of the classifier is the number of correct
predictions divided by the total number of predictions.

To perform this calculation, the user should specify:
- the classifier (e.g. a function operating on weights loaded from disk)
- the dataset (e.g. MNIST)
- the subset of examples on which to evaluate (e.g. test set)

For example:

    vm.call(classification_accuracy(
        function = classifier,
        examples = MNIST.validation_iterator))


The user types very little beyond the description of the fields necessary
for the computation; there is no boilerplate.  The `MNIST.validation_iterator`
must respect a protocol that remains to be worked out.

The `vm.call` is a compilation & execution step, as opposed to the
symbolic-graph building performed by the `classification_accuracy` call.



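As a rough illustration of the score described in this section, the sketch
below computes accuracy eagerly over an iterator of (input, label) pairs.  The
pair-yielding protocol, this eager `classification_accuracy` signature and the
toy classifier are assumptions made for the example; the real iterator protocol
is still to be worked out, and nothing here builds a symbolic graph.

```python
def classification_accuracy(function, examples):
    """Return the fraction of examples for which function(x) equals the label."""
    n_correct = 0
    n_total = 0
    for x, label in examples:
        n_correct += int(function(x) == label)
        n_total += 1
    if n_total == 0:
        raise ValueError("no examples to evaluate")
    return n_correct / n_total


# Hypothetical stand-ins for a trained classifier and a validation iterator.
def toy_classifier(x):
    return int(sum(x) > 1.0)


toy_examples = [([0.2, 0.1], 0), ([0.9, 0.8], 1), ([0.7, 0.6], 0), ([0.0, 0.3], 0)]
accuracy = classification_accuracy(function=toy_classifier,
                                   examples=iter(toy_examples))
print(accuracy)
```

The toy classifier gets three of the four examples right, so the score is 3/4.
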
Train a linear classifier on MNIST
----------------------------------

The training of a linear classifier requires specification of

- problem dimensions (e.g. number of inputs, number of classes)
- parameter initialization method
- regularization
- dataset
- schedule for obtaining training examples (e.g. batch, online, minibatch,
  weighted examples)
- algorithm for adapting parameters (e.g. SGD, Conj. Grad)
- a stopping criterion (may be in terms of validation examples)

Often the dataset determines the problem dimensions.

Often the training examples and validation examples come from the same set (e.g.
a large matrix of all examples) but this is not necessarily the case.

There are many ways that the training could be configured, but here is one:


    vm.call(
        halflife_stopper(
            # OD: is n_hidden supposed to be n_classes instead?
            initial_model=random_linear_classifier(MNIST.n_inputs, MNIST.n_hidden, r_seed=234432),
            burnin=100,
            score_fn = vm_lambda(('learner_obj',),
                classification_accuracy(
                    examples=MNIST.validation_dataset,
                    function=as_classifier('learner_obj'))),

            step_fn = vm_lambda(('learner_obj',),
                sgd_step_fn(
                    parameters = vm_getattr('learner_obj', 'params'),
                    cost_and_updates=classif_nll('learner_obj',
                        example_stream=minibatches(
                            source=MNIST.training_dataset,
                            batchsize=100,
                            loop=True)),
                    momentum=0.9,
                    anneal_at_iter=50,
                    n_iter=100)))) # step_fn goes through lots of examples (e.g. an epoch)

Although I expect this specific code might have to change quite a bit in a final
version, I want to draw attention to a few aspects of it:

- we build a symbolic expression graph that contains the whole program, not just
  the learning algorithm

- the configuration language allows for callable objects (e.g. functions,
  curried functions) to be arguments

- there is a lambda function-constructor (vm_lambda) we can use in this language

- APIs and protocols are at work in establishing conventions for
  parameter-passing so that sub-expressions (e.g. datasets, optimization
  algorithms, etc.) can be swapped.

- there are no APIs for things which are not passed as arguments (i.e. the logic
  of the whole program is not exposed via some uber-API).

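To make the idea of swappable callable arguments concrete, here is a minimal
eager sketch in plain Python.  It is not the proposed symbolic API: the
`early_stopper` name and its patience rule are placeholders (this document does
not define what `halflife_stopper` does), and the behaviour of `minibatches`
with `loop=True` is an assumption based only on the argument names used above.

```python
import itertools


def minibatches(source, batchsize, loop=False):
    """Yield consecutive slices of `source`; start over when loop is True."""
    if batchsize <= 0:
        raise ValueError("batchsize must be positive")
    while True:
        for start in range(0, len(source), batchsize):
            yield source[start:start + batchsize]
        if not loop or len(source) == 0:
            return


def early_stopper(initial_model, step_fn, score_fn, max_steps, patience):
    """Apply step_fn repeatedly and keep the model with the best score."""
    model = initial_model
    best_model, best_score = model, score_fn(model)
    steps_since_best = 0
    for _ in range(max_steps):
        model = step_fn(model)
        score = score_fn(model)
        if score > best_score:
            best_model, best_score = model, score
            steps_since_best = 0
        else:
            steps_since_best += 1
            if steps_since_best >= patience:
                break
    return best_model, best_score


# Toy problem: the "model" is a single float pulled toward each batch mean.
stream = minibatches(source=[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], batchsize=2, loop=True)


def step_fn(w):
    batch = next(stream)
    return w + 0.5 * (sum(batch) / len(batch) - w)


def score_fn(w):
    return -abs(w - 3.5)  # higher is better


best_model, best_score = early_stopper(0.0, step_fn, score_fn,
                                       max_steps=50, patience=5)
first_batches = list(itertools.islice(minibatches([1, 2, 3], 2, loop=True), 4))
```

Because `early_stopper` only calls `step_fn` and `score_fn`, either one can be
replaced (a different optimizer, a different validation set) without touching
the loop, which is the property the list above asks of the protocols.
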
1106
21d25bed2ce9 use_cases: Comment about using predefined dataset dimensions
Olivier Delalleau <delallea@iro>
parents: 1101
diff changeset
OD comments: I didn't have time to look closely at the details, but overall I
like the general feel of it. At least I'd expect us to need something like
that to be able to handle the multiple use cases we want to support. I must
say I'm a bit worried, though, that it could become scary pretty fast to the
newcomer, with 'lambda functions' and 'virtual machines'.
Anyway, one point I would like to comment on is the line that creates the
linear classifier. I hope that, as much as possible, we can avoid the need to
specify dataset dimensions / number of classes in algorithm constructors. I
regularly had issues in PLearn with the fact that we had, for instance, to give
the number of inputs when creating a neural network. I much prefer when this
kind of thing can be figured out at runtime:
    - Any parameter you can get rid of is a significant gain in
      user-friendliness.
    - It's not always easy to know in advance e.g. the dimension of your input
      dataset. Imagine for instance this dataset is obtained in a first step
      by going through a PCA whose number of output dimensions is set so as to
      keep 90% of the variance.
    - It seems to me that it better fits the idea of a symbolic graph: my
      intuition (which may be very different from what you actually have in
      mind) is to see an experiment as a symbolic graph, which you instantiate
      when you provide the input data. One advantage of this point of view is
      that it makes it natural to re-use the same block components on various
      datasets / splits, something we often want to do.
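
One way to read the suggestion that dimensions be figured out at runtime is
sketched below: the constructor takes no sizes, and the weight matrix is
allocated only when data is supplied.  The class name, the `allocate` method
and the assumption that labels are integers 0..K-1 are all hypothetical and
chosen only for illustration.

```python
import random


class LazyLinearClassifier:
    """Linear classifier whose weights are allocated once data is seen."""

    def __init__(self, r_seed=0):
        self.rng = random.Random(r_seed)
        self.weights = None  # shape is unknown until allocate() is called

    def allocate(self, inputs, labels):
        # Infer the problem dimensions from the data instead of the caller.
        n_inputs = len(inputs[0])
        n_classes = max(labels) + 1  # assumes integer labels 0..K-1
        self.weights = [[self.rng.uniform(-0.01, 0.01) for _ in range(n_inputs)]
                        for _ in range(n_classes)]

    def predict(self, x):
        if self.weights is None:
            raise RuntimeError("call allocate() with data first")
        scores = [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]
        return scores.index(max(scores))


# The same component is reused on two datasets with different dimensions.
small = LazyLinearClassifier(r_seed=234432)
small.allocate(inputs=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], labels=[0, 2])

wide = LazyLinearClassifier(r_seed=234432)
wide.allocate(inputs=[[0.0] * 5, [1.0] * 5], labels=[0, 1])
```
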

K-fold cross validation of a classifier
---------------------------------------

    splits = kfold_cross_validate(
        # OD: What would these parameters mean?
        indexlist = range(1000),
        train = 8,
        valid = 1,
        test = 1,
    )

    trained_models = [
        halflife_early_stopper(
            initial_model=alloc_model('param1', 'param2'),
            burnin=100,
            score_fn = vm_lambda(('learner_obj',),
                classification_error(
                    function=as_classifier('learner_obj'),
                    dataset=MNIST.subset(validation_set))),
            step_fn = vm_lambda(('learner_obj',),
                sgd_step_fn(
                    parameters = vm_getattr('learner_obj', 'params'),
                    cost_and_updates=classif_nll('learner_obj',
                        example_stream=minibatches(
                            source=MNIST.subset(train_set),
                            batchsize=100,
                            loop=True)),
                    n_iter=100)))
        for (train_set, validation_set, test_set) in splits]

    vm.call(trained_models, param1=1, param2=2)
    vm.call(trained_models, param1=3, param2=4)

I want to draw attention to the fact that the call method treats the expression
tree as one big lambda expression, with potentially free variables that must be
assigned - here the 'param1' and 'param2' arguments to `alloc_model`.  There is
no need to have separate compile and run steps as in Theano because these
functions are expected to be long-running, and called once.


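The meaning of the `train`, `valid` and `test` arguments is left open above.
One possible reading, offered purely as an assumption, is that they are fold
counts: the index list is cut into train + valid + test equal folds, and each
split rotates which folds play which role.  The sketch below implements that
reading in plain Python.

```python
def kfold_cross_validate(indexlist, train, valid, test):
    """Return a list of (train_set, valid_set, test_set) index lists.

    Assumed semantics: train/valid/test are numbers of folds, not sizes.
    """
    indexlist = list(indexlist)
    n_folds = train + valid + test
    if len(indexlist) % n_folds != 0:
        raise ValueError("len(indexlist) must be divisible by train + valid + test")
    fold_size = len(indexlist) // n_folds
    folds = [indexlist[i * fold_size:(i + 1) * fold_size] for i in range(n_folds)]

    splits = []
    for shift in range(n_folds):
        # Rotate the folds so that each one is used for testing exactly once.
        rotated = folds[shift:] + folds[:shift]
        train_set = [i for fold in rotated[:train] for i in fold]
        valid_set = [i for fold in rotated[train:train + valid] for i in fold]
        test_set = [i for fold in rotated[train + valid:] for i in fold]
        splits.append((train_set, valid_set, test_set))
    return splits


splits = kfold_cross_validate(indexlist=range(1000), train=8, valid=1, test=1)
```

Under this reading the call above yields ten splits of 800 training, 100
validation and 100 test indices each.
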
Analyze the results of the K-fold cross validation
--------------------------------------------------

It often happens that a user doesn't know what statistics to compute *before*
running a bunch of learning jobs, but only afterward.  This can be handled by
extending the symbolic program, and calling the extended function.

    vm.call(
        [pylearn.min(pylearn_getattr(model, 'weights')) for model in trained_models],
        param1=1, param2=2)

If this is run after the previous calls:

    vm.call(trained_models, param1=1, param2=2)
    vm.call(trained_models, param1=3, param2=4)

then it should run very quickly, because the `vm` can cache the return values of
the trained_models when param1=1 and param2=2.


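The combination of free variables and caching described here can be mimicked
with a very small interpreter.  The `Free`, `Apply` and `VM` classes below are
hypothetical names invented for this sketch and say nothing about how the real
`vm` would be implemented; the cache is keyed on the expression node and on the
full set of bindings, which is the simplest rule that makes the example work.

```python
class Free:
    """Placeholder for a value supplied when the expression is called."""
    def __init__(self, name):
        self.name = name


class Apply:
    """Deferred call: stores a function and its arguments without running it."""
    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args


class VM:
    def __init__(self):
        self.cache = {}

    def call(self, expr, **bindings):
        if isinstance(expr, Free):
            return bindings[expr.name]
        if isinstance(expr, (list, tuple)):
            return type(expr)(self.call(e, **bindings) for e in expr)
        if not isinstance(expr, Apply):
            return expr  # plain constant
        key = (id(expr), tuple(sorted(bindings.items())))
        if key not in self.cache:
            args = [self.call(a, **bindings) for a in expr.args]
            # Keep a reference to expr so that its id() cannot be reused.
            self.cache[key] = (expr, expr.fn(*args))
        return self.cache[key][1]


training_log = []  # records every (expensive) training run


def train_model(param1, param2):
    training_log.append((param1, param2))
    return {'weights': [param1, param2, param1 * param2]}


def get_weights(model):
    return model['weights']


trained_models = [Apply(train_model, Free('param1'), Free('param2'))
                  for _ in range(3)]

vm = VM()
vm.call(trained_models, param1=1, param2=2)  # trains three models
vm.call(trained_models, param1=3, param2=4)  # trains three more

# Extending the program afterwards reuses the cached models for (1, 2).
min_weights = vm.call(
    [Apply(min, Apply(get_weights, model)) for model in trained_models],
    param1=1, param2=2)
```

The last call adds only the cheap `min` and `get_weights` nodes; no model is
retrained, so `training_log` still holds six entries.
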