Mercurial > pylearn
annotate learner.py @ 221:58e17421c69c
tester on iterator consistency now triggers a bug in dataset, linked to the combination of minibatch and slicing
author | Thierry Bertin-Mahieux <bertinmt@iro.umontreal.ca> |
---|---|
date | Fri, 23 May 2008 14:07:53 -0400 |
parents | bd728c83faff |
children | 14b9779622f9 |
rev | line source |
---|---|
211
bd728c83faff
in __get__, problem if the i.stop was None, i being the slice, added one line replacing None by the len(self)
Thierry Bertin-Mahieux <bertinmt@iro.umontreal.ca>
parents:
209
diff
changeset
|
1 |
1
2cd82666b9a7
Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents:
0
diff
changeset
|
2 |
172
fb4837eed1a6
fixed import of AbstractFunction
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
167
diff
changeset
|
3 from exceptions import * |
209
50a8302addaf
template statscollector
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
193
diff
changeset
|
4 from dataset import AttributesHolder |
180
2698c0feeb54
mlp seems to work!
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
177
diff
changeset
|
5 |
193
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
6 class LearningAlgorithm(object): |
132
f6505ec32dc3
Updated documentation slightly
Joseph Turian <turian@gmail.com>
parents:
131
diff
changeset
|
7 """ |
8 Base class for learning algorithms, provides an interface |
1
|
9 that allows various algorithms to be applicable to generic learning |
193
|
10 algorithms. It is only given here to define the expected semantics. |
1
|
11 |
132
|
12 A L{Learner} can be seen as a learning algorithm, a function that when |
135
0d8e721cc63c
Fixed bugs in dataset to make test_mlp.py work
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
134
diff
changeset
|
13 applied to training data returns a learned function (which is an object that |
14 can be applied to other data and return some output data). |
193
|
15 |
16 There are two main ways of using a learning algorithm, and some learning |
17 algorithms only support one of them. The first is the way of the standard |
18 machine learning framework, in which a learning algorithm is applied |
19 to a training dataset, |
20 |
21 model = learning_algorithm(training_set) |
22 |
23 resulting in a fully trained model that can be applied to another dataset: |
24 |
25 output_dataset = model(input_dataset) |
26 |
27 Note that the application of a dataset has no side-effect on the model. |
28 In that example, the training set may have, for instance, 'input' and 'target' |
29 fields while the input dataset may have only 'input' (or both 'input' and |
30 'target') and the output dataset would contain some default output fields defined |
31 by the learning algorithm (e.g. 'output' and 'error'). |
32 |
33 The second way of using a learning algorithm is in the online or |
34 adaptive framework, where the training data are only revealed in pieces |
35 (maybe one example or a batch of examples at a time): |
36 |
37 model = learning_algorithm() |
38 |
39 results in a fresh model. The model can be adapted by presenting |
40 it with some training data, |
41 |
42 model.update(some_training_data) |
43 ... |
44 model.update(some_more_training_data) |
45 ... |
46 model.update(yet_more_training_data) |
47 |
48 and at any point one can use the model to perform some computation: |
49 |
50 output_dataset = model(input_dataset) |
51 |
1
|
52 """ |
193
|
53 |
54 def __init__(self): pass |
55 |
56 def __call__(self, training_dataset=None): |
57 """ |
58 Return a LearnerModel, either fresh (if training_dataset is None) or fully trained (otherwise). |
59 """ |
60 raise AbstractFunction() |
1
|
61 |
193
|
62 class LearnerModel(AttributesHolder): |
63 """ |
64 LearnerModel is a base class for models returned by instances of a LearningAlgorithm subclass. |
65 It is only given here to define the expected semantics. |
66 """ |
10
80bf5492e571
Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents:
1
diff
changeset
|
67 def __init__(self): |
68 pass |
69 |
14 | 70 def update(self,training_set,train_stats_collector=None): |
10
|
71 """ |
72 Continue training a learner, with the evidence provided by the given training set. |
193
|
73 Hence update can be called multiple times. This is the main method used for training in the |
10
|
74 on-line setting or the sequential (Bayesian or not) settings. |
193
|
75 |
76 This function has as side effect that self(data) will behave differently, |
77 according to the adaptation achieved by update(). |
75
90e4c0784d6e
Added draft of LinearRegression learner
bengioy@bengiomac.local
parents:
20
diff
changeset
|
78 |
132
|
79 The user may optionally provide a training L{StatsCollector} that is used to record |
75
|
80 some statistics of the outputs computed during training. It is update(d) during |
81 training. |
10
|
82 """ |
193
|
83 raise AbstractFunction() |
10
|
84 |
193
|
85 def __call__(self,input_dataset,output_fieldnames=None, |
86 test_stats_collector=None,copy_inputs=False, |
87 put_stats_in_output_dataset=True, |
88 output_attributes=[]): |
10
|
89 """ |
193
|
90 A trained or partially trained L{LearnerModel} can be used with |
91 one or more calls to it. The argument is an input L{DataSet} (possibly |
132
|
92 containing a single example) and the result is an output L{DataSet} of the same length. |
128 | 93 If output_fieldnames is specified, it may be used to indicate which fields should |
132
|
94 be constructed in the output L{DataSet} (for example ['output','classification_error']). |
193
|
95 Otherwise, some default output fields are produced (possibly depending on the input |
96 fields available in the input_dataset). |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
14
diff
changeset
|
97 Optionally, if copy_inputs, the input fields (of the input_dataset) can be made |
132
|
98 visible in the output L{DataSet} returned by this method. |
128 | 99 Optionally, attributes of the learner can be copied in the output dataset, |
100 and statistics computed by the stats collector also put in the output dataset. |
101 Note the distinction between fields (which are example-wise quantities, e.g. 'input') |
102 and attributes (which are not, e.g. 'regularization_term'). |
110
8fa1ef2411a0
Worked on OneShotTLearner and implementation of LinearRegression
bengioy@bengiomac.local
parents:
109
diff
changeset
|
103 """ |
104 raise AbstractFunction() |
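
The two usage modes described in the LearningAlgorithm docstring (batch training via `learning_algorithm(training_set)`, and online training via repeated `model.update(...)` calls) can be illustrated with a minimal, self-contained sketch. It does not import pylearn: the base classes are re-declared in simplified form, `AbstractFunction` is a stand-in exception, datasets are modelled as plain lists of dicts rather than real DataSet objects, and `MeanModel` / `MeanAlgorithm` are hypothetical names invented for this illustration.

```python
class AbstractFunction(Exception):
    """Stand-in for the exception raised by abstract methods in the listing."""


class LearningAlgorithm(object):
    """Simplified copy of the interface: calling it returns a model."""

    def __call__(self, training_dataset=None):
        raise AbstractFunction()


class LearnerModel(object):
    """Simplified copy of the interface (stats collectors are ignored here)."""

    def update(self, training_set, train_stats_collector=None):
        raise AbstractFunction()

    def __call__(self, input_dataset, output_fieldnames=None):
        raise AbstractFunction()


class MeanModel(LearnerModel):
    """Hypothetical model: predicts the mean of every 'target' seen so far."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, training_set, train_stats_collector=None):
        # Side effect: later calls to self(data) reflect these examples.
        for example in training_set:
            self.total += example['target']
            self.count += 1

    def __call__(self, input_dataset, output_fieldnames=None):
        # Applying the model does not modify it; the output has the
        # same length as the input.
        mean = self.total / self.count if self.count else 0.0
        return [{'output': mean} for _ in input_dataset]


class MeanAlgorithm(LearningAlgorithm):
    """Hypothetical algorithm returning a fresh or fully trained MeanModel."""

    def __call__(self, training_dataset=None):
        model = MeanModel()
        if training_dataset is not None:
            model.update(training_dataset)
        return model


learning_algorithm = MeanAlgorithm()

# Batch mode: train on the whole training set in one call.
training_set = [{'input': 0, 'target': 1.0}, {'input': 1, 'target': 3.0}]
batch_model = learning_algorithm(training_set)
batch_output = batch_model([{'input': 5}])

# Online mode: start from a fresh model and reveal the data in pieces.
online_model = learning_algorithm()
online_model.update(training_set[:1])
online_model.update(training_set[1:])
online_output = online_model([{'input': 5}, {'input': 6}])

print(batch_output)   # [{'output': 2.0}]
print(online_output)  # [{'output': 2.0}, {'output': 2.0}]
```

In this toy case both modes end up with the same model because the running mean does not depend on how the training data are split; real algorithms need not have that property.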