Mercurial > pylearn
annotate learner.py @ 229:d7250ee86f72
Added speed test for ArraDataSet
author | Frederic Bastien <bastienf@iro.umontreal.ca> |
---|---|
date | Fri, 16 May 2008 16:40:26 -0400 |
parents | cb6b945acf5a |
children | 50a8302addaf |
rev | line source |
---|---|
1
2cd82666b9a7
Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents:
0
diff
changeset
|
1 |
172
fb4837eed1a6
fixed import of AbstractFunction
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
167
diff
changeset
|
2 from exceptions import * |
193
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
3 |
180
2698c0feeb54
mlp seems to work!
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
177
diff
changeset
|
4 |
193
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
5 class LearningAlgorithm(object): |
132
f6505ec32dc3
Updated documentation slightly
Joseph Turian <turian@gmail.com>
parents:
131
diff
changeset
|
6 """ |
f6505ec32dc3
Updated documentation slightly
Joseph Turian <turian@gmail.com>
parents:
131
diff
changeset
|
7 Base class for learning algorithms, provides an interface |
1
2cd82666b9a7
Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents:
0
diff
changeset
|
8 that allows various algorithms to be applicable to generic learning |
193
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
9 algorithms. It is only given here to define the expected semantics. |
1
2cd82666b9a7
Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents:
0
diff
changeset
|
10 |
132
f6505ec32dc3
Updated documentation slightly
Joseph Turian <turian@gmail.com>
parents:
131
diff
changeset
|
11 A L{Learner} can be seen as a learning algorithm, a function that when |
135
0d8e721cc63c
Fixed bugs in dataset to make test_mlp.py work
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
134
diff
changeset
|
12 applied to training data returns a learned function (which is an object that |
0d8e721cc63c
Fixed bugs in dataset to make test_mlp.py work
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
134
diff
changeset
|
13 can be applied to other data and return some output data). |
193
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
14 |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
15 There are two main ways of using a learning algorithms, and some learning |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
16 algorithms only support one of them. The first is the way of the standard |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
17 machine learning framework, in which a learning algorithm is applied |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
18 to a training dataset, |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
19 |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
20 model = learning_algorithm(training_set) |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
21 |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
22 resulting in a fully trained model that can be applied to another dataset: |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
23 |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
24 output_dataset = model(input_dataset) |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
25 |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
26 Note that the application of a dataset has no side-effect on the model. |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
27 In that example, the training set may for example have 'input' and 'target' |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
28 fields while the input dataset may have only 'input' (or both 'input' and |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
29 'target') and the output dataset would contain some default output fields defined |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
30 by the learning algorithm (e.g. 'output' and 'error'). |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
31 |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
32 The second way of using a learning algorithm is in the online or |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
33 adaptive framework, where the training data are only revealed in pieces |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
34 (maybe one example or a batch of example at a time): |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
35 |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
36 model = learning_algorithm() |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
37 |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
38 results in a fresh model. The model can be adapted by presenting |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
39 it with some training data, |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
40 |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
41 model.update(some_training_data) |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
42 ... |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
43 model.update(some_more_training_data) |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
44 ... |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
45 model.update(yet_more_training_data) |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
46 |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
47 and at any point one can use the model to perform some computation: |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
48 |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
49 output_dataset = model(input_dataset) |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
50 |
1
2cd82666b9a7
Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents:
0
diff
changeset
|
51 """ |
193
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
52 |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
53 def __init__(self): pass |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
54 |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
55 def __call__(self, training_dataset=None): |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
56 """ |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
57 Return a LearnerModel, either fresh (if training_dataset is None) or fully trained (otherwise). |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
58 """ |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
59 raise AbstractFunction() |
1
2cd82666b9a7
Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents:
0
diff
changeset
|
60 |
193
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
61 class LearnerModel(AttributesHolder): |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
62 """ |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
63 LearnerModel is a base class for models returned by instances of a LearningAlgorithm subclass. |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
64 It is only given here to define the expected semantics. |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
65 """ |
10
80bf5492e571
Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents:
1
diff
changeset
|
66 def __init__(self): |
80bf5492e571
Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents:
1
diff
changeset
|
67 pass |
80bf5492e571
Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents:
1
diff
changeset
|
68 |
14 | 69 def update(self,training_set,train_stats_collector=None): |
10
80bf5492e571
Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents:
1
diff
changeset
|
70 """ |
80bf5492e571
Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents:
1
diff
changeset
|
71 Continue training a learner, with the evidence provided by the given training set. |
193
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
72 Hence update can be called multiple times. This is the main method used for training in the |
10
80bf5492e571
Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents:
1
diff
changeset
|
73 on-line setting or the sequential (Bayesian or not) settings. |
193
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
74 |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
75 This function has as side effect that self(data) will behave differently, |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
76 according to the adaptation achieved by update(). |
75
90e4c0784d6e
Added draft of LinearRegression learner
bengioy@bengiomac.local
parents:
20
diff
changeset
|
77 |
132
f6505ec32dc3
Updated documentation slightly
Joseph Turian <turian@gmail.com>
parents:
131
diff
changeset
|
78 The user may optionally provide a training L{StatsCollector} that is used to record |
75
90e4c0784d6e
Added draft of LinearRegression learner
bengioy@bengiomac.local
parents:
20
diff
changeset
|
79 some statistics of the outputs computed during training. It is update(d) during |
90e4c0784d6e
Added draft of LinearRegression learner
bengioy@bengiomac.local
parents:
20
diff
changeset
|
80 training. |
10
80bf5492e571
Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents:
1
diff
changeset
|
81 """ |
193
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
82 raise AbstractFunction() |
10
80bf5492e571
Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents:
1
diff
changeset
|
83 |
193
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
84 def __call__(self,input_dataset,output_fieldnames=None, |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
85 test_stats_collector=None,copy_inputs=False, |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
86 put_stats_in_output_dataset=True, |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
87 output_attributes=[]): |
10
80bf5492e571
Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents:
1
diff
changeset
|
88 """ |
193
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
89 A trained or partially trained L{Model} can be used with |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
90 with one or more calls to it. The argument is an input L{DataSet} (possibly |
132
f6505ec32dc3
Updated documentation slightly
Joseph Turian <turian@gmail.com>
parents:
131
diff
changeset
|
91 containing a single example) and the result is an output L{DataSet} of the same length. |
128 | 92 If output_fieldnames is specified, it may be use to indicate which fields should |
132
f6505ec32dc3
Updated documentation slightly
Joseph Turian <turian@gmail.com>
parents:
131
diff
changeset
|
93 be constructed in the output L{DataSet} (for example ['output','classification_error']). |
193
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
94 Otherwise, some default output fields are produced (possibly depending on the input |
cb6b945acf5a
Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
180
diff
changeset
|
95 fields available in the input_dataset). |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
14
diff
changeset
|
96 Optionally, if copy_inputs, the input fields (of the input_dataset) can be made |
132
f6505ec32dc3
Updated documentation slightly
Joseph Turian <turian@gmail.com>
parents:
131
diff
changeset
|
97 visible in the output L{DataSet} returned by this method. |
128 | 98 Optionally, attributes of the learner can be copied in the output dataset, |
99 and statistics computed by the stats collector also put in the output dataset. | |
100 Note the distinction between fields (which are example-wise quantities, e.g. 'input') | |
101 and attributes (which are not, e.g. 'regularization_term'). | |
110
8fa1ef2411a0
Worked on OneShotTLearner and implementation of LinearRegression
bengioy@bengiomac.local
parents:
109
diff
changeset
|
102 """ |
8fa1ef2411a0
Worked on OneShotTLearner and implementation of LinearRegression
bengioy@bengiomac.local
parents:
109
diff
changeset
|
103 raise AbstractFunction() |