annotate learner.py @ 223:517364d48ae0

should have solved the problem with minibatches not handling subsets of fieldnames, although maybe not super efficient
author Thierry Bertin-Mahieux <bertinmt@iro.umontreal.ca>
date Fri, 23 May 2008 16:01:01 -0400
parents bd728c83faff
children 14b9779622f9
rev   line source
211
bd728c83faff in __get__, problem if the i.stop was None, i being the slice, added one line replacing None by the len(self)
Thierry Bertin-Mahieux <bertinmt@iro.umontreal.ca>
parents: 209
diff changeset
1
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
2
172
fb4837eed1a6 fixed import of AbstractFunction
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 167
diff changeset
3 from exceptions import *
209
50a8302addaf template statscollector
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 193
diff changeset
4 from dataset import AttributesHolder
180
2698c0feeb54 mlp seems to work!
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 177
diff changeset
5
193
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
6 class LearningAlgorithm(object):
132
f6505ec32dc3 Updated documentation slightly
Joseph Turian <turian@gmail.com>
parents: 131
diff changeset
7 """
f6505ec32dc3 Updated documentation slightly
Joseph Turian <turian@gmail.com>
parents: 131
diff changeset
8 Base class for learning algorithms; it provides an interface
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
9 that allows various algorithms to be applicable to generic learning
193
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
10 algorithms. It is only given here to define the expected semantics.
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
11
132
f6505ec32dc3 Updated documentation slightly
Joseph Turian <turian@gmail.com>
parents: 131
diff changeset
12 A L{LearningAlgorithm} can be seen as a learning algorithm, a function that when
135
0d8e721cc63c Fixed bugs in dataset to make test_mlp.py work
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 134
diff changeset
13 applied to training data returns a learned function (which is an object that
0d8e721cc63c Fixed bugs in dataset to make test_mlp.py work
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 134
diff changeset
14 can be applied to other data and return some output data).
193
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
15
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
16 There are two main ways of using a learning algorithm, and some learning
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
17 algorithms only support one of them. The first is the way of the standard
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
18 machine learning framework, in which a learning algorithm is applied
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
19 to a training dataset,
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
20
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
21 model = learning_algorithm(training_set)
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
22
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
23 resulting in a fully trained model that can be applied to another dataset:
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
24
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
25 output_dataset = model(input_dataset)
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
26
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
27 Note that applying the model to a dataset has no side effect on the model.
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
28 In that example, the training set may have 'input' and 'target'
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
29 fields while the input dataset may have only 'input' (or both 'input' and
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
30 'target') and the output dataset would contain some default output fields defined
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
31 by the learning algorithm (e.g. 'output' and 'error').
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
32
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
33 The second way of using a learning algorithm is in the online or
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
34 adaptive framework, where the training data are only revealed in pieces
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
35 (maybe one example or a batch of examples at a time):
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
36
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
37 model = learning_algorithm()
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
38
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
39 results in a fresh model. The model can be adapted by presenting
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
40 it with some training data,
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
41
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
42 model.update(some_training_data)
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
43 ...
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
44 model.update(some_more_training_data)
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
45 ...
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
46 model.update(yet_more_training_data)
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
47
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
48 and at any point one can use the model to perform some computation:
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
49
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
50 output_dataset = model(input_dataset)
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
51
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
52 """
193
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
53
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
54 def __init__(self): pass
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
55
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
56 def __call__(self, training_dataset=None):
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
57 """
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
58 Return a LearnerModel, either fresh (if training_dataset is None) or fully trained (otherwise).
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
59 """
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
60 raise AbstractFunction()
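
The two usage modes described in the LearningAlgorithm docstring (batch and online) can be illustrated with a minimal sketch. Everything here other than the names LearningAlgorithm, AbstractFunction, update and __call__ is hypothetical: RunningMean and RunningMeanModel are toy classes invented for the illustration, and datasets are assumed to be plain lists of dicts rather than the project's DataSet objects.

```python
class AbstractFunction(Exception):
    """Stand-in for the exception raised by unimplemented abstract methods."""


class LearningAlgorithm(object):
    """Abstract interface, as in the listing above."""
    def __call__(self, training_dataset=None):
        raise AbstractFunction()


class RunningMeanModel(object):
    """Hypothetical model: predicts the mean of all targets seen so far."""
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, training_set):
        # Continue training with the evidence in training_set.
        for example in training_set:
            self.total += example['target']
            self.count += 1

    def __call__(self, input_dataset):
        # Applying the model does not modify it.
        mean = self.total / self.count if self.count else 0.0
        return [{'output': mean} for _ in input_dataset]


class RunningMean(LearningAlgorithm):
    """Hypothetical learning algorithm returning RunningMeanModel objects."""
    def __call__(self, training_dataset=None):
        model = RunningMeanModel()          # fresh model
        if training_dataset is not None:
            model.update(training_dataset)  # fully trained model
        return model


learning_algorithm = RunningMean()
training_set = [{'input': 1, 'target': 2.0}, {'input': 2, 'target': 4.0}]

# First way: train in one shot on the whole training set.
batch_model = learning_algorithm(training_set)

# Second way: start from a fresh model and reveal the data in pieces.
online_model = learning_algorithm()
online_model.update(training_set[:1])
online_model.update(training_set[1:])

batch_output = batch_model([{'input': 5}])
online_output = online_model([{'input': 5}])
```

For this particular toy model both routes end in the same state, so the two outputs agree; that equivalence is a property of the running mean, not something the interface guarantees in general.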
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
61
193
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
62 class LearnerModel(AttributesHolder):
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
63 """
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
64 LearnerModel is a base class for models returned by instances of a LearningAlgorithm subclass.
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
65 It is only given here to define the expected semantics.
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
66 """
10
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
67 def __init__(self):
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
68 pass
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
69
14
5ede27026e05 Working on gradient_based_learner
bengioy@bengiomac.local
parents: 13
diff changeset
70 def update(self,training_set,train_stats_collector=None):
10
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
71 """
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
72 Continue training a learner, with the evidence provided by the given training set.
193
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
73 Hence update can be called multiple times. This is the main method used for training in the
10
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
74 on-line setting or the sequential (Bayesian or not) settings.
193
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
75
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
76 As a side effect of this function, self(data) will behave differently,
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
77 according to the adaptation achieved by update().
75
90e4c0784d6e Added draft of LinearRegression learner
bengioy@bengiomac.local
parents: 20
diff changeset
78
132
f6505ec32dc3 Updated documentation slightly
Joseph Turian <turian@gmail.com>
parents: 131
diff changeset
79 The user may optionally provide a training L{StatsCollector} that is used to record
75
90e4c0784d6e Added draft of LinearRegression learner
bengioy@bengiomac.local
parents: 20
diff changeset
80 some statistics of the outputs computed during training. It is updated during
90e4c0784d6e Added draft of LinearRegression learner
bengioy@bengiomac.local
parents: 20
diff changeset
81 training.
10
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
82 """
193
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
83 raise AbstractFunction()
10
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
84
193
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
85 def __call__(self,input_dataset,output_fieldnames=None,
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
86 test_stats_collector=None,copy_inputs=False,
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
87 put_stats_in_output_dataset=True,
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
88 output_attributes=[]):
10
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
89 """
193
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
90 A trained or partially trained L{LearnerModel} can be used with
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
91 one or more calls to it. The argument is an input L{DataSet} (possibly
132
f6505ec32dc3 Updated documentation slightly
Joseph Turian <turian@gmail.com>
parents: 131
diff changeset
92 containing a single example) and the result is an output L{DataSet} of the same length.
128
ee5507af2c60 minor edits
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 126
diff changeset
93 If output_fieldnames is specified, it may be used to indicate which fields should
132
f6505ec32dc3 Updated documentation slightly
Joseph Turian <turian@gmail.com>
parents: 131
diff changeset
94 be constructed in the output L{DataSet} (for example ['output','classification_error']).
193
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
95 Otherwise, some default output fields are produced (possibly depending on the input
cb6b945acf5a Complete redesign of learner...
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 180
diff changeset
96 fields available in the input_dataset).
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 14
diff changeset
97 Optionally, if copy_inputs, the input fields (of the input_dataset) can be made
132
f6505ec32dc3 Updated documentation slightly
Joseph Turian <turian@gmail.com>
parents: 131
diff changeset
98 visible in the output L{DataSet} returned by this method.
128
ee5507af2c60 minor edits
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 126
diff changeset
99 Optionally, attributes of the learner can be copied in the output dataset,
ee5507af2c60 minor edits
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 126
diff changeset
100 and statistics computed by the stats collector can also be put in the output dataset.
ee5507af2c60 minor edits
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 126
diff changeset
101 Note the distinction between fields (which are example-wise quantities, e.g. 'input')
ee5507af2c60 minor edits
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 126
diff changeset
102 and attributes (which are not, e.g. 'regularization_term').
110
8fa1ef2411a0 Worked on OneShotTLearner and implementation of LinearRegression
bengioy@bengiomac.local
parents: 109
diff changeset
103 """
8fa1ef2411a0 Worked on OneShotTLearner and implementation of LinearRegression
bengioy@bengiomac.local
parents: 109
diff changeset
104 raise AbstractFunction()
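
The semantics that the LearnerModel.__call__ docstring describes (default versus requested output fields, copy_inputs, and the distinction between per-example fields and model-level attributes) might look like the following sketch. ScaleModel is a hypothetical toy class, datasets are assumed to be lists of dicts, and returning a (rows, attributes) pair is a simplification made for the illustration; in the original design the attributes would live on the output dataset itself. The stats-collector arguments are left out.

```python
class ScaleModel(object):
    """Hypothetical model that multiplies each input by a fixed scale."""

    def __init__(self, scale, regularization_term=0.0):
        self.scale = scale
        # An attribute: one value for the whole model, not one per example.
        self.regularization_term = regularization_term

    def __call__(self, input_dataset, output_fieldnames=None,
                 copy_inputs=False, output_attributes=()):
        if output_fieldnames is None:
            # Default fields depend on what the input dataset provides.
            has_target = all('target' in ex for ex in input_dataset)
            output_fieldnames = ['output', 'error'] if has_target else ['output']

        rows = []
        for ex in input_dataset:
            computed = {'output': self.scale * ex['input']}
            if 'target' in ex:
                computed['error'] = (computed['output'] - ex['target']) ** 2
            # Optionally make the input fields visible in the output.
            row = dict(ex) if copy_inputs else {}
            for name in output_fieldnames:
                row[name] = computed[name]
            rows.append(row)

        # Copy the requested model attributes alongside the output rows.
        attributes = {name: getattr(self, name) for name in output_attributes}
        return rows, attributes


model = ScaleModel(2.0, regularization_term=0.1)

# Only 'input' is available, so only the default 'output' field is produced.
rows, attrs = model([{'input': 3.0}])

# With targets, copied inputs and one requested attribute.
rows_full, attrs_full = model([{'input': 3.0, 'target': 5.0}],
                              copy_inputs=True,
                              output_attributes=['regularization_term'])
```

The output has the same length as the input in both calls, and the model itself is unchanged by being applied. The sketch uses an immutable tuple as the default for output_attributes, which avoids sharing one mutable list between calls.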