annotate learner.py @ 96:352910e0dbf5

added test and some restructuring for futur use
author Frederic Bastien <bastienf@iro.umontreal.ca>
date Tue, 06 May 2008 10:53:21 -0400
parents 3499918faa9d
children c4726e19b8ec
rev   line source
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
2 from dataset import *
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
3
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
4 class Learner(object):
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
5 """Base class for learning algorithms, provides an interface
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
6 that allows various algorithms to be applicable to generic learning
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
7 algorithms.
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
8
10
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
9 A Learner can be seen as a learning algorithm, a function that when
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
10 applied to training data returns a learned function, an object that
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
11 can be applied to other data and return some output data.
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
12 """
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
13
10
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
14 def __init__(self):
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
15 pass
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
16
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
17 def forget(self):
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
18 """
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
19 Reset the state of the learner to a blank slate, before seeing
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
20 training data. The operation may be non-deterministic if the
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
21 learner has a random number generator that is set to use a
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
22 different seed each time it forget() is called.
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
23 """
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
24 raise NotImplementedError
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
25
14
5ede27026e05 Working on gradient_based_learner
bengioy@bengiomac.local
parents: 13
diff changeset
26 def update(self,training_set,train_stats_collector=None):
10
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
27 """
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
28 Continue training a learner, with the evidence provided by the given training set.
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
29 Hence update can be called multiple times. This is particularly useful in the
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
30 on-line setting or the sequential (Bayesian or not) settings.
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
31 The result is a function that can be applied on data, with the same
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
32 semantics of the Learner.use method.
75
90e4c0784d6e Added draft of LinearRegression learner
bengioy@bengiomac.local
parents: 20
diff changeset
33
14
5ede27026e05 Working on gradient_based_learner
bengioy@bengiomac.local
parents: 13
diff changeset
34 The user may optionally provide a training StatsCollector that is used to record
75
90e4c0784d6e Added draft of LinearRegression learner
bengioy@bengiomac.local
parents: 20
diff changeset
35 some statistics of the outputs computed during training. It is update(d) during
90e4c0784d6e Added draft of LinearRegression learner
bengioy@bengiomac.local
parents: 20
diff changeset
36 training.
10
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
37 """
13
633453635d51 Starting to work on gradient_based_learner.py
bengioy@bengiomac.local
parents: 10
diff changeset
38 return self.use # default behavior is 'non-adaptive', i.e. update does not do anything
10
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
39
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
40
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 14
diff changeset
41 def __call__(self,training_set,train_stats_collector=None):
10
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
42 """
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
43 Train a learner from scratch using the provided training set,
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
44 and return the learned function.
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
45 """
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
46 self.forget()
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 14
diff changeset
47 return self.update(learning_task,train_stats_collector)
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
48
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 14
diff changeset
49 def use(self,input_dataset,output_fields=None,copy_inputs=True):
10
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
50 """Once a Learner has been trained by one or more call to 'update', it can
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
51 be used with one or more calls to 'use'. The argument is a DataSet (possibly
10
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
52 containing a single example) and the result is a DataSet of the same length.
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
53 If output_fields is specified, it may be use to indicate which fields should
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
54 be constructed in the output DataSet (for example ['output','classification_error']).
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 14
diff changeset
55 Optionally, if copy_inputs, the input fields (of the input_dataset) can be made
75
90e4c0784d6e Added draft of LinearRegression learner
bengioy@bengiomac.local
parents: 20
diff changeset
56 visible in the output DataSet returned by this method.
10
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
57 """
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
58 raise NotImplementedError
77
1e2bb5bad636 toying with different ways to implement learners
bengioy@bengiomac.local
parents: 75
diff changeset
59
78
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
60 def attribute_names(self):
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
61 """
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
62 A Learner may have attributes that it wishes to export to other objects. To automate
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
63 such export, sub-classes should define here the names (list of strings) of these attributes.
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
64 """
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
65 return []
77
1e2bb5bad636 toying with different ways to implement learners
bengioy@bengiomac.local
parents: 75
diff changeset
66
78
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
67 class TLearner(Learner):
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
68 """
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
69 TLearner is a virtual class of Learners that attempts to factor out of the definition
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
70 of a learner the steps that are common to many implementations of learning algorithms,
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
71 so as to leave only "the equations" to define in particular sub-classes, using Theano.
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
72
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
73 In the default implementations of use and update, it is assumed that the 'use' and 'update' methods
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
74 visit examples in the input dataset sequentially. In the 'use' method only one pass through the dataset is done,
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
75 whereas the sub-learner may wish to iterate over the examples multiple times. Subclasses where this
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
76 basic model is not appropriate can simply redefine update or use.
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
77
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
78 Sub-classes must provide the following functions and functionalities:
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
79 - attributeNames(): defines all the names of attributes which can be used as fields or
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
80 attributes in input/output datasets or in stats collectors.
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
81 All these attributes are expected to be theano.Result objects
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
82 (with a .data property and recognized by theano.Function for compilation).
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
83 The sub-class constructor defines the relations between
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
84 the Theano variables that may be used by 'use' and 'update'
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
85 or by a stats collector.
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
86 - defaultOutputFields(input_fields): return a list of default dataset output fields when
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
87 None are provided by the caller of use.
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
88 -
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
89
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
90 """
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
91