annotate learner.py @ 109:d97f6fe6bdf9

Merged with this morning unsaved edits
author bengioy@bengiomac.local
date Tue, 06 May 2008 20:01:34 -0400
parents c4916445e025
children 8fa1ef2411a0
rev   line source
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
2 from dataset import *
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
3
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
4 class Learner(object):
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
5 """Base class for learning algorithms, provides an interface
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
6 that allows various algorithms to be applicable to generic learning
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
7 algorithms.
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
8
10
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
9 A Learner can be seen as a learning algorithm, a function that when
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
10 applied to training data returns a learned function, an object that
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
11 can be applied to other data and return some output data.
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
12 """
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
13
10
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
14 def __init__(self):
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
15 pass
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
16
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
17 def forget(self):
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
18 """
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
19 Reset the state of the learner to a blank slate, before seeing
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
20 training data. The operation may be non-deterministic if the
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
21 learner has a random number generator that is set to use a
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
22 different seed each time it forget() is called.
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
23 """
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
24 raise NotImplementedError
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
25
14
5ede27026e05 Working on gradient_based_learner
bengioy@bengiomac.local
parents: 13
diff changeset
26 def update(self,training_set,train_stats_collector=None):
10
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
27 """
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
28 Continue training a learner, with the evidence provided by the given training set.
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
29 Hence update can be called multiple times. This is particularly useful in the
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
30 on-line setting or the sequential (Bayesian or not) settings.
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
31 The result is a function that can be applied on data, with the same
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
32 semantics of the Learner.use method.
75
90e4c0784d6e Added draft of LinearRegression learner
bengioy@bengiomac.local
parents: 20
diff changeset
33
14
5ede27026e05 Working on gradient_based_learner
bengioy@bengiomac.local
parents: 13
diff changeset
34 The user may optionally provide a training StatsCollector that is used to record
75
90e4c0784d6e Added draft of LinearRegression learner
bengioy@bengiomac.local
parents: 20
diff changeset
35 some statistics of the outputs computed during training. It is update(d) during
90e4c0784d6e Added draft of LinearRegression learner
bengioy@bengiomac.local
parents: 20
diff changeset
36 training.
10
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
37 """
13
633453635d51 Starting to work on gradient_based_learner.py
bengioy@bengiomac.local
parents: 10
diff changeset
38 return self.use # default behavior is 'non-adaptive', i.e. update does not do anything
10
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
39
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
40
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 14
diff changeset
41 def __call__(self,training_set,train_stats_collector=None):
10
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
42 """
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
43 Train a learner from scratch using the provided training set,
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
44 and return the learned function.
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
45 """
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
46 self.forget()
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 14
diff changeset
47 return self.update(learning_task,train_stats_collector)
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
48
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 14
diff changeset
49 def use(self,input_dataset,output_fields=None,copy_inputs=True):
10
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
50 """Once a Learner has been trained by one or more call to 'update', it can
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
51 be used with one or more calls to 'use'. The argument is a DataSet (possibly
10
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
52 containing a single example) and the result is a DataSet of the same length.
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
53 If output_fields is specified, it may be use to indicate which fields should
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
54 be constructed in the output DataSet (for example ['output','classification_error']).
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 14
diff changeset
55 Optionally, if copy_inputs, the input fields (of the input_dataset) can be made
75
90e4c0784d6e Added draft of LinearRegression learner
bengioy@bengiomac.local
parents: 20
diff changeset
56 visible in the output DataSet returned by this method.
10
80bf5492e571 Rewrote learner.py according to the specs in the wiki for learners.
bengioy@esprit.iro.umontreal.ca
parents: 1
diff changeset
57 """
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
58 raise NotImplementedError
77
1e2bb5bad636 toying with different ways to implement learners
bengioy@bengiomac.local
parents: 75
diff changeset
59
92
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
60 def attributeNames(self):
78
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
61 """
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
62 A Learner may have attributes that it wishes to export to other objects. To automate
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
63 such export, sub-classes should define here the names (list of strings) of these attributes.
107
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
64
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
65 @todo By default, attributeNames looks for all dictionary entries whose name does not start with _.
78
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
66 """
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
67 return []
77
1e2bb5bad636 toying with different ways to implement learners
bengioy@bengiomac.local
parents: 75
diff changeset
68
78
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
69 class TLearner(Learner):
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
70 """
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
71 TLearner is a virtual class of Learners that attempts to factor out of the definition
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
72 of a learner the steps that are common to many implementations of learning algorithms,
107
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
73 so as to leave only 'the equations' to define in particular sub-classes, using Theano.
78
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
74
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
75 In the default implementations of use and update, it is assumed that the 'use' and 'update' methods
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
76 visit examples in the input dataset sequentially. In the 'use' method only one pass through the dataset is done,
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
77 whereas the sub-learner may wish to iterate over the examples multiple times. Subclasses where this
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
78 basic model is not appropriate can simply redefine update or use.
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
79
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
80 Sub-classes must provide the following functions and functionalities:
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
81 - attributeNames(): defines all the names of attributes which can be used as fields or
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
82 attributes in input/output datasets or in stats collectors.
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
83 All these attributes are expected to be theano.Result objects
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
84 (with a .data property and recognized by theano.Function for compilation).
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
85 The sub-class constructor defines the relations between
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
86 the Theano variables that may be used by 'use' and 'update'
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
87 or by a stats collector.
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
88 - defaultOutputFields(input_fields): return a list of default dataset output fields when
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
89 None are provided by the caller of use.
92
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
90 The following naming convention is assumed and important.
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
91 Attributes whose names are listed in attributeNames() can be of any type,
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
92 but those that can be referenced as input/output dataset fields or as
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
93 output attributes in 'use' or as input attributes in the stats collector
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
94 should be associated with a Theano Result variable. If the exported attribute
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
95 name is <name>, the corresponding Result name (an internal attribute of
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
96 the TLearner, created in the sub-class constructor) should be _<name>.
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
97 Typically <name> will be numpy ndarray and _<name> will be the corresponding
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
98 Theano Tensor (for symbolic manipulation).
107
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
99
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
100 @todo pousser dans Learner toute la poutine qui peut l'etre sans etre
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
101 dependant de Theano
78
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
102 """
92
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
103
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
104 def __init__(self):
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
105 Learner.__init__(self)
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
106
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
107 def _minibatchwise_use_functions(self, input_fields, output_fields, stats_collector):
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
108 """
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
109 Private helper function called by the generic TLearner.use. It returns a function
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
110 that can map the given input fields to the given output fields (along with the
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
111 attributes that the stats collector needs for its computation.
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
112 """
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
113 if not output_fields:
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
114 output_fields = self.defaultOutputFields(input_fields)
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
115 if stats_collector:
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
116 stats_collector_inputs = stats_collector.inputUpdateAttributes()
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
117 for attribute in stats_collector_inputs:
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
118 if attribute not in input_fields:
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
119 output_fields.append(attribute)
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
120 key = (input_fields,output_fields)
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
121 if key not in self.use_functions_dictionary:
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
122 self.use_functions_dictionary[key]=Function(self._names2attributes(input_fields),
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
123 self._names2attributes(output_fields))
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
124 return self.use_functions_dictionary[key]
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
125
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
126 def attributes(self,return_copy=False):
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
127 """
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
128 Return a list with the values of the learner's attributes (or optionally, a deep copy).
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
129 """
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
130 return self.names2attributes(self.attributeNames())
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
131
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
132 def _names2attributes(self,names,return_Result=False, return_copy=False):
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
133 """
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
134 Private helper function that maps a list of attribute names to a list
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
135 of (optionally copies) values or of the Result objects that own these values.
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
136 """
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
137 if return_Result:
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
138 if return_copy:
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
139 return [copy.deepcopy(self.__getattr__(name)) for name in names]
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
140 else:
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
141 return [self.__getattr__(name) for name in names]
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
142 else:
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
143 if return_copy:
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
144 return [copy.deepcopy(self.__getattr__(name).data) for name in names]
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
145 else:
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
146 return [self.__getattr__(name).data for name in names]
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
147
107
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
148 def use(self,input_dataset,output_fieldnames=None,output_attributes=[],
92
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
149 test_stats_collector=None,copy_inputs=True):
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
150 """
107
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
151 The learner tries to compute in the output dataset the output fields specified
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
152
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
153 @todo check if some of the learner attributes are actually SPECIFIED
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
154 as attributes of the input_dataset, and if so use their values instead
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
155 of the ones in the learner.
109
d97f6fe6bdf9 Merged with this morning unsaved edits
bengioy@bengiomac.local
parents: 107
diff changeset
156
d97f6fe6bdf9 Merged with this morning unsaved edits
bengioy@bengiomac.local
parents: 107
diff changeset
157 The learner tries to compute in the output dataset the output fields specified.
d97f6fe6bdf9 Merged with this morning unsaved edits
bengioy@bengiomac.local
parents: 107
diff changeset
158 If None is specified then self.defaultOutputFields(input_dataset.fieldNames())
d97f6fe6bdf9 Merged with this morning unsaved edits
bengioy@bengiomac.local
parents: 107
diff changeset
159 is called to determine the output fields.
d97f6fe6bdf9 Merged with this morning unsaved edits
bengioy@bengiomac.local
parents: 107
diff changeset
160
d97f6fe6bdf9 Merged with this morning unsaved edits
bengioy@bengiomac.local
parents: 107
diff changeset
161 Attributes of the learner can also optionally be copied into the output dataset.
d97f6fe6bdf9 Merged with this morning unsaved edits
bengioy@bengiomac.local
parents: 107
diff changeset
162 If output_attributes is None then all of the attributes in self.AttributeNames()
d97f6fe6bdf9 Merged with this morning unsaved edits
bengioy@bengiomac.local
parents: 107
diff changeset
163 are copied in the output dataset, but if it is [] (the default), then none are copied.
d97f6fe6bdf9 Merged with this morning unsaved edits
bengioy@bengiomac.local
parents: 107
diff changeset
164 If a test_stats_collector is provided, then its attributes (test_stats_collector.AttributeNames())
d97f6fe6bdf9 Merged with this morning unsaved edits
bengioy@bengiomac.local
parents: 107
diff changeset
165 are also copied into the output dataset attributes.
92
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
166 """
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
167 minibatchwise_use_function = _minibatchwise_use_functions(input_dataset.fieldNames(),
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
168 output_fieldnames,
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
169 test_stats_collector)
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
170 virtual_output_dataset = ApplyFunctionDataSet(input_dataset,
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
171 minibatchwise_use_function,
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
172 True,DataSet.numpy_vstack,
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
173 DataSet.numpy_hstack)
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
174 # actually force the computation
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
175 output_dataset = CachedDataSet(virtual_output_dataset,True)
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
176 if copy_inputs:
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
177 output_dataset = input_dataset | output_dataset
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
178 # copy the wanted attributes in the dataset
107
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
179 if output_attributes is None:
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
180 output_attributes = self.attributeNames()
92
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
181 if output_attributes:
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
182 assert set(output_attributes) <= set(self.attributeNames())
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
183 output_dataset.setAttributes(output_attributes,
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
184 self._names2attributes(output_attributes,return_copy=True))
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
185 if test_stats_collector:
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
186 test_stats_collector.update(output_dataset)
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
187 output_dataset.setAttributes(test_stats_collector.attributeNames(),
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
188 test_stats_collector.attributes())
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
189 return output_dataset
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
190
107
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
191
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
192 class OneShotTLearner(TLearner):
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
193 """
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
194 This adds to TLearner a
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
195 - update_start(), update_end(), update_minibatch(minibatch), end_epoch():
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
196 functions executed at the beginning, the end, in the middle
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
197 (for each minibatch) of the update method, and at the end
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
198 of each epoch. This model only
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
199 works for 'online' or one-shot learning that requires
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
200 going only once through the training data. For more complicated
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
201 models, more specialized subclasses of TLearner should be used
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
202 or a learning-algorithm specific update method should be defined.
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
203 """
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
204
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
205 def __init__(self):
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
206 TLearner.__init__(self)
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
207
92
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
208 def update_start(self): pass
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
209 def update_end(self): pass
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
210 def update_minibatch(self,minibatch):
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
211 raise AbstractFunction()
78
3499918faa9d In the middle of designing TLearner
bengioy@bengiomac.local
parents: 77
diff changeset
212
92
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
213 def update(self,training_set,train_stats_collector=None):
107
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
214 """
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
215 @todo check if some of the learner attributes are actually SPECIFIED
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
216 in as attributes of the training_set.
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
217 """
92
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
218 self.update_start()
107
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
219 stop=False
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
220 while not stop:
92
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
221 if train_stats_collector:
107
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
222 train_stats_collector.forget() # restart stats collectin at the beginning of each epoch
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
223 for minibatch in training_set.minibatches(self.training_set_input_fields,
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
224 minibatch_size=self.minibatch_size):
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
225 self.update_minibatch(minibatch)
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
226 if train_stats_collector:
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
227 minibatch_set = minibatch.examples()
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
228 minibatch_set.setAttributes(self.attributeNames(),self.attributes())
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
229 train_stats_collector.update(minibatch_set)
c4916445e025 Comments from Pascal V.
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 92
diff changeset
230 stop = self.end_epoch()
92
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
231 self.update_end()
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
232 return self.use
c4726e19b8ec Finished first draft of TLearner
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents: 78
diff changeset
233