Mercurial > pylearn
comparison: learner.py @ 262:14b9779622f9

Split LearningAlgorithm into OfflineLearningAlgorithm and OnlineLearningAlgorithm

author    Yoshua Bengio <bengioy@iro.umontreal.ca>
date      Tue, 03 Jun 2008 21:34:24 -0400
parents   bd728c83faff
children  78cc8fe3bbe9
comparing 260:792f81d65f82 (old) with 262:14b9779622f9 (new)
--- learner.py (260:792f81d65f82)
+++ learner.py (262:14b9779622f9)
@@ -3,35 +3,83 @@
 from exceptions import *
 from dataset import AttributesHolder
 
-class LearningAlgorithm(object):
+class OfflineLearningAlgorithm(object):
     """
-    Base class for learning algorithms, provides an interface
+    Base class for offline learning algorithms, provides an interface
     that allows various algorithms to be applicable to generic learning
     algorithms. It is only given here to define the expected semantics.
 
-    A L{Learner} can be seen as a learning algorithm, a function that when
+    An offline learning algorithm can be seen as a function that when
     applied to training data returns a learned function (which is an object that
     can be applied to other data and return some output data).
 
-    There are two main ways of using a learning algorithm, and some learning
-    algorithms only support one of them. The first is the way of the standard
-    machine learning framework, in which a learning algorithm is applied
+    The offline learning scenario is the standard and most common one
+    in machine learning: an offline learning algorithm is applied
     to a training dataset,
 
        model = learning_algorithm(training_set)
 
-    resulting in a fully trained model that can be applied to another dataset:
+    resulting in a fully trained model that can be applied to another dataset
+    in order to perform some desired computation:
 
        output_dataset = model(input_dataset)
 
     Note that the application of a dataset has no side-effect on the model.
     In that example, the training set may for example have 'input' and 'target'
     fields while the input dataset may have only 'input' (or both 'input' and
     'target') and the output dataset would contain some default output fields defined
-    by the learning algorithm (e.g. 'output' and 'error').
+    by the learning algorithm (e.g. 'output' and 'error'). The user may specify
+    what the output dataset should contain either by setting options in the
+    model, by the presence of particular fields in the input dataset, or with
+    keyword options of the __call__ method of the model (see TrainedModel.__call__).
 
-    The second way of using a learning algorithm is in the online or
-    adaptive framework, where the training data are only revealed in pieces
+    """
+
+    def __init__(self): pass
+
+    def __call__(self, training_dataset):
+        """
+        Return a fully trained TrainedModel.
+        """
+        raise AbstractFunction()
+
+class TrainedModel(AttributesHolder):
+    """
+    TrainedModel is a base class for models returned by instances of an
+    OfflineLearningAlgorithm subclass. It is only given here to define the expected semantics.
+    """
+    def __init__(self):
+        pass
+
+    def __call__(self, input_dataset, output_fieldnames=None,
+                 test_stats_collector=None, copy_inputs=False,
+                 put_stats_in_output_dataset=True,
+                 output_attributes=[]):
+        """
+        A L{TrainedModel} can be used with
+        one or more calls to it. The main argument is an input L{DataSet} (possibly
+        containing a single example) and the result is an output L{DataSet} of the same length.
+        If output_fieldnames is specified, it may be used to indicate which fields should
+        be constructed in the output L{DataSet} (for example ['output','classification_error']).
+        Otherwise, some default output fields are produced (possibly depending on the input
+        fields available in the input_dataset).
+        Optionally, if copy_inputs, the input fields (of the input_dataset) can be made
+        visible in the output L{DataSet} returned by this method.
+        Optionally, attributes of the learner can be copied in the output dataset,
+        and statistics computed by the stats collector also put in the output dataset.
+        Note the distinction between fields (which are example-wise quantities, e.g. 'input')
+        and attributes (which are not, e.g. 'regularization_term').
+        """
+        raise AbstractFunction()
+
+
+class OnlineLearningAlgorithm(object):
+    """
+    Base class for online learning algorithms, provides an interface
+    that allows various algorithms to be applicable to generic online learning
+    algorithms. It is only given here to define the expected semantics.
+
+    The basic setting is that the training data are only revealed in pieces
     (maybe one example or a batch of examples at a time):
 
        model = learning_algorithm()
@@ -49,6 +97,9 @@
 
        output_dataset = model(input_dataset)
 
+    The model should be a LearnerModel subclass instance, and LearnerModel
+    is a subclass of TrainedModel.
+
     """
 
     def __init__(self): pass
@@ -59,7 +110,7 @@
         """
         raise AbstractFunction()
 
-class LearnerModel(AttributesHolder):
+class LearnerModel(TrainedModel):
     """
     LearnerModel is a base class for models returned by instances of a LearningAlgorithm subclass.
     It is only given here to define the expected semantics.
@@ -69,7 +120,7 @@
 
     def update(self, training_set, train_stats_collector=None):
         """
-        Continue training a learner, with the evidence provided by the given training set.
+        Continue training a learner model, with the evidence provided by the given training set.
         Hence update can be called multiple times. This is the main method used for training in the
         on-line setting or the sequential (Bayesian or not) settings.
 
@@ -82,23 +133,3 @@
         """
         raise AbstractFunction()
 
-    def __call__(self, input_dataset, output_fieldnames=None,
-                 test_stats_collector=None, copy_inputs=False,
-                 put_stats_in_output_dataset=True,
-                 output_attributes=[]):
-        """
-        A trained or partially trained model can be used with
-        one or more calls to it. The argument is an input L{DataSet} (possibly
-        containing a single example) and the result is an output L{DataSet} of the same length.
-        If output_fieldnames is specified, it may be used to indicate which fields should
-        be constructed in the output L{DataSet} (for example ['output','classification_error']).
-        Otherwise, some default output fields are produced (possibly depending on the input
-        fields available in the input_dataset).
-        Optionally, if copy_inputs, the input fields (of the input_dataset) can be made
-        visible in the output L{DataSet} returned by this method.
-        Optionally, attributes of the learner can be copied in the output dataset,
-        and statistics computed by the stats collector also put in the output dataset.
-        Note the distinction between fields (which are example-wise quantities, e.g. 'input')
-        and attributes (which are not, e.g. 'regularization_term').
-        """
-        raise AbstractFunction()
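The offline contract — `learning_algorithm(training_set)` returns a fully trained model, and applying the model to a dataset has no side-effect on it — can be sketched with a toy example. Everything here is illustrative, not pylearn's actual API: MeanRegression and MeanRegressionModel are hypothetical names, and plain dicts of lists stand in for pylearn's DataSet objects and the AttributesHolder/AbstractFunction machinery.

```python
# Hypothetical sketch of the OfflineLearningAlgorithm / TrainedModel contract.
# Dicts of lists stand in for pylearn DataSets; names are illustrative only.

class MeanRegression(object):
    """Offline algorithm: applying it to training data returns a trained model."""
    def __call__(self, training_dataset):
        targets = training_dataset['target']
        mean = sum(targets) / float(len(targets))
        return MeanRegressionModel(mean)

class MeanRegressionModel(object):
    """Trained model: applying it to a dataset has no side-effect on the model."""
    def __init__(self, mean):
        self.mean = mean

    def __call__(self, input_dataset, output_fieldnames=None):
        # Default output fields depend on the input fields available:
        # always 'output', plus 'error' when a 'target' field is present.
        n = len(input_dataset['input'])
        output = {'output': [self.mean] * n}
        if 'target' in input_dataset:
            output['error'] = [t - self.mean for t in input_dataset['target']]
        return output

learning_algorithm = MeanRegression()
model = learning_algorithm({'input': [1.0, 2.0, 3.0], 'target': [2.0, 4.0, 6.0]})
output_dataset = model({'input': [5.0]})  # only 'input' given: default 'output' field
```

Note how the choice of output fields follows from the fields present in the input dataset, as the docstring of TrainedModel.__call__ describes; a real implementation would also honor output_fieldnames and the stats-collector options.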
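The online contract — `learning_algorithm()` returns a fresh model that is adapted by repeated calls to update(), after which applying the model behaves differently — can likewise be sketched. Again this is a hypothetical illustration: a running mean stands in for a real learner, and dicts of lists stand in for DataSets.

```python
# Hypothetical sketch of the OnlineLearningAlgorithm / LearnerModel contract.
# A running mean stands in for a real learner; names are illustrative only.

class RunningMeanLearner(object):
    """Online algorithm: with no data it returns a fresh (untrained) model."""
    def __call__(self, training_dataset=None):
        model = RunningMeanModel()
        if training_dataset is not None:
            model.update(training_dataset)
        return model

class RunningMeanModel(object):
    """Learner model: update() adapts it in place, __call__ applies it."""
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, training_set, train_stats_collector=None):
        # Side effect: after update(), self(data) behaves differently.
        for t in training_set['target']:
            self.total += t
            self.count += 1

    def __call__(self, input_dataset):
        mean = self.total / self.count if self.count else 0.0
        return {'output': [mean] * len(input_dataset['input'])}

model = RunningMeanLearner()()          # fresh model
model.update({'target': [2.0, 4.0]})    # first piece of training data
model.update({'target': [6.0]})         # more data revealed later
output_dataset = model({'input': [0.0]})
```

Because update() accumulates sufficient statistics rather than retraining from scratch, the same model instance can keep absorbing data indefinitely — the sequential setting the LearnerModel.update docstring describes.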