Mercurial > pylearn
view learner.py @ 220:1f527fe65e22
test on simple slicing works
author | Thierry Bertin-Mahieux <bertinmt@iro.umontreal.ca> |
---|---|
date | Fri, 23 May 2008 13:44:25 -0400 |
parents | bd728c83faff |
children | 14b9779622f9 |
from exceptions import *
from dataset import AttributesHolder


class LearningAlgorithm(object):
    """
    Base class for learning algorithms. It provides an interface that lets
    various algorithms be used as generic learning algorithms. It is only
    given here to define the expected semantics.

    A L{Learner} can be seen as a learning algorithm, a function that when
    applied to training data returns a learned function (which is an object
    that can be applied to other data and return some output data).

    There are two main ways of using a learning algorithm, and some learning
    algorithms only support one of them. The first is the way of the standard
    machine learning framework, in which a learning algorithm is applied
    to a training dataset,

        model = learning_algorithm(training_set)

    resulting in a fully trained model that can be applied to another dataset:

        output_dataset = model(input_dataset)

    Note that applying the model to a dataset has no side effect on the model.
    In that example, the training set may for example have 'input' and
    'target' fields while the input dataset may have only 'input' (or both
    'input' and 'target') and the output dataset would contain some default
    output fields defined by the learning algorithm (e.g. 'output' and
    'error').

    The second way of using a learning algorithm is in the online or
    adaptive framework, where the training data are only revealed in pieces
    (maybe one example or a batch of examples at a time):

        model = learning_algorithm()

    results in a fresh model. The model can be adapted by presenting
    it with some training data,

        model.update(some_training_data)
        ...
        model.update(some_more_training_data)
        ...
        model.update(yet_more_training_data)

    and at any point one can use the model to perform some computation:

        output_dataset = model(input_dataset)
    """

    def __init__(self):
        pass

    def __call__(self, training_dataset=None):
        """
        Return a LearnerModel, either fresh (if training_dataset is None)
        or fully trained (otherwise).
        """
        raise AbstractFunction()


class LearnerModel(AttributesHolder):
    """
    LearnerModel is a base class for models returned by instances of a
    LearningAlgorithm subclass. It is only given here to define the
    expected semantics.
    """

    def __init__(self):
        pass

    def update(self, training_set, train_stats_collector=None):
        """
        Continue training a learner, with the evidence provided by the given
        training set. Hence update can be called multiple times. This is the
        main method used for training in the on-line setting or the
        sequential (Bayesian or not) settings.

        As a side effect, self(data) will behave differently afterwards,
        according to the adaptation achieved by update().

        The user may optionally provide a training L{StatsCollector} that is
        used to record some statistics of the outputs computed during
        training. It is update(d) during training.
        """
        raise AbstractFunction()

    def __call__(self, input_dataset, output_fieldnames=None,
                 test_stats_collector=None, copy_inputs=False,
                 put_stats_in_output_dataset=True,
                 output_attributes=[]):
        """
        A trained or partially trained L{Model} can be used with one or more
        calls to it. The argument is an input L{DataSet} (possibly containing
        a single example) and the result is an output L{DataSet} of the same
        length.

        If output_fieldnames is specified, it may be used to indicate which
        fields should be constructed in the output L{DataSet} (for example
        ['output','classification_error']). Otherwise, some default output
        fields are produced (possibly depending on the input fields available
        in the input_dataset).

        Optionally, if copy_inputs, the input fields (of the input_dataset)
        can be made visible in the output L{DataSet} returned by this method.

        Optionally, attributes of the learner can be copied in the output
        dataset, and statistics computed by the stats collector also put in
        the output dataset.

        Note the distinction between fields (which are example-wise
        quantities, e.g. 'input') and attributes (which are not, e.g.
        'regularization_term').
        """
        raise AbstractFunction()