view learner.py @ 96:352910e0dbf5
added test and some restructuring for future use
author | Frederic Bastien <bastienf@iro.umontreal.ca> |
---|---|
date | Tue, 06 May 2008 10:53:21 -0400 |
parents | 3499918faa9d |
children | c4726e19b8ec |
from dataset import *

class Learner(object):
    """Base class for learning algorithms, provides an interface
    that allows various algorithms to be applicable to generic learning
    algorithms.

    A Learner can be seen as a learning algorithm, a function that when
    applied to training data returns a learned function, an object that
    can be applied to other data and return some output data.
    """

    def __init__(self):
        pass

    def forget(self):
        """
        Reset the state of the learner to a blank slate, before seeing
        training data. The operation may be non-deterministic if the
        learner has a random number generator that is set to use a
        different seed each time forget() is called.
        """
        raise NotImplementedError

    def update(self,training_set,train_stats_collector=None):
        """
        Continue training a learner, with the evidence provided by the given training set.
        Hence update can be called multiple times. This is particularly useful in the
        on-line setting or the sequential (Bayesian or not) settings.
        The result is a function that can be applied on data, with the same
        semantics as the Learner.use method.

        The user may optionally provide a training StatsCollector that is used to record
        some statistics of the outputs computed during training. It is update(d) during
        training.
        """
        return self.use # default behavior is 'non-adaptive', i.e. update does not do anything

    def __call__(self,training_set,train_stats_collector=None):
        """
        Train a learner from scratch using the provided training set,
        and return the learned function.
        """
        self.forget()
        return self.update(training_set,train_stats_collector)

    def use(self,input_dataset,output_fields=None,copy_inputs=True):
        """Once a Learner has been trained by one or more calls to 'update', it can
        be used with one or more calls to 'use'. The argument is a DataSet (possibly
        containing a single example) and the result is a DataSet of the same length.
        If output_fields is specified, it may be used to indicate which fields should
        be constructed in the output DataSet (for example ['output','classification_error']).
        Optionally, if copy_inputs, the input fields (of the input_dataset) can be made
        visible in the output DataSet returned by this method.
        """
        raise NotImplementedError

    def attribute_names(self):
        """
        A Learner may have attributes that it wishes to export to other objects. To automate
        such export, sub-classes should define here the names (list of strings) of these attributes.
        """
        return []

class TLearner(Learner):
    """
    TLearner is a virtual class of Learners that attempts to factor out of the definition
    of a learner the steps that are common to many implementations of learning algorithms,
    so as to leave only "the equations" to define in particular sub-classes, using Theano.

    In the default implementations of use and update, it is assumed that the 'use' and 'update' methods
    visit examples in the input dataset sequentially. In the 'use' method only one pass through the dataset is done,
    whereas the sub-learner may wish to iterate over the examples multiple times. Subclasses where this basic
    model is not appropriate can simply redefine update or use.

    Sub-classes must provide the following functions and functionalities:
      - attributeNames(): defines all the names of attributes which can be used as fields or
                          attributes in input/output datasets or in stats collectors.
                          All these attributes are expected to be theano.Result objects
                          (with a .data property and recognized by theano.Function for compilation).
                          The sub-class constructor defines the relations between
                          the Theano variables that may be used by 'use' and 'update'
                          or by a stats collector.
      - defaultOutputFields(input_fields): return a list of default dataset output fields when
                          None are provided by the caller of use.
      -
    """
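The forget/update/use protocol described in the Learner docstrings can be illustrated with a small stand-alone sketch. Everything below is an assumption made for illustration and is not part of pylearn: the `Learner` class is a reduced stand-in for the base class in the listing (so the example runs without the `dataset` module), `MeanLearner` is a hypothetical subclass, and plain lists of dicts with a `'target'` key stand in for DataSet objects.

```python
# Reduced stand-in for the Learner base class (assumption: same protocol,
# no dependency on pylearn's dataset module).
class Learner(object):
    def forget(self):
        raise NotImplementedError

    def update(self, training_set, train_stats_collector=None):
        return self.use  # non-adaptive default, as in the listing

    def __call__(self, training_set, train_stats_collector=None):
        # Train from scratch: reset, then update on the training set.
        self.forget()
        return self.update(training_set, train_stats_collector)

    def use(self, input_dataset, output_fields=None, copy_inputs=True):
        raise NotImplementedError


class MeanLearner(Learner):
    """Hypothetical learner that predicts the mean of the 'target' values seen so far."""

    def __init__(self):
        self.forget()

    def forget(self):
        # Blank slate: no examples seen.
        self.total = 0.0
        self.count = 0

    def update(self, training_set, train_stats_collector=None):
        # Accumulate evidence; may be called repeatedly (on-line setting).
        for example in training_set:
            self.total += example['target']
            self.count += 1
        return self.use

    def use(self, input_dataset, output_fields=None, copy_inputs=True):
        # output_fields is ignored in this sketch; only 'output' is produced.
        mean = self.total / self.count if self.count else 0.0
        outputs = []
        for example in input_dataset:
            row = dict(example) if copy_inputs else {}
            row['output'] = mean
            outputs.append(row)
        return outputs


learner = MeanLearner()
predict = learner([{'target': 1.0}, {'target': 3.0}])  # train from scratch
first = predict([{'x': 0}])                            # inputs copied into the output
predict = learner.update([{'target': 8.0}])            # continue training
second = predict([{'x': 0}], copy_inputs=False)        # only the 'output' field
```

Calling the learner resets its state through `forget()` before training, while a direct call to `update` keeps the accumulated state, which is the distinction the docstrings draw between training from scratch and continued training.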