view doc/v2_planning/learner.txt @ 1023:fb6cae14fd07
dataset: Comment about viewing a dataset as a distribution
author | Olivier Delalleau <delallea@iro> |
---|---|
date | Fri, 03 Sep 2010 16:30:50 -0400 |
parents | f82093bf4405 |
children | 38f799f8b6cd |
Discussion of Function Specification for Learner Types
======================================================

In its most abstract form, a learner is an object with the
following semantics:

* A learner has named hyper-parameters that control how it learns (these can
  be viewed as options of the constructor, or might be set directly by a user).

* A learner also has an internal state that depends on what it has learned.

* A learner reads and produces data, so the definition of a learner is
  intimately linked to the definition of a dataset (and task).

* A learner has one or more 'train' or 'adapt' functions by which
  it is given a sample of data (typically either the whole training set, or
  a mini-batch, which contains as a special case a single 'example'). Learners
  interface with datasets in order to obtain data. These functions cause the
  learner to change its internal state and take advantage, to some extent,
  of the data provided. The 'train' function should take charge of
  completely exploiting the dataset, as specified by the hyper-parameters,
  so that it would typically be called only once. An 'adapt' function
  is meant for learners that can operate in an 'online' setting where
  data continually arrive and the control loop (when to stop) is to
  be managed outside of it. For most intents and purposes, the
  'train' function could also handle the 'online' case by providing
  controlled iteration over the dataset (which would then be
  seen as a stream of examples).

    * learner.train(dataset)
    * learner.adapt(data)

* Different types of learners can then exploit their internal state
  in order to perform various computations after training is completed,
  or in the middle of training, e.g.,

    * y=learner.predict(x)
      for learners that see (x,y) pairs during training and predict y given x,
      or for learners that see only x's and learn a transformation of them
      (i.e. feature extraction).
      Here and below, x and y are tensor-like objects whose first index iterates
      over particular examples in a batch or minibatch of examples.

    * p=learner.probability(examples)
      p=learner.log_probability(examples)
      for learners that can estimate probability density or probability functions.
      Note that an example could be a pair (x,y) for learners that expect each
      example to represent such a pair. The second form is provided in case the
      example is high-dimensional and computations in the log-domain are
      numerically preferable. The first dimension of examples, or of x and y,
      is an index over a minibatch or a dataset.

    * p=learner.free_energy(x)
      for learners that can estimate a log unnormalized probability; the output
      has the same length as the input.

    * c=learner.costs(examples)
      returns a matrix of costs (one row per example, i.e., again the output has
      the same length as the input), the first column of which represents the
      cost whose expectation we wish to minimize over new samples from the
      unknown underlying data distribution.

Some learners may be able to handle x's and y's that contain missing values.

* For convenience, some of these operations could be bundled, e.g.

    * [prediction,costs] = learner.predict_and_adapt((x,y))

* Some learners could include in their internal state not only what they
  have learned but also some information about recently seen examples that
  conditions the expected distribution of upcoming examples. In that case,
  they might be used, e.g. in an online setting, as follows:

      for (x,y) in data_stream:
          [prediction,costs]=learner.predict((x,y))
          accumulate_statistics(prediction,costs)

* In some cases, each example is itself a (possibly variable-size) sequence
  or other variable-size object (e.g. an image, or a video)
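
To make the semantics listed above concrete, here is a minimal sketch in plain
Python. It is only an illustration under stated assumptions, not the pylearn
API: the class name OnlineLinearLearner, its hyper-parameters (learning_rate,
n_epochs) and the choice of least-squares linear regression fitted by gradient
descent are all hypothetical. Batches are plain lists whose first index runs
over examples, and costs() returns one row per example with the squared error
in the first column.

```python
class OnlineLinearLearner:
    """Hypothetical learner: least-squares linear regression fit by gradient descent."""

    def __init__(self, n_inputs, learning_rate=0.1, n_epochs=500):
        # Named hyper-parameters, set through the constructor.
        self.learning_rate = learning_rate
        self.n_epochs = n_epochs
        # Internal state: depends only on what has been learned so far.
        self.weights = [0.0] * n_inputs
        self.bias = 0.0

    def predict(self, x):
        """Return one prediction per row of x (first index = example)."""
        return [sum(w * v for w, v in zip(self.weights, row)) + self.bias
                for row in x]

    def costs(self, examples):
        """Return one row of costs per example; column 0 is the squared error."""
        x, y = examples
        return [[(p - t) ** 2] for p, t in zip(self.predict(x), y)]

    def adapt(self, data):
        """Take one gradient step on a minibatch; the caller owns the control loop."""
        x, y = data
        preds = self.predict(x)  # computed once, before any parameter changes
        n = len(y)
        for j in range(len(self.weights)):
            grad_w = sum(2 * (p - t) * row[j]
                         for p, t, row in zip(preds, y, x)) / n
            self.weights[j] -= self.learning_rate * grad_w
        grad_b = sum(2 * (p - t) for p, t in zip(preds, y)) / n
        self.bias -= self.learning_rate * grad_b

    def train(self, dataset):
        """Fully exploit the dataset, as dictated by the hyper-parameters."""
        for _ in range(self.n_epochs):
            self.adapt(dataset)

    def predict_and_adapt(self, data):
        """Bundled operation: score the minibatch first, then learn from it."""
        x, _ = data
        prediction, costs = self.predict(x), self.costs(data)
        self.adapt(data)
        return prediction, costs


# Toy data following y = 2*x + 1 (made up for this illustration).
x = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]

# Batch setting: train() is called once on the whole dataset.
batch_learner = OnlineLinearLearner(n_inputs=1)
batch_learner.train((x, y))

# Online setting: the loop lives outside the learner, one example at a time.
online_learner = OnlineLinearLearner(n_inputs=1)
history = []
for xi, yi in zip(x, y):
    prediction, costs = online_learner.predict_and_adapt(([xi], [yi]))
    history.append((prediction, costs))
```

In this sketch train() simply loops over adapt(), which mirrors the remark that
'train' can cover the online case by controlling the iterations itself, while
predict_and_adapt() scores each incoming example before learning from it.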