view doc/v2_planning/learner.txt @ 1023:fb6cae14fd07

dataset: Comment about viewing a dataset as a distribution
author Olivier Delalleau <delallea@iro>
date Fri, 03 Sep 2010 16:30:50 -0400
parents f82093bf4405
children 38f799f8b6cd
line wrap: on
line source


Discussion of Function Specification for Learner Types
======================================================

In its most abstract form, a learner is an object with the
following semantics:

* A learner has named hyper-parameters that control how it learns (these can be viewed
as options of the constructor, or might be set directly by a user)

* A learner also has an internal state that depends on what it has learned.

* A learner reads and produces data, so the definition of learner is
intimately linked to the definition of dataset (and task).

* A learner has one or more 'train' or 'adapt' functions by which
it is given a sample of data (typically either the whole training set, or
a mini-batch, which contains as a special case a single 'example'). Learners
interface with datasets in order to obtain data. These functions cause the
learner to change its internal state and take advantage to some extent
of the data provided. The 'train' function should take charge of
completely exploiting the dataset, as specified per the hyper-parameters,
so that it would typically be called only once. An 'adapt' function
is meant for learners that can operate in an 'online' setting where
data continually arrive and the control loop (when to stop) is to
be managed outside of it. For most intents and purposes, the
'train' function could also handle the 'online' case by providing
the controlled iterations over the dataset (which would then be
seen as a stream of examples).
    * learner.train(dataset)
    * learner.adapt(data)

* Different types of learners can then exploit their internal state
in order to perform various computations after training is completed,
or in the middle of training, e.g.,

   * y=learner.predict(x)
     for learners that see (x,y) pairs during training and predict y given x,
     or for learners that see only x's and learn a transformation of it (i.e. feature extraction).
     Here and below, x and y are tensor-like objects whose first index iterates
     over particular examples in a batch or minibatch of examples.

   * p=learner.probability(examples)
     p=learner.log_probability(examples)
     for learners that can estimate probability density or probability functions,
     note that example could be a pair (x,y) for learners that expect each example
     to represent such a pair. The second form is provided in case the example
     is high-dimensional and computations in the log-domain are numerically preferable.
     The first dimension of examples or of x and y is an index over a minibatch or a dataset.

   * p=learner.free_energy(x)
     for learners that can estimate a log unnormalized probability; the output has the same length as the input.

   * c=learner.costs(examples)
     returns a matrix of costs (one row per example, i.e., again the output has the same length
     as the input), the first column of which represents the cost whose expectation
     we wish to minimize over new samples from the unknown underlying data distribution.
     

Some learners may be able to handle x's and y's that contain missing values.

* For convenience, some of these operations could be bundled, e.g.

    * [prediction,costs] = learner.predict_and_adapt((x,y))

* Some learners could include in their internal state not only what they
have learned but some information about recently seen examples that conditions
the expected distribution of upcoming examples. In that case, they might
be used, e.g. in an online setting as follows:
     for (x,y) in data_stream:
        [prediction,costs]=learner.predict((x,y))
        accumulate_statistics(prediction,costs)

* In some cases, each example is itself a (possibly variable-size) sequence
or other variable-size object (e.g. an image, or a video)