view sandbox/sparse_random_autoassociator/main.py @ 448:0961d4b56ec5

Added some documentation
author Joseph Turian <turian@gmail.com>
date Wed, 03 Sep 2008 17:08:54 -0400
parents 36baeb7125a4

#!/usr/bin/python
"""
    An autoassociator for sparse inputs, using Ronan Collobert + Jason
    Weston's sampling trick (2008).

    The learned model is::
       h   = sigmoid(dot(x, w1) + b1)
       y   = sigmoid(dot(h, w2) + b2)

    We assume that most of the inputs are zero, and hence that
    we can separate x into xnonzero, x's nonzero components, and
    xzero, a sample of the zeros. We sample ZERO_SAMPLE_SIZE zero
    columns from x, randomly and without replacement.

    The desideratum is that every nonzero entry is separated from every
    zero entry by a margin of at least MARGIN.
    For each ynonzero, we want it to exceed max(yzero) by at least MARGIN.
    For each yzero, we want it to be exceeded by min(ynonzero) by at least
    MARGIN.
    The loss is a linear hinge loss. It ignores the magnitude of xnonzero
    (this may be a limitation), so every nonzero entry is equally
    important to push above the maximum yzero.

    (Alternatively, there is a commented-out binary cross-entropy loss.)

    LIMITATIONS:
       - Only does pure stochastic gradient descent (batch size = 1).
       - The loss ignores the magnitude of xnonzero.
       - All nonzero entries are always used, even if the training
         instance is not very sparse.
"""
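
# Illustrative sketch (not part of the original model module): the forward
# pass from the docstring, h = sigmoid(dot(x, w1) + b1) and
# y = sigmoid(dot(h, w2) + b2), for a sparse input given as an
# {index: value} dict. The names sketch_sigmoid and sketch_forward, the
# list-of-rows weight layout, and the choice to compute y only at selected
# indices are assumptions made for this example.
import math


def sketch_sigmoid(a):
    """Logistic function 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))


def sketch_forward(x_sparse, w1, b1, w2, b2, output_indices):
    """
    Compute h from the nonzero inputs only, then y at output_indices only.
    w1[i] is the row of input-to-hidden weights for input i;
    w2[j] is the row of hidden-to-output weights for hidden unit j.
    """
    # Zero inputs contribute nothing to dot(x, w1), so they are skipped.
    pre_h = list(b1)
    for i, value in x_sparse.items():
        for j in range(len(pre_h)):
            pre_h[j] += value * w1[i][j]
    h = [sketch_sigmoid(a) for a in pre_h]

    # Reconstruct only the requested outputs (e.g. nonzeros plus sampled zeros).
    y = {}
    for k in output_indices:
        a = b2[k]
        for j in range(len(h)):
            a += h[j] * w2[j][k]
        y[k] = sketch_sigmoid(a)
    return h, y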


import numpy

nonzero_instances = []
nonzero_instances.append({1: 0.1, 5: 0.5, 9: 1})
nonzero_instances.append({2: 0.3, 5: 0.5, 8: 0.8})
nonzero_instances.append({1: 0.2, 2: 0.3, 5: 0.5})
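
# Illustrative sketch (an assumption, not the sampling code used by the model
# module): draw the "xzero" sample described in the docstring by picking
# zero-valued indices uniformly at random, without replacement. The input
# dimension and sample size are passed in explicitly because this file does
# not define them; sketch_sample_zeros is a hypothetical helper name.
# Enumerating every zero index costs O(input_dimension), which is fine for a
# sketch but may be too slow for very wide inputs.
import random


def sketch_sample_zeros(instance, input_dimension, zero_sample_size, rng=random):
    """Return a sorted list of sampled indices that are absent from instance."""
    zero_indices = [i for i in range(input_dimension) if i not in instance]
    # Never ask for more zeros than exist.
    count = min(zero_sample_size, len(zero_indices))
    return sorted(rng.sample(zero_indices, count))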

# Import the class directly so the instance name does not shadow the module.
from model import Model
model = Model()

for i in range(100000):
    # Select an instance
    instance = nonzero_instances[i % len(nonzero_instances)]

    # SGD update over instance
    model.update(instance)
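

# Illustrative sketch (an assumption about the loss described in the module
# docstring, not the code inside model.update): a linear hinge penalty for
# every nonzero output that fails to exceed max(yzero) by the margin, and for
# every zero output that is not exceeded by min(ynonzero) by the margin.
# Summing the penalties, and the name sketch_margin_loss, are choices made
# for this example only.
def sketch_margin_loss(ynonzero, yzero, margin):
    """Return the summed hinge loss for two lists of reconstructed outputs."""
    if not ynonzero or not yzero:
        # With one side empty there is nothing to separate.
        return 0.0
    max_zero = max(yzero)
    min_nonzero = min(ynonzero)
    loss = 0.0
    for y in ynonzero:
        loss += max(0.0, margin - (y - max_zero))
    for y in yzero:
        loss += max(0.0, margin - (min_nonzero - y))
    return loss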