comparison sparse_random_autoassociator/main.py @ 370:a1bbcde6b456

Moved sparse_random_autoassociator from my repository
author Joseph Turian <turian@gmail.com>
date Mon, 07 Jul 2008 01:54:46 -0400
parents
children 22463a194c90
369:90a29489b5c8 370:a1bbcde6b456
#!/usr/bin/python
"""
An autoassociator for sparse inputs, using Ronan Collobert + Jason
Weston's sampling trick (2008).

The learned model is::
    h = sigmoid(dot(x, w1) + b1)
    y = sigmoid(dot(h, w2) + b2)

We assume that most of the inputs are zero, and hence that we can
separate x into xnonzero, x's nonzero components, and xzero,
a sample of the zeros. (We choose ZERO_SAMPLE_SIZE zero columns
at random, without replacement.)

The desideratum is that every nonzero entry is separated from every
zero entry by a margin of at least MARGIN.
For each ynonzero, we want it to exceed max(yzero) by at least MARGIN.
For each yzero, we want it to be exceeded by min(ynonzero) by at least MARGIN.
The loss is a hinge loss (linear). The loss is irrespective of the
xnonzero magnitude (this may be a limitation). Hence, all nonzeroes
are equally important to exceed the maximum yzero.

LIMITATIONS:
    - Only does pure stochastic gradient (batchsize = 1).
    - Loss is irrespective of the xnonzero magnitude.
    - We will always use all nonzero entries, even if the training
      instance is very non-sparse.

@bug: If there are fewer than ZERO_SAMPLE_SIZE zeroes, we will enter an
endless loop.
"""
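The margin criterion in the docstring can be read as the following minimal sketch. The actual loss lives in model.update, which is not part of this file, so the function name `margin_hinge_loss`, the default `margin` value, and the choice to sum the per-entry penalties are all assumptions made for illustration.

```python
def margin_hinge_loss(ynonzero, yzero, margin=0.25):
    """Linear hinge penalty for outputs that violate the margin.

    ynonzero: model outputs at the nonzero input positions.
    yzero:    model outputs at the sampled zero input positions.
    margin:   hypothetical stand-in for MARGIN.
    """
    max_zero = max(yzero)
    min_nonzero = min(ynonzero)
    # Each nonzero output should exceed the largest zero output by `margin`.
    nonzero_loss = sum(max(0.0, margin - (y - max_zero)) for y in ynonzero)
    # Each zero output should fall below the smallest nonzero output by `margin`.
    zero_loss = sum(max(0.0, margin - (min_nonzero - y)) for y in yzero)
    # Summing the two groups is an assumption; the docstring does not say
    # how the per-entry penalties are combined.
    return nonzero_loss + zero_loss
```

Note that only the outputs enter the penalty, never the input magnitudes, which matches the limitation the docstring lists.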


import numpy, random
import globals
random.seed(globals.SEED)

nonzero_instances = []
nonzero_instances.append({1: 0.1, 5: 0.5, 9: 1})
nonzero_instances.append({2: 0.3, 5: 0.5, 8: 0.8})
nonzero_instances.append({1: 0.2, 2: 0.3, 5: 0.5})

import model
model = model.Model()

for i in xrange(100000):
    # Select an instance
    instance = nonzero_instances[i % len(nonzero_instances)]

    # Get the nonzero indices
    nonzero_indexes = instance.keys()
    nonzero_indexes.sort()

    # Get the zero indices
    # @bug: If there are fewer than ZERO_SAMPLE_SIZE zeroes, we will enter an endless loop.
    zero_indexes = []
    while len(zero_indexes) < globals.ZERO_SAMPLE_SIZE:
        idx = random.randint(0, globals.INPUT_DIMENSION - 1)
        if idx in nonzero_indexes or idx in zero_indexes: continue
        zero_indexes.append(idx)
    zero_indexes.sort()

    # SGD update over instance
    model.update(instance, nonzero_indexes, zero_indexes)
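
One possible way to avoid the endless loop noted in the @bug comment is to cap the number of requested zeroes at the number that actually exist. This is a sketch, not part of the committed code: the helper name `sample_zero_indexes` and its parameters are hypothetical, and it assumes every nonzero index lies in the range 0 to input_dimension - 1.

```python
import random


def sample_zero_indexes(nonzero_indexes, input_dimension, zero_sample_size,
                        rng=random):
    """Rejection-sample distinct zero indices, never asking for more than exist.

    rng is anything with a randint(a, b) method, e.g. the random module
    or a random.Random instance.
    """
    nonzero = set(nonzero_indexes)
    # Number of positions that are really zero (assumes indices are in range).
    available = input_dimension - len(nonzero)
    # Clamp the target so the loop below can always finish.
    target = min(zero_sample_size, available)
    chosen = set()
    while len(chosen) < target:
        idx = rng.randint(0, input_dimension - 1)
        if idx in nonzero or idx in chosen:
            continue
        chosen.add(idx)
    return sorted(chosen)
```

Keeping the rejection loop, rather than listing every zero column and sampling from that list, keeps the cost independent of the input dimension when the input is sparse. Set lookups also replace the linear list scans used in the loop above.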