annotate sparse_random_autoassociator/main.py @ 371:22463a194c90

Update doc
author Joseph Turian <turian@gmail.com>
date Mon, 07 Jul 2008 01:57:49 -0400
parents a1bbcde6b456
children 75bab24bb2d8
rev   line source
370
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
1 #!/usr/bin/python
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
2 """
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
3 An autoassociator for sparse inputs, using Ronan Collobert + Jason
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
4 Weston's sampling trick (2008).
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
5
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
6 The learned model is::
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
7 h = sigmoid(dot(x, w1) + b1)
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
8 y = sigmoid(dot(h, w2) + b2)
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
9
371
22463a194c90 Update doc
Joseph Turian <turian@gmail.com>
parents: 370
diff changeset
10 We assume that most of the inputs are zero, and hence that
22463a194c90 Update doc
Joseph Turian <turian@gmail.com>
parents: 370
diff changeset
11 we can separate x into xnonzero, x's nonzero components, and
22463a194c90 Update doc
Joseph Turian <turian@gmail.com>
parents: 370
diff changeset
12 xzero, a sample of the zeros. We sample---randomly without
22463a194c90 Update doc
Joseph Turian <turian@gmail.com>
parents: 370
diff changeset
13 replacement---ZERO_SAMPLE_SIZE zero columns from x.
370
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
14
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
15 The desideratum is that every nonzero entry is separated from every
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
16 zero entry by margin at least MARGIN.
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
17 For each ynonzero, we want it to exceed max(yzero) by at least MARGIN.
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
18 For each yzero, we want it to be exceed by min(ynonzero) by at least MARGIN.
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
19 The loss is a hinge loss (linear). The loss is irrespective of the
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
20 xnonzero magnitude (this may be a limitation). Hence, all nonzeroes
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
21 are equally important to exceed the maximum yzero.
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
22
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
23 LIMITATIONS:
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
24 - Only does pure stochastic gradient (batchsize = 1).
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
25 - Loss is irrespective of the xnonzero magnitude.
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
26 - We will always use all nonzero entries, even if the training
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
27 instance is very non-sparse.
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
28
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
29 @bug: If there are not ZERO_SAMPLE_SIZE zeroes, we will enter an
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
30 endless loop.
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
31 """
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
32
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
33
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
34 import numpy, random
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
35 import globals
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
36 random.seed(globals.SEED)
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
37
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
38 nonzero_instances = []
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
39 nonzero_instances.append({1: 0.1, 5: 0.5, 9: 1})
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
40 nonzero_instances.append({2: 0.3, 5: 0.5, 8: 0.8})
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
41 nonzero_instances.append({1: 0.2, 2: 0.3, 5: 0.5})
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
42
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
43 import model
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
44 model = model.Model()
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
45
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
46 for i in xrange(100000):
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
47 # Select an instance
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
48 instance = nonzero_instances[i % len(nonzero_instances)]
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
49
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
50 # Get the nonzero indices
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
51 nonzero_indexes = instance.keys()
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
52 nonzero_indexes.sort()
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
53
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
54 # Get the zero indices
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
55 # @bug: If there are not ZERO_SAMPLE_SIZE zeroes, we will enter an endless loop.
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
56 zero_indexes = []
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
57 while len(zero_indexes) < globals.ZERO_SAMPLE_SIZE:
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
58 idx = random.randint(0, globals.INPUT_DIMENSION - 1)
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
59 if idx in nonzero_indexes or idx in zero_indexes: continue
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
60 zero_indexes.append(idx)
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
61 zero_indexes.sort()
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
62
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
63 # SGD update over instance
a1bbcde6b456 Moved sparse_random_autoassociator from my repository
Joseph Turian <turian@gmail.com>
parents:
diff changeset
64 model.update(instance, nonzero_indexes, zero_indexes)