pylearn: comparison sparse_random_autoassociator/main.py @ 370:a1bbcde6b456
Moved sparse_random_autoassociator from my repository
author    Joseph Turian <turian@gmail.com>
date      Mon, 07 Jul 2008 01:54:46 -0400
parents
children  22463a194c90
comparing 369:90a29489b5c8 with 370:a1bbcde6b456
#!/usr/bin/python
"""
An autoassociator for sparse inputs, using Ronan Collobert + Jason
Weston's sampling trick (2008).

The learned model is::

    h = sigmoid(dot(x, w1) + b1)
    y = sigmoid(dot(h, w2) + b2)

We assume that most of the inputs are zero, and hence that we can
separate x into xnonzero, x's nonzero components, and xzero, a sample
of the zeros. (We choose ZERO_SAMPLE_SIZE zero columns at random,
without replacement.)

The desideratum is that every nonzero entry is separated from every
zero entry by a margin of at least MARGIN:
for each ynonzero, we want it to exceed max(yzero) by at least MARGIN;
for each yzero, we want it to be exceeded by min(ynonzero) by at least MARGIN.
The loss is a (linear) hinge loss. The loss ignores the magnitude of
xnonzero (this may be a limitation), so all nonzero entries are equally
important to push above the maximum yzero.

LIMITATIONS:
- Only does pure stochastic gradient descent (batch size = 1).
- The loss ignores the magnitude of xnonzero.
- We always use all nonzero entries, even if the training instance
  is not very sparse.

@bug: If there are fewer than ZERO_SAMPLE_SIZE zeroes, rejection
sampling of zero indices will loop forever.
"""
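The two-layer model in the docstring can be sketched in plain NumPy. This is a minimal illustration, not the repo's actual code: the dimensions, the random weight initialization, and the example input below are made up for demonstration, and the real parameters live in the `model` and `globals` modules.

```python
import numpy

def sigmoid(z):
    return 1.0 / (1.0 + numpy.exp(-z))

# Illustrative (not the repo's) dimensions and weights.
rng = numpy.random.RandomState(0)
input_dim, hidden_dim = 10, 4
w1 = rng.uniform(-0.1, 0.1, (input_dim, hidden_dim))
b1 = numpy.zeros(hidden_dim)
w2 = rng.uniform(-0.1, 0.1, (hidden_dim, input_dim))
b2 = numpy.zeros(input_dim)

# A sparse input: mostly zeros, a few nonzero columns.
x = numpy.zeros(input_dim)
x[[1, 5, 9]] = [0.1, 0.5, 1.0]

h = sigmoid(numpy.dot(x, w1) + b1)   # hidden representation
y = sigmoid(numpy.dot(h, w2) + b2)   # reconstruction of x
```

Because both layers pass through a sigmoid, every entry of `y` lies strictly between 0 and 1, which is what makes the margin comparisons between ynonzero and yzero below well-defined.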


import numpy, random
import globals
random.seed(globals.SEED)

# A few sparse training instances, represented as {index: value} dicts.
nonzero_instances = []
nonzero_instances.append({1: 0.1, 5: 0.5, 9: 1})
nonzero_instances.append({2: 0.3, 5: 0.5, 8: 0.8})
nonzero_instances.append({1: 0.2, 2: 0.3, 5: 0.5})

import model
model = model.Model()

for i in xrange(100000):
    # Select an instance
    instance = nonzero_instances[i % len(nonzero_instances)]

    # Get the nonzero indices
    nonzero_indexes = sorted(instance.keys())

    # Sample the zero indices without replacement. random.sample over
    # the zero columns avoids the endless loop that rejection sampling
    # hits when there are fewer than ZERO_SAMPLE_SIZE zeroes.
    zero_candidates = [j for j in xrange(globals.INPUT_DIMENSION)
                       if j not in instance]
    zero_indexes = sorted(random.sample(zero_candidates,
                                        min(globals.ZERO_SAMPLE_SIZE,
                                            len(zero_candidates))))

    # SGD update over instance
    model.update(instance, nonzero_indexes, zero_indexes)
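`model.update` is defined elsewhere in the repository, so here is a hedged sketch of the hinge margin loss the docstring describes. The function name `margin_loss`, the MARGIN value, and the sample outputs are illustrative assumptions; the real Model class may compute the loss and its gradient differently.

```python
MARGIN = 0.25  # illustrative; the real value lives in the globals module

def margin_loss(ynonzero, yzero, margin=MARGIN):
    """Linear hinge loss over reconstructed outputs.

    Each nonzero output should exceed max(yzero) by at least `margin`,
    and each zero output should sit below min(ynonzero) by at least
    `margin`; violations contribute linearly to the loss.
    """
    zmax = max(yzero)
    nmin = min(ynonzero)
    loss = sum(max(0.0, margin - (yn - zmax)) for yn in ynonzero)
    loss += sum(max(0.0, margin - (nmin - yz)) for yz in yzero)
    return loss

well_separated = margin_loss([0.9, 0.8], [0.1, 0.2])  # all margins satisfied
violated = margin_loss([0.6], [0.5])                  # gap of 0.1 < MARGIN
```

Note that, as the docstring warns, the loss depends only on the reconstructed outputs, not on the magnitudes of the nonzero inputs: an x entry of 0.1 and one of 1.0 are pushed above max(yzero) with equal force.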