annotate pylearn/datasets/test_modes.py @ 1391:124b939d997f

* removed temporary caltech_silhouette2 dataset
* minor tweak to peaked_modes dataset (used for tempering stuff)
author gdesjardins
date Mon, 20 Dec 2010 18:08:48 -0500
parents 3efd0effb2a7
rev 948 (0b4c39c33eb9, gdesjardins): Toy dataset used in Desjardins et al. (AISTATS 2010).

```python
from pylearn.datasets import Dataset
import numpy


def neal94_AC(p=0.01, size=10000, seed=238904, w=(.25, .25, .25, .25)):
    """
    Generates the dataset used in [Desjardins et al, AISTATS 2010]. The dataset
    is composed of 4x4 binary images built from four basic modes: full black,
    full white, [black, white] (left half black) and [white, black] (left half
    white). Each sample is a copy of one basic mode in which every pixel is
    flipped independently with probability p.

    :param p: per-pixel bit-flip probability; a scalar, or a list/tuple with
              one value per mode
    :param size: total size of the dataset
    :param seed: seed used to draw random samples
    :param w: fraction of the dataset drawn from each mode (should sum to 1)
    """
    # p can be set separately for each mode
    if not isinstance(p, (list, tuple)):
        p = [p] * len(w)

    rng = numpy.random.RandomState(seed)

    # mode 1: black image
    B = numpy.zeros((1, 16))
    # mode 2: white image
    W = numpy.ones((1, 16))
    # mode 3: left half black, right half white
    BW = numpy.ones((4, 4))
    BW[:, :2] = 0
    BW = BW.reshape(1, 16)
    # mode 4: left half white, right half black
    WB = numpy.zeros((4, 4))
    WB[:, :2] = 1
    WB = WB.reshape(1, 16)

    modes = [B, W, BW, WB]
    data = numpy.zeros((0, 16))

    # noisy copies of each basic mode, flipping each pixel with probability p[i]
    for i, m in enumerate(modes):
        n = int(round(size * w[i]))  # binomial/repeat need an integer count
        bitflip = rng.binomial(1, p[i], size=(n, 16))
        d = numpy.abs(numpy.repeat(m, n, axis=0) - bitflip)  # XOR with the flips
        data = numpy.vstack((data, d))

    # no labels: one dummy target per generated row
    y = numpy.zeros((data.shape[0], 1))

    dataset = Dataset()
    dataset.train = Dataset.Obj(x=data, y=y)
    dataset.test = None
    dataset.img_shape = (4, 4)

    return dataset
```
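The bit-flip construction in `neal94_AC` can be checked in isolation. The sketch below rebuilds the four basic modes with NumPy only, without the pylearn `Dataset` wrapper. `four_basic_modes` and `sample_with_bitflips` are hypothetical helper names and are not pylearn functions. Each clean sample is a copy of a known mode, so the Hamming distance to the nearest mode recovers the mode assignment. Averaged over all pixels, the same distance also gives the realized flip rate, which should be close to `p`.

```python
import numpy


def four_basic_modes():
    """Return the four 4x4 modes used by neal94_AC, flattened to 16 pixels."""
    black = numpy.zeros((4, 4))
    white = numpy.ones((4, 4))
    bw = numpy.ones((4, 4))
    bw[:, :2] = 0  # left half black, right half white
    wb = numpy.zeros((4, 4))
    wb[:, :2] = 1  # left half white, right half black
    return numpy.stack([m.reshape(16) for m in (black, white, bw, wb)])


def sample_with_bitflips(modes, counts, p, rng):
    """Repeat modes[i] counts[i] times, flipping each pixel with probability p."""
    rows = []
    for mode, n in zip(modes, counts):
        clean = numpy.repeat(mode[None, :], n, axis=0)
        flips = rng.binomial(1, p, size=clean.shape)
        # On {0, 1} pixels, |x - f| equals x XOR f: a pixel changes iff f == 1.
        rows.append(numpy.abs(clean - flips))
    return numpy.vstack(rows)


rng = numpy.random.RandomState(238904)
modes = four_basic_modes()
x = sample_with_bitflips(modes, counts=[2500] * 4, p=0.01, rng=rng)

# Hamming distance from every sample to every basic mode.
dist = numpy.abs(x[:, None, :] - modes[None, :, :]).sum(axis=2)
nearest = dist.argmin(axis=1)
flip_rate = dist.min(axis=1).sum() / float(x.size)

print(x.shape)                  # (10000, 16)
print(numpy.bincount(nearest))  # close to 2500 per mode
print(flip_rate)                # close to p = 0.01
```

Nearest-mode assignment is only reliable while `p` is small. The black mode and each half-black mode differ in 8 pixels, so about 4 flips concentrated in one half can make a sample ambiguous.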
rev 1003 (d19e3cb809c1, gdesjardins, parent 948): Created online dataset, for testing PCD style learning algorithms.

```python
def n_modes(n_modes=4, img_shape=(4, 4), size=10000,
            p=0.001, w=None, seed=238904):
    """
    Generates a dataset of binary images of shape img_shape drawn from n_modes
    modes. The base image of each mode is sampled uniformly at random; each
    sample is a copy of a base image in which every pixel is flipped
    independently with probability p.

    :param n_modes: number of modes
    :param img_shape: shape of each image
    :param size: total size of the dataset
    :param p: per-pixel bit-flip probability; a scalar, or a list/tuple with
              one value per mode
    :param w: fraction of the dataset drawn from each mode (defaults to uniform)
    :param seed: seed used to draw random samples
    """
    img_size = int(numpy.prod(img_shape))

    # p can be set separately for each mode
    if not isinstance(p, (list, tuple)):
        p = [p] * n_modes

    rng = numpy.random.RandomState(seed)
    data = numpy.zeros((0, img_size))

    for i in range(n_modes):
        # random binary base image for mode i
        base = rng.randint(0, 2, size=(1, img_size))

        mode_size = w[i] * size if w is not None else size / float(n_modes)
        mode_size = int(round(mode_size))  # binomial/repeat need an integer count

        # noisy copies of the base image, each pixel flipped with probability p[i]
        bitflip = rng.binomial(1, p[i], size=(mode_size, img_size))
        d = numpy.abs(numpy.repeat(base, mode_size, axis=0) - bitflip)
        data = numpy.vstack((data, d))

    # no labels: one dummy target per generated row
    y = numpy.zeros((data.shape[0], 1))

    dataset = Dataset()
    dataset.train = Dataset.Obj(x=data, y=y)
    dataset.test = None
    dataset.img_shape = img_shape

    return dataset
```
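In `n_modes`, mode i receives `w[i] * size` samples, or `size / n_modes` when `w` is None. When those products are not integers, each count has to be rounded, and the total can drift away from `size`. Three uniform modes with `size=10000` is an example. Largest-remainder rounding is one possible way to keep the total exact, sketched below. `mode_counts` is a hypothetical helper and is not part of pylearn.

```python
import numpy


def mode_counts(size, w):
    """Split `size` samples across modes in proportion to the weights `w`.

    Largest-remainder rounding: floor every exact share, then hand the
    leftover samples to the modes with the largest fractional parts, so
    the counts always sum to `size`.
    """
    w = numpy.asarray(w, dtype=float)
    exact = size * w / w.sum()
    counts = numpy.floor(exact).astype(int)
    leftover = int(size - counts.sum())
    order = numpy.argsort(exact - counts)[::-1]  # largest fractional part first
    counts[order[:leftover]] += 1
    return counts


print(mode_counts(10000, [1, 1, 1]))     # three modes, size not divisible by 3
print(mode_counts(10, [0.5, 0.3, 0.2]))  # [5 3 2]
```

Ties between equal fractional parts are broken by `argsort` order, so which mode receives the extra sample is arbitrary. The total always matches `size`.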
class OnlineModes was introduced in rev 1003 (d19e3cb809c1). Rev 1330 (3efd0effb2a7, gdesjardins, parent 1003: small changes to mode dataset (used for tempering work)) added the `min_w`, `max_w`, `w` and `p` arguments and the `p is None` / `w is None` branches. Rev 1391 added `modes_i.append(mi)` and `self.data_modes = modes_i`.

```python
class OnlineModes(object):
    """
    Infinite stream of binary images drawn from n_modes random modes, for
    testing PCD-style learning algorithms. Mode i is chosen with probability
    w[i], and each pixel of its base image is flipped with probability p[i].
    """

    def __init__(self, n_modes, img_shape, seed=238904,
                 min_p=1e-4, max_p=1e-1,
                 min_w=0., max_w=1.,
                 w=None, p=None):

        self.n_modes = n_modes
        self.img_shape = img_shape
        self.rng = numpy.random.RandomState(seed)
        self.img_size = int(numpy.prod(img_shape))

        # generate random p, w values unless they are given
        if p is None:
            p = min_p + self.rng.rand(n_modes) * (max_p - min_p)
        # accept a scalar or one value per mode
        self.p = numpy.ones(n_modes) * p

        if w is None:
            w = min_w + self.rng.rand(n_modes) * (max_w - min_w)
        w = numpy.asarray(w, dtype=float)
        self.w = w / numpy.sum(w)  # normalize to mode probabilities

        # mode indices sorted by increasing weight
        self.sort_w_idx = numpy.argsort(self.w)

        # random binary base image for each mode
        self.modes = self.rng.randint(0, 2, size=(n_modes, self.img_size))

    def __iter__(self):
        return self

    def next(self, batch_size=1):
        # one-hot choice of mode for each row of the batch
        modes = self.rng.multinomial(1, self.w, size=batch_size)
        data = numpy.zeros((batch_size, self.img_size))

        modes_i = []

        for bi, mode in enumerate(modes):
            mi, = numpy.where(mode != 0)  # (1,)-array holding the chosen mode index
            modes_i.append(mi)
            bitflip = self.rng.binomial(1, self.p[mi], size=(1, self.img_size))
            data[bi] = numpy.abs(self.modes[mi] - bitflip)

        # keep the last batch and its mode indices for inspection
        self.data = data
        self.data_modes = modes_i

        return data

    __next__ = next  # Python 3 iterator protocol
```
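`OnlineModes.next` draws the mode of each row with `rng.multinomial` and then flips pixels one row at a time. The sketch below is a vectorized illustration of the same sampling step. `sample_batch` is a hypothetical function, and it consumes random numbers in a different order, so it will not reproduce `OnlineModes` batches for the same seed. It also shows one check a test could plausibly make: in a large batch, the empirical mode frequencies should approach `w`.

```python
import numpy


def sample_batch(rng, modes, w, p, batch_size):
    """Vectorized sketch of one OnlineModes.next(batch_size) call.

    modes: (n_modes, img_size) binary base images
    w:     (n_modes,) mode probabilities summing to 1
    p:     (n_modes,) per-mode bit-flip probabilities
    Returns the batch and the mode index of every row.
    """
    onehot = rng.multinomial(1, w, size=batch_size)  # one-hot mode choice per row
    mode_idx = onehot.argmax(axis=1)
    # Row r uses its own mode's flip probability, broadcast across pixels.
    flips = rng.binomial(1, p[mode_idx][:, None],
                         size=(batch_size, modes.shape[1]))
    return numpy.abs(modes[mode_idx] - flips), mode_idx


# Draw p, w and the base images in the same order as OnlineModes.__init__
# does with its default min/max arguments.
rng = numpy.random.RandomState(238904)
n_modes, img_size = 5, 16
p = 1e-4 + rng.rand(n_modes) * (1e-1 - 1e-4)
w = rng.rand(n_modes)
w = w / w.sum()
modes = rng.randint(0, 2, size=(n_modes, img_size))

x, idx = sample_batch(rng, modes, w, p, batch_size=20000)
freq = numpy.bincount(idx, minlength=n_modes) / float(len(idx))
print(numpy.round(w, 3))     # target mode probabilities
print(numpy.round(freq, 3))  # empirical mode frequencies in the batch
```

Vectorizing removes the Python loop over rows. The cost is that the per-row `data_modes` bookkeeping becomes a single index array instead of a list of one-element arrays.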