pylearn: pylearn/datasets/test

annotate pylearn/datasets/test_modes.py @ 1391:124b939d997f

* removed temporary caltech_silhouette2 dataset
* minor tweak to peaked_modes dataset (used for tempering stuff)

author	gdesjardins
date	Mon, 20 Dec 2010 18:08:48 -0500
parents	3efd0effb2a7
children

rev	line source
# rev 948 (0b4c39c33eb9), gdesjardins: Toy dataset used in Desjardins et al. (AISTATS 2010).
from pylearn.datasets import Dataset
import numpy


def neal94_AC(p=0.01, size=10000, seed=238904, w=(.25, .25, .25, .25)):
    """
    Generates the dataset used in [Desjardins et al, AISTATS 2010]. The dataset
    is composed of 4x4 binary images with four basic modes: full black, full
    white, and [black,white] and [white,black] images. Samples are created by
    copying one of the 4 basic modes and flipping each pixel independently
    with probability p.

    :param p: probability of flipping each pixel; a scalar, or a list with
              one value per mode
    :param size: total size of the dataset
    :param seed: seed used to draw random samples
    :param w: weight of each mode within the dataset
    """

    # can modify the p-value separately for each mode
    if not isinstance(p, (list, tuple)):
        p = [p for i in w]

    rng = numpy.random.RandomState(seed)

    # mode 1: black image
    B = numpy.zeros((1, 16))
    # mode 2: white image
    W = numpy.ones((1, 16))
    # mode 3: left half black, right half white
    BW = numpy.ones((4, 4))
    BW[:, :2] = 0
    BW = BW.reshape(1, 16)
    # mode 4: left half white, right half black
    WB = numpy.zeros((4, 4))
    WB[:, :2] = 1
    WB = WB.reshape(1, 16)

    modes = [B, W, BW, WB]
    data = numpy.zeros((0, 16))

    # create noisy copies of each basic mode with bit-flip probability p
    for i, m in enumerate(modes):
        # number of samples for this mode; numpy needs an integer here
        n = int(size * w[i])
        bitflip = rng.binomial(1, p[i], size=(n, 16))
        # |m - bitflip| acts as XOR on binary pixels
        d = numpy.abs(numpy.repeat(m, n, axis=0) - bitflip)
        data = numpy.vstack((data, d))

    # placeholder labels (all zeros), one per generated sample
    y = numpy.zeros((data.shape[0], 1))

    dataset = Dataset()
    dataset.train = Dataset.Obj(x=data, y=y)
    dataset.test = None
    dataset.img_shape = (4, 4)

    return dataset
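
The bit-flip step in neal94_AC relies on |m - b| behaving like XOR when both m and b are binary. Below is a minimal standard-library sketch of that idea. `flip_pixels` is a hypothetical helper written for illustration and is not part of pylearn.

```python
import random


def flip_pixels(mode, p, rng):
    """Return a noisy copy of a binary image (hypothetical helper).

    Each pixel is flipped independently with probability p, mirroring
    numpy.abs(mode - bitflip) in neal94_AC.
    """
    noisy = []
    for pixel in mode:
        flip = 1 if rng.random() < p else 0
        # for values in {0, 1}, abs(pixel - flip) equals pixel XOR flip
        noisy.append(abs(pixel - flip))
    return noisy


rng = random.Random(238904)
left_black = [0, 0, 1, 1] * 4   # the BW mode, flattened row by row
sample = flip_pixels(left_black, 0.01, rng)
```

With p = 0 the mode is returned unchanged, and with p = 1 every pixel is inverted, which is a quick sanity check on the XOR reading.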
# rev 1003 (d19e3cb809c1), gdesjardins, parent 948: Created online dataset, for testing PCD style learning algorithms.
def n_modes(n_modes=4, img_shape=(4, 4), size=10000,
            p=0.001, w=None, seed=238904):
    """
    Variant of neal94_AC in which the basic modes are drawn at random: each
    of the n_modes modes is a random binary image of shape img_shape. Samples
    are created by copying a mode and flipping each pixel independently with
    probability p.

    :param n_modes: number of modes
    :param img_shape: shape of each image
    :param size: total size of the dataset
    :param p: probability of flipping each pixel; a scalar, or a list with
              one value per mode
    :param w: weight of each mode within the dataset (uniform if None)
    :param seed: seed used to draw random samples
    """
    img_size = int(numpy.prod(img_shape))

    # can modify the p-value separately for each mode
    if not isinstance(p, (list, tuple)):
        p = [p for i in range(n_modes)]

    rng = numpy.random.RandomState(seed)
    data = numpy.zeros((0, img_size))

    for i in range(n_modes):
        # random binary base image for this mode
        base = rng.randint(0, 2, size=(1, img_size))

        # number of samples for this mode; numpy needs an integer here
        mode_size = int(w[i] * size) if w is not None else size // n_modes

        # create noisy copies of the base mode with bit-flip probability p
        bitflip = rng.binomial(1, p[i], size=(mode_size, img_size))
        d = numpy.abs(numpy.repeat(base, mode_size, axis=0) - bitflip)
        data = numpy.vstack((data, d))

    # placeholder labels (all zeros), one per generated sample
    y = numpy.zeros((data.shape[0], 1))

    dataset = Dataset()
    dataset.train = Dataset.Obj(x=data, y=y)
    dataset.test = None
    # was hard-coded to (4,4), which ignored the img_shape argument
    dataset.img_shape = img_shape

    return dataset
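
Both generators derive per-mode sample counts from size * w[i], which need not be an integer, so the counts may not add up to size. One possible way to get integer counts that sum exactly to size is largest-remainder rounding. The sketch below assumes that approach. `mode_counts` is a hypothetical helper and not something the original code does.

```python
def mode_counts(size, weights):
    """Split `size` samples across modes in proportion to `weights`.

    Hypothetical helper: floors each share, then hands the leftover
    samples to the modes with the largest fractional remainders.
    """
    total = float(sum(weights))
    raw = [size * w / total for w in weights]
    counts = [int(r) for r in raw]
    leftover = size - sum(counts)
    # modes ordered by how much was lost to flooring, largest first
    order = sorted(range(len(raw)),
                   key=lambda i: raw[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


even = mode_counts(10000, [.25, .25, .25, .25])
uneven = mode_counts(10, [1, 1, 1])
```

With counts built this way, the label array can keep its (size, 1) shape without risking a mismatch against the number of data rows.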


# rev 1003 (d19e3cb809c1), gdesjardins, parent 948: Created online dataset, for testing PCD style learning algorithms.
# rev 1330 (3efd0effb2a7), "small changes to mode dataset (used for tempering work)",
#   last touched the min_w, max_w, w, p arguments and the default draws of p and w.
# rev 1391 (124b939d997f) last touched the modes_i.append and data_modes lines.
class OnlineModes:

    def __init__(self, n_modes, img_shape, seed=238904,
                 min_p=1e-4, max_p=1e-1,
                 min_w=0., max_w=1.,
                 w=None, p=None):

        self.n_modes = n_modes
        self.img_shape = img_shape
        self.rng = numpy.random.RandomState(seed)
        self.img_size = int(numpy.prod(img_shape))

        # generate random p, w values unless they are given explicitly
        if p is None:
            p = min_p + self.rng.rand(n_modes) * (max_p - min_p)
        # store as an array so that a list argument can be indexed below
        self.p = numpy.asarray(p, dtype=float)

        if w is None:
            w = min_w + self.rng.rand(n_modes) * (max_w - min_w)
        # normalize the weights so that they sum to 1
        w = numpy.asarray(w, dtype=float)
        self.w = w / numpy.sum(w)

        self.sort_w_idx = numpy.argsort(self.w)

        # one random binary base image per mode
        self.modes = self.rng.randint(0, 2, size=(n_modes, self.img_size))

    def __iter__(self):
        return self

    def next(self, batch_size=1):

        # one one-hot mode indicator per sample in the batch
        modes = self.rng.multinomial(1, self.w, size=batch_size)
        data = numpy.zeros((batch_size, self.img_size))

        modes_i = []

        for bi, mode in enumerate(modes):
            # index of the selected mode, as a length-1 array
            mi, = numpy.where(mode != 0)
            modes_i.append(mi)
            bitflip = self.rng.binomial(1, self.p[mi], size=(1, self.img_size))
            data[bi] = numpy.abs(self.modes[mi] - bitflip)

        self.data = data
        self.data_modes = modes_i

        return data

    # alias so the class also satisfies the Python 3 iterator protocol
    __next__ = next
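
OnlineModes.next draws a one-hot vector per sample from a multinomial and uses its non-zero position as the mode index, which amounts to picking an index with probability w. Below is a standard-library sketch of the same sampling scheme, written as a generator. `online_samples` and its arguments are hypothetical names chosen for illustration.

```python
import random


def online_samples(modes, w, p, seed=238904):
    """Yield (mode_index, noisy_image) pairs forever (hypothetical sketch).

    modes: list of binary images (lists of 0/1)
    w:     one non-negative weight per mode (need not be normalized)
    p:     one bit-flip probability per mode
    """
    rng = random.Random(seed)
    indices = range(len(modes))
    while True:
        # pick a mode index with probability proportional to its weight
        mi = rng.choices(indices, weights=w, k=1)[0]
        # flip each pixel of the chosen mode independently with prob p[mi]
        noisy = [abs(px - (1 if rng.random() < p[mi] else 0))
                 for px in modes[mi]]
        yield mi, noisy


modes = [[0] * 16, [1] * 16]
stream = online_samples(modes, w=[0.9, 0.1], p=[0.0, 0.0])
batch = [next(stream) for _ in range(5)]
```

Returning the mode index alongside each sample plays the role of the data_modes attribute, without storing state on an object.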
