annotate pylearn/datasets/nade.py @ 1482:be4a49a65333

modified Nade dataset to use new config.get_filepath_in_roots mechanism
author gdesjardins
date Tue, 05 Jul 2011 10:56:40 -0400
parents b24ed2aa077e
children f7b348e6a98e
rev   line source
1465
490616262500 Adding datasets used in Hugo's NADE paper. Datasets have been converted from
gdesjardins
parents:
diff changeset
1 import os
2 import numpy
3
4 from pylearn.io.pmat import PMat
5 from pylearn.datasets.config import data_root # config
6 from pylearn.datasets.dataset import Dataset
1482
be4a49a65333 modified Nade dataset to use new config.get_filepath_in_roots mechanism
gdesjardins
parents: 1467
diff changeset
7 import config
1465
8
1467
b24ed2aa077e name parameter for dataset needs to be a keyword arg. for compatibility with
gdesjardins
parents: 1465
diff changeset
9 def load_dataset(name=None):
1465
10 """
11 Various datasets which were used in the following paper.
12 The Neural Autoregressive Distribution Estimator
13 Hugo Larochelle and Iain Murray, AISTATS 2011
14
15 :param name: string specifying which dataset to load
16 :return: Dataset object
17 dataset.train.x: matrix of training data of shape (num_examples, ndim)
18 dataset.train.y: vector of training labels of length num_examples. Labels are
19 integer-valued and give the class each example belongs to.
20 dataset.valid.x: idem for validation data
21 dataset.valid.y: idem for validation data
22 dataset.test.x: idem for test data
23 dataset.test.y: idem for test data
24
25 WARNING: class labels are integer-valued, not 1-of-n encoded!
26 """
27 assert name in ['adult','binarized_mnist', 'mnist', 'connect4','dna',
28 'mushrooms','nips','ocr_letters','rcv1','web']
29 rval = Dataset()
1482
30
31 # dataset lookup through $PYLEARN_DATA_ROOT
32 _path = os.path.join('larocheh', name)
33 path = config.get_filepath_in_roots(_path)
1465
34
35 # load training set
36 x=numpy.load(os.path.join(path,'train_data.npy'))
37 y_fname = os.path.join(path, 'train_labels.npy')
38 if os.path.exists(y_fname):
39 y = numpy.load(y_fname)
40 else:
41 y = None
42 rval.train = Dataset.Obj(x=x, y=y)
43
44 # load validation set
45 x=numpy.load(os.path.join(path,'valid_data.npy'))
46 y_fname = os.path.join(path, 'valid_labels.npy')
47 if os.path.exists(y_fname):
48 y = numpy.load(y_fname)
49 else:
50 y = None
51 rval.valid = Dataset.Obj(x=x, y=y)
52
53 # load test set
54 x=numpy.load(os.path.join(path,'test_data.npy'))
55 y_fname = os.path.join(path, 'test_labels.npy')
56 if os.path.exists(y_fname):
57 y = numpy.load(y_fname)
58 else:
59 y = None
60 rval.test = Dataset.Obj(x=x, y=y)
61
62 return rval
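
The three load blocks in load_dataset follow one pattern: read `<split>_data.npy`, then read `<split>_labels.npy` only if that file exists. A minimal sketch of that pattern as a helper is given here, together with a conversion from the integer labels mentioned in the docstring WARNING to a 1-of-n matrix. The names `load_split` and `to_one_hot` are hypothetical and not part of pylearn; the sketch assumes labels are integers in the range 0..n_classes-1, which the source does not state.

```python
import os
import numpy


def load_split(path, split):
    """Return (x, y) for one split ('train', 'valid' or 'test').

    y is None when the directory holds no label file for that split,
    mirroring the os.path.exists check in load_dataset.
    """
    x = numpy.load(os.path.join(path, '%s_data.npy' % split))
    y_fname = os.path.join(path, '%s_labels.npy' % split)
    y = numpy.load(y_fname) if os.path.exists(y_fname) else None
    return x, y


def to_one_hot(y, n_classes):
    """Convert integer class labels to a 1-of-n matrix.

    Assumes every label lies in 0..n_classes-1 (hypothetical constraint).
    """
    y = numpy.asarray(y, dtype='int64')
    one_hot = numpy.zeros((len(y), n_classes), dtype='float32')
    # Set one entry per row: row i gets a 1 in column y[i].
    one_hot[numpy.arange(len(y)), y] = 1
    return one_hot
```

With such a helper, each split in load_dataset could in principle be filled in by a loop over ('train', 'valid', 'test') that passes the returned pair to Dataset.Obj(x=x, y=y); whether that refactoring suits pylearn is left open.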