annotate pylearn/datasets/nade.py @ 1482:be4a49a65333
modified Nade dataset to use new config.get_filepath_in_roots mechanism
author | gdesjardins |
---|---|
date | Tue, 05 Jul 2011 10:56:40 -0400 |
parents | b24ed2aa077e |
children | f7b348e6a98e |
rev | changeset | parent rev | author | description | contributed |
---|---|---|---|---|---|
1465 | 490616262500 | | gdesjardins | Adding datasets used in Hugo's NADE paper. Datasets have been converted from | module imports, docstring, dataset-name check, and the train/valid/test loading code |
1467 | b24ed2aa077e | 1465 | gdesjardins | name parameter for dataset needs to be a keyword arg. for compatibility with | the `def load_dataset(name=None):` signature |
1482 | be4a49a65333 | 1467 | gdesjardins | modified Nade dataset to use new config.get_filepath_in_roots mechanism | the `config` import and the `config.get_filepath_in_roots` path lookup |

import os
import numpy

from pylearn.io.pmat import PMat
from pylearn.datasets.config import data_root # config
from pylearn.datasets.dataset import Dataset
from pylearn.datasets import config

def load_dataset(name=None):
    """
    Various datasets which were used in the following paper.
        The Neural Autoregressive Distribution Estimator
        Hugo Larochelle and Iain Murray, AISTATS 2011

    :param name: string specifying which dataset to load
    :return: Dataset object
        dataset.train.x: matrix of training data of shape (num_examples, ndim)
        dataset.train.y: vector of training labels of length num_examples. Labels are
                         integer-valued and give the class each example belongs to.
        dataset.valid.x: idem for validation data
        dataset.valid.y: idem for validation data
        dataset.test.x: idem for test data
        dataset.test.y: idem for test data

    WARNING: class labels are integer-valued instead of 1-of-n encoding !
    """
    assert name in ['adult', 'binarized_mnist', 'mnist', 'connect4', 'dna',
                    'mushrooms', 'nips', 'ocr_letters', 'rcv1', 'web']
    rval = Dataset()

    # dataset lookup through $PYLEARN_DATA_ROOT
    _path = os.path.join('larocheh', name)
    path = config.get_filepath_in_roots(_path)

    # load training set (labels are optional: y is None when no label file exists)
    x = numpy.load(os.path.join(path, 'train_data.npy'))
    y_fname = os.path.join(path, 'train_labels.npy')
    if os.path.exists(y_fname):
        y = numpy.load(y_fname)
    else:
        y = None
    rval.train = Dataset.Obj(x=x, y=y)

    # load validation set
    x = numpy.load(os.path.join(path, 'valid_data.npy'))
    y_fname = os.path.join(path, 'valid_labels.npy')
    if os.path.exists(y_fname):
        y = numpy.load(y_fname)
    else:
        y = None
    rval.valid = Dataset.Obj(x=x, y=y)

    # load test set
    x = numpy.load(os.path.join(path, 'test_data.npy'))
    y_fname = os.path.join(path, 'test_labels.npy')
    if os.path.exists(y_fname):
        y = numpy.load(y_fname)
    else:
        y = None
    rval.test = Dataset.Obj(x=x, y=y)

    return rval
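
The loader resolves `larocheh/<name>` through `config.get_filepath_in_roots`, whose implementation is not shown on this page. Purely as an illustration of what a lookup across several data roots might look like, here is a minimal sketch. The helper name `find_in_roots` and the assumption that the roots arrive as a single `os.pathsep`-separated string (for example the value of `$PYLEARN_DATA_ROOT`) are hypothetical and are not taken from pylearn.

```python
import os


def find_in_roots(relpath, roots):
    """Return the first existing `root/relpath`, searching roots in order.

    Hypothetical helper, NOT pylearn's config.get_filepath_in_roots.
    `roots` is assumed to be an os.pathsep-separated string of directories.
    """
    for root in roots.split(os.pathsep):
        if not root:
            continue  # skip empty entries, e.g. from a trailing separator
        candidate = os.path.join(root, relpath)
        if os.path.exists(candidate):
            return candidate
    # nothing matched in any root: fail loudly rather than return a bad path
    raise IOError("%r not found in any of: %s" % (relpath, roots))
```

Under these assumptions, `find_in_roots(os.path.join('larocheh', 'mnist'), os.environ['PYLEARN_DATA_ROOT'])` would return the directory from which files such as `train_data.npy` are then read. Returning the first match means earlier roots take precedence over later ones.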