annotate pylearn/datasets/nist_sd.py @ 1524:9d21919e2332

autopep8
author Frederic Bastien <nouiz@nouiz.org>
date Fri, 02 Nov 2012 13:02:18 -0400
parents 2e87264493ef
children
rev   line source
895
257a39cce72c Provides a ``Dataset`` for the nist reshuffled digits dataset.
Pierre-Antoine Manzagol <pierre.antoine.manzagol@gmail.com>
parents:
diff changeset
"""
Provides a Dataset to access the nist digits_reshuffled dataset.
"""

import os, numpy
from pylearn.io import filetensor as ft
from pylearn.datasets.config import data_root # config
from pylearn.datasets.dataset import Dataset
899
2e87264493ef NistSD dataset: Add range argument for input [-1,1] or [0,1].
Pierre-Antoine Manzagol <pierre.antoine.manzagol@gmail.com>
parents: 898
diff changeset
def nist_to_float_11(x):
898
cdbfdbf7ec56 Nist SD preproc: ensuring upcast to float during centering.
Pierre-Antoine Manzagol <pierre.antoine.manzagol@gmail.com>
parents: 897
diff changeset
    return (x - 128.0) / 128.0
899
2e87264493ef NistSD dataset: Add range argument for input [-1,1] or [0,1].
Pierre-Antoine Manzagol <pierre.antoine.manzagol@gmail.com>
parents: 898
diff changeset
def nist_to_float_01(x):
    return x / 255.0
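The two rescaling helpers above can be tried on a few sample pixel values. The following is a minimal sketch, assuming the inputs are NumPy `uint8` arrays (as the docstring note further down suggests); the sample values and the `wrapped` comparison are illustrative and not part of the original file. It also shows why the float literal `128.0` matters: subtracting an integer `uint8` value would wrap around instead of going negative.

```python
import numpy as np


def nist_to_float_11(x):
    # The float literal forces an upcast from uint8 before centering.
    return (x - 128.0) / 128.0


def nist_to_float_01(x):
    return x / 255.0


# Illustrative pixel values covering the ends and middle of the uint8 range.
pixels = np.array([0, 128, 255], dtype=np.uint8)

scaled_11 = nist_to_float_11(pixels)  # float values from -1.0 up to 127/128
scaled_01 = nist_to_float_01(pixels)  # float values from 0.0 up to 1.0

# Centering in uint8 arithmetic wraps modulo 256 instead of going negative.
wrapped = pixels - np.uint8(128)

print(scaled_11)
print(scaled_01)
print(wrapped)
```

Note that the `'11'` mapping tops out at 127/128 rather than exactly 1, because 255 is not symmetric around 128.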
895
257a39cce72c Provides a ``Dataset`` for the nist reshuffled digits dataset.
Pierre-Antoine Manzagol <pierre.antoine.manzagol@gmail.com>
parents:
diff changeset
def load(dataset='train', attribute='data'):
    """Load the filetensor corresponding to the set and attribute.

    :param dataset: str that is 'train', 'valid' or 'test'
    :param attribute: str that is 'data' or 'labels'
    """
    fn = 'digits_reshuffled_' + dataset + '_' + attribute + '.ft'
    fn = os.path.join(data_root(), 'nist', 'by_class', 'digits_reshuffled', fn)

    fd = open(fn)
    data = ft.read(fd)
    fd.close()

    return data
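The file name that `load` opens follows directly from its two arguments. The sketch below reproduces only the path construction, without pylearn: `nist_sd_path` is a hypothetical helper, its `root` argument stands in for `data_root()`, and the argument validation is an addition that the original `load` does not perform.

```python
import os


def nist_sd_path(root, dataset='train', attribute='data'):
    """Build the path that load() would open (hypothetical helper).

    `root` is an assumption standing in for pylearn's data_root().
    """
    # Validation added for illustration only; load() itself does not check.
    if dataset not in ('train', 'valid', 'test'):
        raise ValueError('unknown dataset: %s' % dataset)
    if attribute not in ('data', 'labels'):
        raise ValueError('unknown attribute: %s' % attribute)
    fn = 'digits_reshuffled_' + dataset + '_' + attribute + '.ft'
    return os.path.join(root, 'nist', 'by_class', 'digits_reshuffled', fn)


# '/data' is a made-up root directory used only for this example.
example = nist_sd_path('/data', 'valid', 'labels')
print(example)
```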
899
2e87264493ef NistSD dataset: Add range argument for input [-1,1] or [0,1].
Pierre-Antoine Manzagol <pierre.antoine.manzagol@gmail.com>
parents: 898
diff changeset
def train_valid_test(ntrain=285661, nvalid=58646, ntest=58646, path=None,
                     range='01'):
895
257a39cce72c Provides a ``Dataset`` for the nist reshuffled digits dataset.
Pierre-Antoine Manzagol <pierre.antoine.manzagol@gmail.com>
parents:
diff changeset
    """
    Load the nist reshuffled digits dataset as a Dataset.

    @note: the examples are uint8 and the labels are int32.
    @todo: possibility of loading part of the data.
    """
    rval = Dataset()

    #
    rval.n_classes = 10
    rval.img_shape = (32, 32)
899
2e87264493ef NistSD dataset: Add range argument for input [-1,1] or [0,1].
Pierre-Antoine Manzagol <pierre.antoine.manzagol@gmail.com>
parents: 898
diff changeset
    if range == '01':
        rval.preprocess = nist_to_float_01
    elif range == '11':
        rval.preprocess = nist_to_float_11
    else:
        raise ValueError('Nist SD dataset does not support range = %s' % range)
    print("Nist SD dataset: using preproc will provide inputs in the %s range."
          % range)
895
257a39cce72c Provides a ``Dataset`` for the nist reshuffled digits dataset.
Pierre-Antoine Manzagol <pierre.antoine.manzagol@gmail.com>
parents:
diff changeset
    # train
    examples = load(dataset='train', attribute='data')
    labels = load(dataset='train', attribute='labels')
    rval.train = Dataset.Obj(x=examples[:ntrain], y=labels[:ntrain])

    # valid
    examples = load(dataset='valid', attribute='data')
    labels = load(dataset='valid', attribute='labels')
    rval.valid = Dataset.Obj(x=examples[:nvalid], y=labels[:nvalid])

    # test
    examples = load(dataset='test', attribute='data')
    labels = load(dataset='test', attribute='labels')
    rval.test = Dataset.Obj(x=examples[:ntest], y=labels[:ntest])

    return rval
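The `ntrain`, `nvalid` and `ntest` arguments only truncate each split with a leading slice, and the stored examples stay `uint8` until the preprocessing function is applied. The sketch below illustrates that behaviour on synthetic data rather than the real files. The array contents, the `first_n` helper, and the assumption that each example is a flattened 32x32 row are all hypothetical.

```python
import numpy as np

# Synthetic stand-ins for the loaded filetensors (not real NIST data).
rng = np.random.RandomState(0)
examples = rng.randint(0, 256, size=(5, 32 * 32)).astype(np.uint8)
labels = rng.randint(0, 10, size=5).astype(np.int32)


def first_n(examples, labels, n):
    # Same slicing pattern as examples[:ntrain], labels[:ntrain].
    return examples[:n], labels[:n]


# Asking for fewer rows than available truncates the split.
x_small, y_small = first_n(examples, labels, 3)

# Asking for more rows than available returns everything without an error.
x_all, y_all = first_n(examples, labels, 100)

# Preprocessing is a separate step; here the '01' mapping is applied by hand.
x_float = x_small / 255.0

# Assumes flattened rows; img_shape = (32, 32) gives the per-image layout.
images = x_float.reshape((-1, 32, 32))
print(x_small.shape, x_all.shape, images.shape)
```

Because slicing never raises when `n` exceeds the number of rows, an oversized `ntrain` silently yields the full split.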