annotate pylearn/datasets/nist_sd.py @ 1524:9d21919e2332
autopep8
author | Frederic Bastien <nouiz@nouiz.org> |
---|---|
date | Fri, 02 Nov 2012 13:02:18 -0400 |
parents | 2e87264493ef |
children |
rev | line source |
---|---|
895
257a39cce72c
Provides a ``Dataset`` for the nist reshuffled digits dataset.
Pierre-Antoine Manzagol <pierre.antoine.manzagol@gmail.com>
parents:
diff
changeset
|
"""
Provides a Dataset to access the nist digits_reshuffled dataset.
"""

import os
import numpy
from pylearn.io import filetensor as ft
from pylearn.datasets.config import data_root  # config
from pylearn.datasets.dataset import Dataset

899
2e87264493ef
NistSD dataset: Add range argument for input [-1,1] or [0,1].
Pierre-Antoine Manzagol <pierre.antoine.manzagol@gmail.com>
parents:
898
diff
changeset
|
def nist_to_float_11(x):
898
cdbfdbf7ec56
Nist SD preproc: ensuring upcast to float during centering.
Pierre-Antoine Manzagol <pierre.antoine.manzagol@gmail.com>
parents:
897
diff
changeset
|
    # Subtracting the float 128.0 upcasts uint8 input to float before centering.
    return (x - 128.0) / 128.0

def nist_to_float_01(x):
    # Scale uint8 values in [0, 255] to floats in [0, 1].
    return x / 255.0

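As a rough illustration of the two scalings above, the sketch below re-declares both helpers and applies them to a few hand-picked pixel values. The sample values and the names `samples`, `scaled_11` and `scaled_01` are illustrative and not part of the module. Plain Python integers are used so the snippet runs without NumPy; on a uint8 array the same arithmetic would apply elementwise.

```python
def nist_to_float_11(x):
    return (x - 128.0) / 128.0


def nist_to_float_01(x):
    return x / 255.0


# Hand-picked pixel intensities covering the uint8 range (illustrative only).
samples = [0, 128, 255]

scaled_11 = [nist_to_float_11(v) for v in samples]
scaled_01 = [nist_to_float_01(v) for v in samples]

print(scaled_11)  # [-1.0, 0.0, 0.9921875]
print(scaled_01)
```

Note that the '11' scaling tops out at 127/128 rather than exactly 1, since 255 is mapped to (255 - 128) / 128.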
def load(dataset='train', attribute='data'):
    """Load the filetensor corresponding to the set and attribute.

    :param dataset: str that is 'train', 'valid' or 'test'
    :param attribute: str that is 'data' or 'labels'
    """
    fn = 'digits_reshuffled_' + dataset + '_' + attribute + '.ft'
    fn = os.path.join(data_root(), 'nist', 'by_class', 'digits_reshuffled', fn)

    # Filetensor files are binary, so open in binary mode; the with-block
    # closes the file even if ft.read raises.
    with open(fn, 'rb') as fd:
        data = ft.read(fd)

    return data

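The path construction in `load` can be tried in isolation. The sketch below assumes a made-up data root of `/data` in place of pylearn's `data_root()`. The helper name `nist_ft_path` and its argument checks are hypothetical additions; the original function does not validate its arguments.

```python
import posixpath

VALID_SETS = ('train', 'valid', 'test')
VALID_ATTRIBUTES = ('data', 'labels')


def nist_ft_path(root, dataset='train', attribute='data'):
    """Build the filetensor path the way load() does (hypothetical helper)."""
    if dataset not in VALID_SETS:
        raise ValueError('unknown dataset: %r' % (dataset,))
    if attribute not in VALID_ATTRIBUTES:
        raise ValueError('unknown attribute: %r' % (attribute,))
    fn = 'digits_reshuffled_' + dataset + '_' + attribute + '.ft'
    # posixpath keeps the output identical on every platform;
    # load() itself uses os.path.
    return posixpath.join(root, 'nist', 'by_class', 'digits_reshuffled', fn)


# '/data' is a placeholder root, not a real pylearn setting.
print(nist_ft_path('/data', 'valid', 'labels'))
# /data/nist/by_class/digits_reshuffled/digits_reshuffled_valid_labels.ft
```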
def train_valid_test(ntrain=285661, nvalid=58646, ntest=58646, path=None,
                     range='01'):
    """
    Load the nist reshuffled digits dataset as a Dataset.

    :param ntrain: number of examples to keep from the start of the train set
    :param nvalid: number of examples to keep from the start of the valid set
    :param ntest: number of examples to keep from the start of the test set
    :param path: not used by this function
    :param range: '01' or '11'; selects the function stored in
        ``preprocess``, which maps inputs to [0,1] or [-1,1]

    @note: the examples are uint8 and the labels are int32.
    @todo: possibility of loading part of the data.
    """
    rval = Dataset()

    # Dataset metadata: 10 digit classes, 32x32 images.
    rval.n_classes = 10
    rval.img_shape = (32, 32)

    if range == '01':
        rval.preprocess = nist_to_float_01
    elif range == '11':
        rval.preprocess = nist_to_float_11
    else:
        raise ValueError('Nist SD dataset does not support range = %s' % range)
    print("Nist SD dataset: using preproc will provide inputs in the %s range."
          % range)

    # train
    examples = load(dataset='train', attribute='data')
    labels = load(dataset='train', attribute='labels')
    rval.train = Dataset.Obj(x=examples[:ntrain], y=labels[:ntrain])

    # valid
    examples = load(dataset='valid', attribute='data')
    labels = load(dataset='valid', attribute='labels')
    rval.valid = Dataset.Obj(x=examples[:nvalid], y=labels[:nvalid])

    # test
    examples = load(dataset='test', attribute='data')
    labels = load(dataset='test', attribute='labels')
    rval.test = Dataset.Obj(x=examples[:ntest], y=labels[:ntest])

    return rval
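The `range` handling in `train_valid_test` could also be expressed as a dictionary lookup. This is only a sketch of an alternative, not the module's code: `PREPROCESSORS` and `select_preprocess` are hypothetical names, and the parameter is called `range_name` here to avoid shadowing Python's built-in `range`.

```python
def nist_to_float_11(x):
    return (x - 128.0) / 128.0


def nist_to_float_01(x):
    return x / 255.0


# Hypothetical lookup table from the range string to its scaling function.
PREPROCESSORS = {
    '01': nist_to_float_01,
    '11': nist_to_float_11,
}


def select_preprocess(range_name='01'):
    """Return the scaling function for range_name, or raise ValueError."""
    try:
        return PREPROCESSORS[range_name]
    except KeyError:
        raise ValueError(
            'Nist SD dataset does not support range = %s' % range_name)


preprocess = select_preprocess('11')
print(preprocess(0))  # -1.0
```

Supporting a further range would then only need one more entry in the table, at the cost of a small indirection compared with the explicit if/elif chain.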