annotate datasets/defs.py @ 260:0c0f0b3f6a93

branch merge.
author Arnaud Bergeron <abergeron@gmail.com>
date Wed, 17 Mar 2010 15:31:21 -0400
parents 966272e7f14b
children 4533350d7361
rev   line source
211
476da2ba6a12 Add nist_P07 datasets to the predefs.
Arnaud Bergeron <abergeron@gmail.com>
parents: 181
1 __all__ = ['nist_digits', 'nist_lower', 'nist_upper', 'nist_all', 'ocr',
222
4cfd0eb438af Add mnist to datasets (and supporting code).
Arnaud Bergeron <abergeron@gmail.com>
parents: 211
2 'nist_P07', 'mnist']
163
4b28d7382dbf Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
3
4 from ftfile import FTDataSet
222
4cfd0eb438af Add mnist to datasets (and supporting code).
Arnaud Bergeron <abergeron@gmail.com>
parents: 211
5 from gzpklfile import GzpklDataSet
180
76bc047df5ee Add dtype conversion and rescaling to the read path.
Arnaud Bergeron <abergeron@gmail.com>
parents: 175
6 import theano
231
6f4e3719a3cc Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents: 222
7 import os
175
224321bf043a Define the ocr dataset and use the existing split for nist.
Arnaud Bergeron <abergeron@gmail.com>
parents: 164
8
231
6f4e3719a3cc Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents: 222
9 # if the environmental variables exist, get the path from them,
10 # otherwise fall back on the default
11 NIST_PATH = os.getenv('NIST_PATH','/data/lisa/data/nist/by_class/')
12 DATA_PATH = os.getenv('DATA_PATH','/data/lisa/data/ift6266h10/')
13
257
966272e7f14b Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents: 231
diff changeset
14 nist_digits = lambda maxsize=None: FTDataSet(train_data = [os.path.join(NIST_PATH,'digits/digits_train_data.ft')],
231
6f4e3719a3cc Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents: 222
15 train_lbl = [os.path.join(NIST_PATH,'digits/digits_train_labels.ft')],
16 test_data = [os.path.join(NIST_PATH,'digits/digits_test_data.ft')],
17 test_lbl = [os.path.join(NIST_PATH,'digits/digits_test_labels.ft')],
257
966272e7f14b Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents: 231
18 indtype=theano.config.floatX, inscale=255., maxsize=maxsize)
19 nist_lower = lambda maxsize=None: FTDataSet(train_data = [os.path.join(NIST_PATH,'lower/lower_train_data.ft')],
231
6f4e3719a3cc Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents: 222
20 train_lbl = [os.path.join(NIST_PATH,'lower/lower_train_labels.ft')],
21 test_data = [os.path.join(NIST_PATH,'lower/lower_test_data.ft')],
22 test_lbl = [os.path.join(NIST_PATH,'lower/lower_test_labels.ft')],
257
966272e7f14b Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents: 231
23 indtype=theano.config.floatX, inscale=255., maxsize=maxsize)
24 nist_upper = lambda maxsize=None: FTDataSet(train_data = [os.path.join(NIST_PATH,'upper/upper_train_data.ft')],
231
6f4e3719a3cc Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents: 222
25 train_lbl = [os.path.join(NIST_PATH,'upper/upper_train_labels.ft')],
26 test_data = [os.path.join(NIST_PATH,'upper/upper_test_data.ft')],
27 test_lbl = [os.path.join(NIST_PATH,'upper/upper_test_labels.ft')],
257
966272e7f14b Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents: 231
28 indtype=theano.config.floatX, inscale=255., maxsize=maxsize)
163
4b28d7382dbf Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
29
257
966272e7f14b Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents: 231
30 nist_all = lambda maxsize=None: FTDataSet(train_data = [os.path.join(DATA_PATH,'train_data.ft')],
231
6f4e3719a3cc Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents: 222
31 train_lbl = [os.path.join(DATA_PATH,'train_labels.ft')],
32 test_data = [os.path.join(DATA_PATH,'test_data.ft')],
33 test_lbl = [os.path.join(DATA_PATH,'test_labels.ft')],
34 valid_data = [os.path.join(DATA_PATH,'valid_data.ft')],
35 valid_lbl = [os.path.join(DATA_PATH,'valid_labels.ft')],
257
966272e7f14b Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents: 231
36 indtype=theano.config.floatX, inscale=255., maxsize=maxsize)
175
224321bf043a Define the ocr dataset and use the existing split for nist.
Arnaud Bergeron <abergeron@gmail.com>
parents: 164
37
257
966272e7f14b Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents: 231
38 ocr = lambda maxsize=None: FTDataSet(train_data = [os.path.join(DATA_PATH,'ocr_train_data.ft')],
231
6f4e3719a3cc Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents: 222
39 train_lbl = [os.path.join(DATA_PATH,'ocr_train_labels.ft')],
40 test_data = [os.path.join(DATA_PATH,'ocr_test_data.ft')],
41 test_lbl = [os.path.join(DATA_PATH,'ocr_test_labels.ft')],
42 valid_data = [os.path.join(DATA_PATH,'ocr_valid_data.ft')],
43 valid_lbl = [os.path.join(DATA_PATH,'ocr_valid_labels.ft')],
257
966272e7f14b Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents: 231
44 indtype=theano.config.floatX, inscale=255., maxsize=maxsize)
211
476da2ba6a12 Add nist_P07 datasets to the predefs.
Arnaud Bergeron <abergeron@gmail.com>
parents: 181
45
257
966272e7f14b Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents: 231
46 nist_P07 = lambda maxsize=None: FTDataSet(train_data = [os.path.join(DATA_PATH,'data/P07_train'+str(i)+'_data.ft') for i in range(100)],
231
6f4e3719a3cc Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents: 222
47 train_lbl = [os.path.join(DATA_PATH,'data/P07_train'+str(i)+'_labels.ft') for i in range(100)],
48 test_data = [os.path.join(DATA_PATH,'data/P07_test_data.ft')],
49 test_lbl = [os.path.join(DATA_PATH,'data/P07_test_labels.ft')],
50 valid_data = [os.path.join(DATA_PATH,'data/P07_valid_data.ft')],
51 valid_lbl = [os.path.join(DATA_PATH,'data/P07_valid_labels.ft')],
257
966272e7f14b Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents: 231
52 indtype=theano.config.floatX, inscale=255., maxsize=maxsize)
222
4cfd0eb438af Add mnist to datasets (and supporting code).
Arnaud Bergeron <abergeron@gmail.com>
parents: 211
53
257
966272e7f14b Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents: 231
54 mnist = lambda maxsize=None: GzpklDataSet(os.path.join(DATA_PATH,'mnist.pkl.gz'),
55 maxsize=maxsize)
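
The definitions above share one pattern: a base path read from an environment variable with a hard-coded default, and a lambda that builds the dataset object only when it is called. The following is a minimal sketch of that pattern, not the project's code. `StubDataSet`, `DEMO_PATH`, and `demo_digits` are hypothetical names introduced for illustration, and the stub only records its arguments, since the real `FTDataSet` and `GzpklDataSet` classes are not shown in this file.

```python
import os


class StubDataSet:
    """Hypothetical stand-in for FTDataSet; it only stores its arguments."""

    def __init__(self, train_data, maxsize=None):
        self.train_data = train_data
        self.maxsize = maxsize


# Same idea as NIST_PATH / DATA_PATH: use the environment variable if it is
# set, otherwise fall back on a default. DEMO_PATH is an assumed name.
DEMO_PATH = os.getenv('DEMO_PATH', '/tmp/demo/')

# A factory rather than an instance: no dataset object is constructed when
# the module is imported, only when the caller invokes the lambda.
demo_digits = lambda maxsize=None: StubDataSet(
    train_data=[os.path.join(DEMO_PATH, 'digits/digits_train_data.ft')],
    maxsize=maxsize)

# The object is created here, with the caller's maxsize passed through.
ds = demo_digits(maxsize=1000)
```

Because `os.getenv` runs at module level, the environment variable has to be set before the module is imported for it to take effect.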