annotate datasets/defs.py @ 239:42005ec87747

Merged (manually) Sylvain's changes to use Arnaud's dataset code, with the difference that I don't use the givens. I probably also take a different approach to limiting the dataset size while debugging.
author fsavard
date Mon, 15 Mar 2010 18:30:21 -0400
parents 6f4e3719a3cc
children 966272e7f14b
rev   line source
211
476da2ba6a12 Add nist_P07 datasets to the predefs.
Arnaud Bergeron <abergeron@gmail.com>
parents: 181
diff changeset
1 __all__ = ['nist_digits', 'nist_lower', 'nist_upper', 'nist_all', 'ocr',
222
4cfd0eb438af Add mnist to datasets (and supporting code).
Arnaud Bergeron <abergeron@gmail.com>
parents: 211
diff changeset
2 'nist_P07', 'mnist']
163
4b28d7382dbf Add initial implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
3
4 from ftfile import FTDataSet
222
4cfd0eb438af Add mnist to datasets (and supporting code).
Arnaud Bergeron <abergeron@gmail.com>
parents: 211
diff changeset
5 from gzpklfile import GzpklDataSet
180
76bc047df5ee Add dtype conversion and rescaling to the read path.
Arnaud Bergeron <abergeron@gmail.com>
parents: 175
diff changeset
6 import theano
231
6f4e3719a3cc Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents: 222
diff changeset
7 import os
175
224321bf043a Define the ocr dataset and use the existing split for nist.
Arnaud Bergeron <abergeron@gmail.com>
parents: 164
diff changeset
8
231
6f4e3719a3cc Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents: 222
diff changeset
9 # if the environmental variables exist, get the path from them,
10 # otherwise fall back on the default
11 NIST_PATH = os.getenv('NIST_PATH','/data/lisa/data/nist/by_class/')
12 DATA_PATH = os.getenv('DATA_PATH','/data/lisa/data/ift6266h10/')
13
14 nist_digits = FTDataSet(train_data = [os.path.join(NIST_PATH,'digits/digits_train_data.ft')],
15 train_lbl = [os.path.join(NIST_PATH,'digits/digits_train_labels.ft')],
16 test_data = [os.path.join(NIST_PATH,'digits/digits_test_data.ft')],
17 test_lbl = [os.path.join(NIST_PATH,'digits/digits_test_labels.ft')],
180
76bc047df5ee Add dtype conversion and rescaling to the read path.
Arnaud Bergeron <abergeron@gmail.com>
parents: 175
diff changeset
18 indtype=theano.config.floatX, inscale=255.)
231
6f4e3719a3cc Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents: 222
diff changeset
19 nist_lower = FTDataSet(train_data = [os.path.join(NIST_PATH,'lower/lower_train_data.ft')],
20 train_lbl = [os.path.join(NIST_PATH,'lower/lower_train_labels.ft')],
21 test_data = [os.path.join(NIST_PATH,'lower/lower_test_data.ft')],
22 test_lbl = [os.path.join(NIST_PATH,'lower/lower_test_labels.ft')],
180
76bc047df5ee Add dtype conversion and rescaling to the read path.
Arnaud Bergeron <abergeron@gmail.com>
parents: 175
diff changeset
23 indtype=theano.config.floatX, inscale=255.)
231
6f4e3719a3cc Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents: 222
diff changeset
24 nist_upper = FTDataSet(train_data = [os.path.join(NIST_PATH,'upper/upper_train_data.ft')],
25 train_lbl = [os.path.join(NIST_PATH,'upper/upper_train_labels.ft')],
26 test_data = [os.path.join(NIST_PATH,'upper/upper_test_data.ft')],
27 test_lbl = [os.path.join(NIST_PATH,'upper/upper_test_labels.ft')],
180
76bc047df5ee Add dtype conversion and rescaling to the read path.
Arnaud Bergeron <abergeron@gmail.com>
parents: 175
diff changeset
28 indtype=theano.config.floatX, inscale=255.)
163
4b28d7382dbf Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
29
231
6f4e3719a3cc Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents: 222
diff changeset
30 nist_all = FTDataSet(train_data = [os.path.join(DATA_PATH,'train_data.ft')],
31 train_lbl = [os.path.join(DATA_PATH,'train_labels.ft')],
32 test_data = [os.path.join(DATA_PATH,'test_data.ft')],
33 test_lbl = [os.path.join(DATA_PATH,'test_labels.ft')],
34 valid_data = [os.path.join(DATA_PATH,'valid_data.ft')],
35 valid_lbl = [os.path.join(DATA_PATH,'valid_labels.ft')],
180
76bc047df5ee Add dtype conversion and rescaling to the read path.
Arnaud Bergeron <abergeron@gmail.com>
parents: 175
diff changeset
36 indtype=theano.config.floatX, inscale=255.)
175
224321bf043a Define the ocr dataset and use the existing split for nist.
Arnaud Bergeron <abergeron@gmail.com>
parents: 164
diff changeset
37
231
6f4e3719a3cc Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents: 222
diff changeset
38 ocr = FTDataSet(train_data = [os.path.join(DATA_PATH,'ocr_train_data.ft')],
39 train_lbl = [os.path.join(DATA_PATH,'ocr_train_labels.ft')],
40 test_data = [os.path.join(DATA_PATH,'ocr_test_data.ft')],
41 test_lbl = [os.path.join(DATA_PATH,'ocr_test_labels.ft')],
42 valid_data = [os.path.join(DATA_PATH,'ocr_valid_data.ft')],
43 valid_lbl = [os.path.join(DATA_PATH,'ocr_valid_labels.ft')],
211
476da2ba6a12 Add nist_P07 datasets to the predefs.
Arnaud Bergeron <abergeron@gmail.com>
parents: 181
diff changeset
44 indtype=theano.config.floatX, inscale=255.)
45
231
6f4e3719a3cc Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents: 222
diff changeset
46 nist_P07 = FTDataSet(train_data = [os.path.join(DATA_PATH,'data/P07_train'+str(i)+'_data.ft') for i in range(100)],
47 train_lbl = [os.path.join(DATA_PATH,'data/P07_train'+str(i)+'_labels.ft') for i in range(100)],
48 test_data = [os.path.join(DATA_PATH,'data/P07_test_data.ft')],
49 test_lbl = [os.path.join(DATA_PATH,'data/P07_test_labels.ft')],
50 valid_data = [os.path.join(DATA_PATH,'data/P07_valid_data.ft')],
51 valid_lbl = [os.path.join(DATA_PATH,'data/P07_valid_labels.ft')],
211
476da2ba6a12 Add nist_P07 datasets to the predefs.
Arnaud Bergeron <abergeron@gmail.com>
parents: 181
diff changeset
52 indtype=theano.config.floatX, inscale=255.)
222
4cfd0eb438af Add mnist to datasets (and supporting code).
Arnaud Bergeron <abergeron@gmail.com>
parents: 211
diff changeset
53
231
6f4e3719a3cc Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents: 222
diff changeset
54 mnist = GzpklDataSet(os.path.join(DATA_PATH,'mnist.pkl.gz'))
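
The path handling introduced in changeset 6f4e3719a3cc can be tried in isolation, without FTDataSet or theano. The sketch below is an illustration only: `resolve_data_path` and `p07_train_files` are hypothetical helpers that do not exist in datasets/defs.py; they mirror the `os.getenv` fallback for DATA_PATH and the list comprehension that builds the 100 P07 training shards. `environ.get(key, default)` is used in place of `os.getenv(key, default)` so that the environment can be passed in as a plain dict.

```python
import os

# Default copied from the DATA_PATH fallback in the listing above.
DEFAULT_DATA_PATH = '/data/lisa/data/ift6266h10/'


def resolve_data_path(environ=os.environ):
    """Hypothetical helper: return the DATA_PATH override if set, else the default."""
    return environ.get('DATA_PATH', DEFAULT_DATA_PATH)


def p07_train_files(base, kind, n_shards=100):
    """Hypothetical helper: build the P07 training shard paths.

    kind is 'data' or 'labels', matching the two file suffixes in the listing.
    """
    return [os.path.join(base, 'data/P07_train' + str(i) + '_' + kind + '.ft')
            for i in range(n_shards)]


if __name__ == '__main__':
    base = resolve_data_path()
    train_data = p07_train_files(base, 'data')
    train_lbl = p07_train_files(base, 'labels')
    print(len(train_data), train_data[0])
    print(len(train_lbl), train_lbl[-1])
```

Because defs.py evaluates NIST_PATH and DATA_PATH at module level, an override has to be present in the environment before the module is first imported; setting it afterwards has no effect on the dataset objects already built.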