annotate datasets/defs.py @ 266:1e4e60ddadb1
Merge. Oh, and in the last commit I forgot to mention that I added code to handle isolating different clones, so that experiments can be run while the code is being modified.
author | fsavard |
---|---|
date | Fri, 19 Mar 2010 10:56:16 -0400 |
parents | 966272e7f14b |
children | 4533350d7361 |
rev | line source |
---|---|
211
476da2ba6a12
Add nist_P07 datasets to the predefs.
Arnaud Bergeron <abergeron@gmail.com>
parents:
181
diff
changeset
|
1 __all__ = ['nist_digits', 'nist_lower', 'nist_upper', 'nist_all', 'ocr', |
222
4cfd0eb438af
Add mnist to datasets (and supporting code).
Arnaud Bergeron <abergeron@gmail.com>
parents:
211
diff
changeset
|
2 'nist_P07', 'mnist'] |
163
4b28d7382dbf
Add initial implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
3 |
4b28d7382dbf
Add initial implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
4 from ftfile import FTDataSet |
222
4cfd0eb438af
Add mnist to datasets (and supporting code).
Arnaud Bergeron <abergeron@gmail.com>
parents:
211
diff
changeset
|
5 from gzpklfile import GzpklDataSet |
180
76bc047df5ee
Add dtype conversion and rescaling to the read path.
Arnaud Bergeron <abergeron@gmail.com>
parents:
175
diff
changeset
|
6 import theano |
231
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
7 import os |
175
224321bf043a
Define the ocr dataset and use the existing split for nist.
Arnaud Bergeron <abergeron@gmail.com>
parents:
164
diff
changeset
|
8 |
231
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
9 # if the environment variables exist, get the path from them, |
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
10 # otherwise fall back on the default |
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
11 NIST_PATH = os.getenv('NIST_PATH','/data/lisa/data/nist/by_class/') |
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
12 DATA_PATH = os.getenv('DATA_PATH','/data/lisa/data/ift6266h10/') |
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
13 |
257
966272e7f14b
Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents:
231
diff
changeset
|
14 nist_digits = lambda maxsize=None: FTDataSet(train_data = [os.path.join(NIST_PATH,'digits/digits_train_data.ft')], |
231
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
15 train_lbl = [os.path.join(NIST_PATH,'digits/digits_train_labels.ft')], |
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
16 test_data = [os.path.join(NIST_PATH,'digits/digits_test_data.ft')], |
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
17 test_lbl = [os.path.join(NIST_PATH,'digits/digits_test_labels.ft')], |
257
966272e7f14b
Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents:
231
diff
changeset
|
18 indtype=theano.config.floatX, inscale=255., maxsize=maxsize) |
966272e7f14b
Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents:
231
diff
changeset
|
19 nist_lower = lambda maxsize=None: FTDataSet(train_data = [os.path.join(NIST_PATH,'lower/lower_train_data.ft')], |
231
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
20 train_lbl = [os.path.join(NIST_PATH,'lower/lower_train_labels.ft')], |
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
21 test_data = [os.path.join(NIST_PATH,'lower/lower_test_data.ft')], |
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
22 test_lbl = [os.path.join(NIST_PATH,'lower/lower_test_labels.ft')], |
257
966272e7f14b
Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents:
231
diff
changeset
|
23 indtype=theano.config.floatX, inscale=255., maxsize=maxsize) |
966272e7f14b
Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents:
231
diff
changeset
|
24 nist_upper = lambda maxsize=None: FTDataSet(train_data = [os.path.join(NIST_PATH,'upper/upper_train_data.ft')], |
231
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
25 train_lbl = [os.path.join(NIST_PATH,'upper/upper_train_labels.ft')], |
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
26 test_data = [os.path.join(NIST_PATH,'upper/upper_test_data.ft')], |
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
27 test_lbl = [os.path.join(NIST_PATH,'upper/upper_test_labels.ft')], |
257
966272e7f14b
Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents:
231
diff
changeset
|
28 indtype=theano.config.floatX, inscale=255., maxsize=maxsize) |
163
4b28d7382dbf
Add initial implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
29 |
257
966272e7f14b
Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents:
231
diff
changeset
|
30 nist_all = lambda maxsize=None: FTDataSet(train_data = [os.path.join(DATA_PATH,'train_data.ft')], |
231
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
31 train_lbl = [os.path.join(DATA_PATH,'train_labels.ft')], |
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
32 test_data = [os.path.join(DATA_PATH,'test_data.ft')], |
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
33 test_lbl = [os.path.join(DATA_PATH,'test_labels.ft')], |
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
34 valid_data = [os.path.join(DATA_PATH,'valid_data.ft')], |
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
35 valid_lbl = [os.path.join(DATA_PATH,'valid_labels.ft')], |
257
966272e7f14b
Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents:
231
diff
changeset
|
36 indtype=theano.config.floatX, inscale=255., maxsize=maxsize) |
175
224321bf043a
Define the ocr dataset and use the existing split for nist.
Arnaud Bergeron <abergeron@gmail.com>
parents:
164
diff
changeset
|
37 |
257
966272e7f14b
Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents:
231
diff
changeset
|
38 ocr = lambda maxsize=None: FTDataSet(train_data = [os.path.join(DATA_PATH,'ocr_train_data.ft')], |
231
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
39 train_lbl = [os.path.join(DATA_PATH,'ocr_train_labels.ft')], |
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
40 test_data = [os.path.join(DATA_PATH,'ocr_test_data.ft')], |
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
41 test_lbl = [os.path.join(DATA_PATH,'ocr_test_labels.ft')], |
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
42 valid_data = [os.path.join(DATA_PATH,'ocr_valid_data.ft')], |
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
43 valid_lbl = [os.path.join(DATA_PATH,'ocr_valid_labels.ft')], |
257
966272e7f14b
Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents:
231
diff
changeset
|
44 indtype=theano.config.floatX, inscale=255., maxsize=maxsize) |
211
476da2ba6a12
Add nist_P07 datasets to the predefs.
Arnaud Bergeron <abergeron@gmail.com>
parents:
181
diff
changeset
|
45 |
257
966272e7f14b
Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents:
231
diff
changeset
|
46 nist_P07 = lambda maxsize=None: FTDataSet(train_data = [os.path.join(DATA_PATH,'data/P07_train'+str(i)+'_data.ft') for i in range(100)], |
231
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
47 train_lbl = [os.path.join(DATA_PATH,'data/P07_train'+str(i)+'_labels.ft') for i in range(100)], |
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
48 test_data = [os.path.join(DATA_PATH,'data/P07_test_data.ft')], |
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
49 test_lbl = [os.path.join(DATA_PATH,'data/P07_test_labels.ft')], |
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
50 valid_data = [os.path.join(DATA_PATH,'data/P07_valid_data.ft')], |
6f4e3719a3cc
Added the possibility to get the paths from an env. variable + cleaned up the way we build the paths
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
222
diff
changeset
|
51 valid_lbl = [os.path.join(DATA_PATH,'data/P07_valid_labels.ft')], |
257
966272e7f14b
Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents:
231
diff
changeset
|
52 indtype=theano.config.floatX, inscale=255., maxsize=maxsize) |
222
4cfd0eb438af
Add mnist to datasets (and supporting code).
Arnaud Bergeron <abergeron@gmail.com>
parents:
211
diff
changeset
|
53 |
257
966272e7f14b
Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents:
231
diff
changeset
|
54 mnist = lambda maxsize=None: GzpklDataSet(os.path.join(DATA_PATH,'mnist.pkl.gz'), |
966272e7f14b
Make the datasets lazy-loading and add a maxsize parameter.
Arnaud Bergeron <abergeron@gmail.com>
parents:
231
diff
changeset
|
55 maxsize=maxsize) |
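
The listing above combines two patterns: a base path read from an environment variable with a hard-coded fallback, and dataset definitions wrapped in a `lambda` so that nothing is constructed until the caller asks for it. The following is a minimal sketch of that combination, not the project's code. `StubDataSet`, `stub_all`, and `stub_sharded` are hypothetical names introduced here; the real `FTDataSet` lives in `ftfile.py`, which is not shown, and the shard count of 3 is arbitrary.

```python
import os


class StubDataSet:
    """Hypothetical stand-in for FTDataSet; it only records its arguments."""

    def __init__(self, train_data, maxsize=None):
        self.train_data = train_data
        self.maxsize = maxsize


# Same lookup as in defs.py: use the environment variable if it is set,
# otherwise fall back on the default path.
DATA_PATH = os.getenv('DATA_PATH', '/data/lisa/data/ift6266h10/')

# Wrapping the constructor in a lambda defers object creation until the
# factory is called, and lets the caller pass maxsize at that point.
stub_all = lambda maxsize=None: StubDataSet(
    train_data=[os.path.join(DATA_PATH, 'train_data.ft')],
    maxsize=maxsize)

# A sharded variant in the style of nist_P07: one path per shard index.
stub_sharded = lambda maxsize=None: StubDataSet(
    train_data=[os.path.join(DATA_PATH, 'data/P07_train' + str(i) + '_data.ft')
                for i in range(3)],
    maxsize=maxsize)

ds = stub_all(maxsize=1000)       # the object is built only here
sharded = stub_sharded()          # maxsize keeps its default of None
```

Note that `DATA_PATH` is read once, when the module is imported. Changing the environment variable afterwards does not affect paths built by later calls to the factories.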