Mercurial > ift6266
annotate data_generation/pipeline/filter_nist.py @ 633:13baba8a4522
merge
author | Yoshua Bengio <bengioy@iro.umontreal.ca> |
---|---|
date | Sat, 19 Mar 2011 22:51:40 -0400 |
parents | 75dbbe409578 |
children |
rev | line source |
---|---|
626
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
1 import numpy |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
2 from pylearn.io import filetensor as ft |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
3 from ift6266 import datasets |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
4 from ift6266.datasets.ftfile import FTDataSet |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
5 |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
6 dataset_str = 'P07_' # NISTP # 'P07safe_' |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
7 |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
8 #base_path = '/data/lisatmp/ift6266h10/data/'+dataset_str |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
9 #base_output_path = '/data/lisatmp/ift6266h10/data/transformed_digits/'+dataset_str+'train' |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
10 |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
11 base_path = '/data/lisa/data/ift6266h10/data/'+dataset_str |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
12 base_output_path = '/data/lisatmp/ift6266h10/data/transformed_digits/'+dataset_str+'train' |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
13 |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
14 for fileno in range(100): |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
15 print "Processing file no ", fileno |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
16 |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
17 output_data_file = base_output_path+str(fileno)+'_data.ft' |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
18 output_labels_file = base_output_path+str(fileno)+'_labels.ft' |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
19 |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
20 print "Reading from ",base_path+'train'+str(fileno)+'_data.ft' |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
21 |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
22 dataset = lambda maxsize=None, min_file=0, max_file=100: \ |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
23 FTDataSet(train_data = [base_path+'train'+str(fileno)+'_data.ft'], |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
24 train_lbl = [base_path+'train'+str(fileno)+'_labels.ft'], |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
25 test_data = [base_path+'_test_data.ft'], |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
26 test_lbl = [base_path+'_test_labels.ft'], |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
27 valid_data = [base_path+'_valid_data.ft'], |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
28 valid_lbl = [base_path+'_valid_labels.ft']) |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
29 # no conversion or scaling... keep data as is |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
30 #indtype=theano.config.floatX, inscale=255., maxsize=maxsize) |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
31 |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
32 ds = dataset() |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
33 |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
34 all_x = [] |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
35 all_y = [] |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
36 |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
37 all_count = 0 |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
38 |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
39 for mb_x,mb_y in ds.train(1): |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
40 if mb_y[0] <= 9: |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
41 all_x.append(mb_x[0]) |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
42 all_y.append(mb_y[0]) |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
43 |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
44 if (all_count+1) % 100000 == 0: |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
45 print "Done next 100k" |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
46 |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
47 all_count += 1 |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
48 |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
49 # data is stored as uint8 on 0-255 |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
50 merged_x = numpy.asarray(all_x, dtype=numpy.uint8) |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
51 merged_y = numpy.asarray(all_y, dtype=numpy.int32) |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
52 |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
53 print "Kept", len(all_x), "(shape ", merged_x.shape, ") examples from", all_count |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
54 |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
55 f = open(output_data_file, 'wb') |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
56 ft.write(f, merged_x) |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
57 f.close() |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
58 |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
59 f = open(output_labels_file, 'wb') |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
60 ft.write(f, merged_y) |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
61 f.close() |
75dbbe409578
Added code for deep mlp, experiment code to go along with it. Also added code I used to filter the P07 / PNIST07 datasets to keep only digits.
fsavard
parents:
diff
changeset
|
62 |