annotate pylearn/datasets/MNIST.py @ 1484:83d3c9ee6d65

* changed MNIST dataset to use config.get_filepath_in_roots mechanism
author gdesjardins
date Tue, 05 Jul 2011 11:01:51 -0400
parents 6ade5b39b773
children
rev   line source
470
bd937e845bbb new stuff: algorithms/logistic_regression, datasets/MNIST
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
1 """
2 Various routines to load/access MNIST data.
3 """
4
505
74b3e65f5f24 added smallNorb dataset, switched to PYLEARN_DATA_ROOT
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 504
diff changeset
5 import os
470
bd937e845bbb new stuff: algorithms/logistic_regression, datasets/MNIST
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
6 import numpy
7
818
f4729745bb58 backporting to 2.4
dumitru@deepnets.mtv.corp.google.com
parents: 795
diff changeset
8 from pylearn.io.pmat import PMat
9 from pylearn.datasets.dataset import Dataset
1484
83d3c9ee6d65 * changed MNIST dataset to use config.get_filepath_in_roots mechanism
gdesjardins
parents: 1403
diff changeset
10 import config
470
bd937e845bbb new stuff: algorithms/logistic_regression, datasets/MNIST
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
11
12 def head(n=10, path=None):
13     """Load the first MNIST examples.
14
15     Returns two arrays: x, y. x has N rows of 784 columns. Each row of x represents the
16     28x28 grey-scale pixels in raster order. y is a vector of N integers. Each element y[i]
17     is the label of the i'th row of x.
1403
6ade5b39b773 int8 should be enough to represent digits from 0 to 9
Pascal Lamblin <lamblinp@iro.umontreal.ca>
parents: 892
diff changeset
18
470
bd937e845bbb new stuff: algorithms/logistic_regression, datasets/MNIST
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
19     """
818
f4729745bb58 backporting to 2.4
dumitru@deepnets.mtv.corp.google.com
parents: 795
diff changeset
20     if path is None:
1484
83d3c9ee6d65 * changed MNIST dataset to use config.get_filepath_in_roots mechanism
gdesjardins
parents: 1403
diff changeset
21         # dataset lookup through $PYLEARN_DATA_ROOT
22         _path = os.path.join('mnist', 'mnist_all.pmat')
23         path = config.get_filepath_in_roots(_path)
470
bd937e845bbb new stuff: algorithms/logistic_regression, datasets/MNIST
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
24
794
951272679910 get the mnist data from the pmat file and not the amat file
Frederic Bastien <bastienf@iro.umontreal.ca>
parents: 658
diff changeset
25     dat = PMat(fname=path)
470
bd937e845bbb new stuff: algorithms/logistic_regression, datasets/MNIST
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
26
794
951272679910 get the mnist data from the pmat file and not the amat file
Frederic Bastien <bastienf@iro.umontreal.ca>
parents: 658
diff changeset
27     rows = dat.getRows(0, n)
504
19ab9ce916e3 slightly more sophisticated system for finding the mnist data
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 475
diff changeset
28
1403
6ade5b39b773 int8 should be enough to represent digits from 0 to 9
Pascal Lamblin <lamblinp@iro.umontreal.ca>
parents: 892
diff changeset
29     return rows[:, 0:-1], numpy.asarray(rows[:, -1], dtype='int8')
470
bd937e845bbb new stuff: algorithms/logistic_regression, datasets/MNIST
James Bergstra <bergstrj@iro.umontreal.ca>
parents:
diff changeset
30
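The return statement of head() splits each loaded row into pixel features and a label. A minimal sketch of that split on a plain NumPy array, assuming (as the return statement implies) that the label sits in the last column. The name split_features_labels and the zero-filled rows array are illustrative only and are not part of pylearn; no PMat file is read here.

```python
import numpy


def split_features_labels(rows):
    """Split a 2-D array whose last column holds the class label.

    Hypothetical helper mirroring the return statement of head().
    """
    rows = numpy.asarray(rows)
    x = rows[:, 0:-1]                              # every column but the last
    y = numpy.asarray(rows[:, -1], dtype='int8')   # labels 0..9 fit in int8
    return x, y


# Stand-in data: 3 rows of 784 pixels plus one label column.
rows = numpy.zeros((3, 785))
rows[:, -1] = [5, 0, 4]
x, y = split_features_labels(rows)
```

Here x keeps the dtype of the loaded rows, while y is converted to int8, matching the choice made in changeset 6ade5b39b773.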
795
f30bb746f279 put in comment a fct call all that seam to don't be used and conflict with numpy and python name.
Frederic Bastien <bastienf@iro.umontreal.ca>
parents: 794
diff changeset
31
32 # What is the purpose of this function?
33 # If still useful, rename it, as it conflicts with the Python and NumPy name 'all'.
34 #def all(path=None):
35 # return head(n=None, path=path)
537
b054271b2504 new file structure layout, factories, etc.
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 511
diff changeset
36
475
11e0357f06f4 typo in MNIST.train_valid_test
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 471
diff changeset
37 def train_valid_test(ntrain=50000, nvalid=10000, ntest=10000, path=None):
38     all_x, all_targ = head(ntrain+nvalid+ntest, path=path)
471
45b3eb429c15 added train_valid_test
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 470
diff changeset
39
537
b054271b2504 new file structure layout, factories, etc.
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 511
diff changeset
40     rval = Dataset()
471
45b3eb429c15 added train_valid_test
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 470
diff changeset
41
537
b054271b2504 new file structure layout, factories, etc.
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 511
diff changeset
42     rval.train = Dataset.Obj(x=all_x[0:ntrain],
43                              y=all_targ[0:ntrain])
44     rval.valid = Dataset.Obj(x=all_x[ntrain:ntrain+nvalid],
45                              y=all_targ[ntrain:ntrain+nvalid])
46     rval.test = Dataset.Obj(x=all_x[ntrain+nvalid:ntrain+nvalid+ntest],
47                             y=all_targ[ntrain+nvalid:ntrain+nvalid+ntest])
471
45b3eb429c15 added train_valid_test
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 470
diff changeset
48
537
b054271b2504 new file structure layout, factories, etc.
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 511
diff changeset
49     rval.n_classes = 10
563
16f91ca016b1 * added NStages as a stopper (moved from hpu/conv)
desjagui@atchoum.iro.umontreal.ca
parents: 537
diff changeset
50     rval.img_shape = (28, 28)
537
b054271b2504 new file structure layout, factories, etc.
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 511
diff changeset
51     return rval
475
11e0357f06f4 typo in MNIST.train_valid_test
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 471
diff changeset
52
53
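train_valid_test() carves the first ntrain+nvalid+ntest examples into three contiguous, non-overlapping blocks in file order. A minimal sketch of that slicing on plain NumPy arrays follows. The helper name split_contiguous is hypothetical, and it returns a plain dict of (x, y) pairs rather than the pylearn Dataset object used above.

```python
import numpy


def split_contiguous(all_x, all_y, ntrain, nvalid, ntest):
    """Slice arrays into consecutive train/valid/test blocks.

    Hypothetical stand-in for the slicing done in train_valid_test().
    """
    # Block boundaries: [0, ntrain), [ntrain, ntrain+nvalid), and so on.
    bounds = [0, ntrain, ntrain + nvalid, ntrain + nvalid + ntest]
    names = ['train', 'valid', 'test']
    return dict(
        (name, (all_x[lo:hi], all_y[lo:hi]))
        for name, lo, hi in zip(names, bounds[:-1], bounds[1:])
    )


# Stand-in data: 30 examples with 2 features each.
all_x = numpy.arange(60).reshape(30, 2)
all_y = numpy.arange(30)
splits = split_contiguous(all_x, all_y, ntrain=10, nvalid=10, ntest=10)
```

Slicing a NumPy array past its end returns fewer rows without raising, so a caller that asks for more examples than are available may want to check the split lengths explicitly.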
627
ec27e19bb6eb moving away from mnist_factory
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 563
diff changeset
54 def full():
55     return train_valid_test()
537
b054271b2504 new file structure layout, factories, etc.
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 511
diff changeset
56
829
3f44379177b2 More descriptive error message when fpconst is missing.
Pascal Lamblin <lamblinp@iro.umontreal.ca>
parents: 824
diff changeset
57 # Useful for tests; keep it.
658
6d927441a38f added pylearn.datasets.MNIST.first_10 and pylearn.datasets.MNIST.first_100. They are usefull to test with small dataset.
Frederic Bastien <bastienf@iro.umontreal.ca>
parents: 653
diff changeset
58 def first_10():
59     return train_valid_test(ntrain=10, nvalid=10, ntest=10)
60
829
3f44379177b2 More descriptive error message when fpconst is missing.
Pascal Lamblin <lamblinp@iro.umontreal.ca>
parents: 824
diff changeset
61 # Useful for tests; keep it.
658
6d927441a38f added pylearn.datasets.MNIST.first_10 and pylearn.datasets.MNIST.first_100. They are usefull to test with small dataset.
Frederic Bastien <bastienf@iro.umontreal.ca>
parents: 653
diff changeset
62 def first_100():
63     return train_valid_test(ntrain=100, nvalid=100, ntest=100)
64
627
ec27e19bb6eb moving away from mnist_factory
James Bergstra <bergstrj@iro.umontreal.ca>
parents: 563
diff changeset
65 def first_1k():
66     return train_valid_test(ntrain=1000, nvalid=200, ntest=200)
67
68 def first_10k():
69     return train_valid_test(ntrain=10000, nvalid=2000, ntest=2000)
70
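When no path is given, head() resolves the relative path mnist/mnist_all.pmat through config.get_filepath_in_roots. That function's implementation is not shown in this listing, so the following is only a sketch of one plausible "search a list of root directories" lookup. The function find_in_roots, its roots argument, and the temporary directories are assumptions made for illustration and are not pylearn's API.

```python
import os
import tempfile


def find_in_roots(relpath, roots):
    """Return the first existing root/relpath, or raise IOError.

    Hypothetical helper; not the actual config.get_filepath_in_roots.
    """
    for root in roots:
        candidate = os.path.join(root, relpath)
        if os.path.isfile(candidate):
            return candidate
    raise IOError('%s not found in any of: %s' % (relpath, ', '.join(roots)))


# Two stand-in data roots; only the second contains the file.
empty_root = tempfile.mkdtemp()
data_root = tempfile.mkdtemp()
os.makedirs(os.path.join(data_root, 'mnist'))
open(os.path.join(data_root, 'mnist', 'mnist_all.pmat'), 'w').close()

relpath = os.path.join('mnist', 'mnist_all.pmat')
found = find_in_roots(relpath, [empty_root, data_root])
```

Searching the roots in order lets an earlier directory (for example, a fast local copy) shadow a later one, which is one common reason to prefer a list of roots over a single data directory.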