annotate pylearn/datasets/MNIST.py @ 1403:6ade5b39b773
int8 should be enough to represent digits from 0 to 9
author | Pascal Lamblin <lamblinp@iro.umontreal.ca> |
---|---|
date | Fri, 21 Jan 2011 20:40:57 -0500 |
parents | a13142cbeabd |
children | 83d3c9ee6d65 |
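
The changeset description above claims that int8 is wide enough for MNIST labels. A minimal NumPy check of that claim follows; the label values are made up for illustration and are not read from the real data file.

```python
import numpy

# Made-up labels stored as floats, the way a numeric matrix column would hold them.
labels_as_floats = numpy.array([5.0, 0.0, 4.0, 1.0, 9.0])

# Same conversion as the annotated line 27: cast the label column to int8.
labels = numpy.asarray(labels_as_floats, dtype='int8')

# int8 covers -128..127, so the digits 0..9 fit with room to spare.
info = numpy.iinfo('int8')
print(labels.dtype, labels.itemsize, info.min, info.max)
```

Each label then occupies a single byte, which is the practical benefit of the narrower dtype.
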
rev | line source |
---|---|
470 (bd937e845bbb; James Bergstra <bergstrj@iro.umontreal.ca>; "new stuff: algorithms/logistic_regression, datasets/MNIST") | 1 """ |
470 | 2 Various routines to load/access MNIST data. |
470 | 3 """ |
470 | 4 |
505 (74b3e65f5f24, parent 504; James Bergstra <bergstrj@iro.umontreal.ca>; "added smallNorb dataset, switched to PYLEARN_DATA_ROOT") | 5 import os |
470 | 6 import numpy |
470 | 7 |
818 | 8 from pylearn.io.pmat import PMat |
818 | 9 from pylearn.datasets.config import data_root # config |
818 | 10 from pylearn.datasets.dataset import Dataset |
470 | 11 |
470 | 12 def head(n=10, path=None): |
470 | 13     """Load the first MNIST examples. |
470 | 14 |
470 | 15     Returns two matrices: x, y. x has N rows of 784 columns. Each row of x represents the |
470 | 16     28x28 grey-scale pixels in raster order. y is a vector of N integers. Each element y[i] |
470 | 17     is the label of the i'th row of x. |
1403 (6ade5b39b773, parent 892; Pascal Lamblin <lamblinp@iro.umontreal.ca>; "int8 should be enough to represent digits from 0 to 9") | 18 |
470 | 19     """ |
818 | 20     if path is None: |
818 | 21         path = os.path.join(data_root(), 'mnist','mnist_all.pmat') |
470 | 22 |
794 (951272679910, parent 658; Frederic Bastien <bastienf@iro.umontreal.ca>; "get the mnist data from the pmat file and not the amat file") | 23     dat = PMat(fname=path) |
470 | 24 |
794 | 25     rows=dat.getRows(0,n) |
504 (19ab9ce916e3, parent 475; James Bergstra <bergstrj@iro.umontreal.ca>; "slightly more sophisticated system for finding the mnist data") | 26 |
1403 | 27     return rows[:,0:-1], numpy.asarray(rows[:,-1], dtype='int8') |
470 | 28 |
795 (f30bb746f279, parent 794; Frederic Bastien <bastienf@iro.umontreal.ca>; "put in comment a fct call all that seam to don't be used and conflict with numpy and python name.") | 29 |
795 | 30 #What is the purpose of this fct? |
795 | 31 #If still usefull, rename it as it conflict with the python an numpy nake all. |
795 | 32 #def all(path=None): |
795 | 33 # return head(n=None, path=path) |
537 (b054271b2504, parent 511; James Bergstra <bergstrj@iro.umontreal.ca>; "new file structure layout, factories, etc.") | 34 |
475 (11e0357f06f4, parent 471; James Bergstra <bergstrj@iro.umontreal.ca>; "typo in MNIST.train_valid_test") | 35 def train_valid_test(ntrain=50000, nvalid=10000, ntest=10000, path=None): |
475 | 36     all_x, all_targ = head(ntrain+nvalid+ntest, path=path) |
471 (45b3eb429c15, parent 470; James Bergstra <bergstrj@iro.umontreal.ca>; "added train_valid_test") | 37 |
537 | 38     rval = Dataset() |
471 | 39 |
537 | 40     rval.train = Dataset.Obj(x=all_x[0:ntrain], |
537 | 41             y=all_targ[0:ntrain]) |
537 | 42     rval.valid = Dataset.Obj(x=all_x[ntrain:ntrain+nvalid], |
537 | 43             y=all_targ[ntrain:ntrain+nvalid]) |
537 | 44     rval.test = Dataset.Obj(x=all_x[ntrain+nvalid:ntrain+nvalid+ntest], |
537 | 45             y=all_targ[ntrain+nvalid:ntrain+nvalid+ntest]) |
471 | 46 |
537 | 47     rval.n_classes = 10 |
563 (16f91ca016b1, parent 537; desjagui@atchoum.iro.umontreal.ca; "* added NStages as a stopper (moved from hpu/conv)") | 48     rval.img_shape = (28,28) |
537 | 49     return rval |
475 | 50 |
475 | 51 |
627 (ec27e19bb6eb, parent 563; James Bergstra <bergstrj@iro.umontreal.ca>; "moving away from mnist_factory") | 52 def full(): |
627 | 53     return train_valid_test() |
537 | 54 |
829 (3f44379177b2, parent 824; Pascal Lamblin <lamblinp@iro.umontreal.ca>; "More descriptive error message when fpconst is missing.") | 55 #useful for test, keep it |
658 (6d927441a38f, parent 653; Frederic Bastien <bastienf@iro.umontreal.ca>; "added pylearn.datasets.MNIST.first_10 and pylearn.datasets.MNIST.first_100. They are usefull to test with small dataset.") | 56 def first_10(): |
658 | 57     return train_valid_test(ntrain=10, nvalid=10, ntest=10) |
658 | 58 |
829 | 59 #useful for test, keep it |
658 | 60 def first_100(): |
658 | 61     return train_valid_test(ntrain=100, nvalid=100, ntest=100) |
658 | 62 |
627 | 63 def first_1k(): |
627 | 64     return train_valid_test(ntrain=1000, nvalid=200, ntest=200) |
627 | 65 |
627 | 66 def first_10k(): |
627 | 67     return train_valid_test(ntrain=10000, nvalid=2000, ntest=2000) |
627 | 68 |
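
The slicing performed by `head` and `train_valid_test` can be tried without the MNIST file or the pylearn classes. The sketch below is an illustration only: `split_rows` is a hypothetical helper, the rows are synthetic, and a plain dict of `(x, y)` pairs stands in for `Dataset.Obj`. It assumes, as in `head`, that the last column of each row holds the label.

```python
import numpy


def split_rows(rows, ntrain, nvalid, ntest):
    """Hypothetical stand-in for head() + train_valid_test() on an in-memory array."""
    # As in head(): every column but the last is pixel data, the last is the label.
    x = rows[:, 0:-1]
    y = numpy.asarray(rows[:, -1], dtype='int8')

    # Consecutive, non-overlapping slices: train first, then valid, then test.
    bounds = [0, ntrain, ntrain + nvalid, ntrain + nvalid + ntest]
    names = ['train', 'valid', 'test']
    return dict((name, (x[lo:hi], y[lo:hi]))
                for name, lo, hi in zip(names, bounds[:-1], bounds[1:]))


# Synthetic data: 30 rows of 784 random "pixels" plus a label column cycling 0..9.
rng = numpy.random.RandomState(0)
pixels = rng.rand(30, 784)
labels = numpy.arange(30) % 10
rows = numpy.hstack([pixels, labels[:, None]])

# Same sizes as first_10(): ten examples per split.
splits = split_rows(rows, ntrain=10, nvalid=10, ntest=10)
for name in ['train', 'valid', 'test']:
    x_part, y_part = splits[name]
    print(name, x_part.shape, y_part.dtype)
```

Because the splits are taken in file order, the sizes passed by `first_10`, `first_100`, `first_1k` and `first_10k` simply select progressively longer prefixes of the same data.
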