annotate pylearn/datasets/MNIST.py @ 1531:88f361283a19 tip
Fix url/name to pylearn2.
author | Frederic Bastien <nouiz@nouiz.org> |
---|---|
date | Mon, 09 Sep 2013 10:08:05 -0400 |
parents | 83d3c9ee6d65 |
children |
"""
Various routines to load/access MNIST data.
"""

import os
import numpy

from pylearn.io.pmat import PMat
from pylearn.datasets.dataset import Dataset
import config

def head(n=10, path=None):
    """Load the first n MNIST examples.

    Returns two arrays: x, y.  x is a matrix with n rows of 784 columns.  Each row of x
    holds the 28x28 grey-scale pixels of one image in raster order.  y is a vector of n
    integers.  Each element y[i] is the label of the i'th row of x.

    """
    if path is None:
        # dataset lookup through $PYLEARN_DATA_ROOT
        _path = os.path.join('mnist', 'mnist_all.pmat')
        path = config.get_filepath_in_roots(_path)

    dat = PMat(fname=path)

    rows = dat.getRows(0, n)

    # The last column holds the label; all other columns hold the pixels.
    return rows[:, 0:-1], numpy.asarray(rows[:, -1], dtype='int8')

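The docstring of `head` says each row of x stores a 28x28 image in raster order. The sketch below shows how such a row maps back to image coordinates, assuming raster order means row-major (left to right, then top to bottom). The helpers `unflatten` and `pixel_index` are hypothetical names for illustration and are not part of pylearn; the sketch uses plain lists so that it runs without the MNIST file.

```python
IMG_SHAPE = (28, 28)  # same value as rval.img_shape in this module


def unflatten(row, shape=IMG_SHAPE):
    """Turn a flat, raster-ordered sequence of pixels into a list of image rows."""
    n_rows, n_cols = shape
    if len(row) != n_rows * n_cols:
        raise ValueError("expected %d pixels, got %d" % (n_rows * n_cols, len(row)))
    return [list(row[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]


def pixel_index(r, c, shape=IMG_SHAPE):
    """Column of x that holds the pixel at image row r, image column c."""
    return r * shape[1] + c


# A stand-in for one row of x: pixel k simply has value k.
flat = list(range(784))
image = unflatten(flat)
```

With real data, `unflatten(x[i])` would give the 28 rows of the i'th digit, and `y[i]` would be its label.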

# What is the purpose of this function?
# If it is still useful, rename it, as it conflicts with the Python and NumPy name "all".
#def all(path=None):
#    return head(n=None, path=path)

def train_valid_test(ntrain=50000, nvalid=10000, ntest=10000, path=None):
    """Load the first ntrain+nvalid+ntest examples and split them, in file order,
    into consecutive train, valid and test sets.
    """
    all_x, all_targ = head(ntrain+nvalid+ntest, path=path)

    rval = Dataset()

    rval.train = Dataset.Obj(x=all_x[0:ntrain],
                             y=all_targ[0:ntrain])
    rval.valid = Dataset.Obj(x=all_x[ntrain:ntrain+nvalid],
                             y=all_targ[ntrain:ntrain+nvalid])
    rval.test = Dataset.Obj(x=all_x[ntrain+nvalid:ntrain+nvalid+ntest],
                            y=all_targ[ntrain+nvalid:ntrain+nvalid+ntest])

    rval.n_classes = 10
    rval.img_shape = (28, 28)
    return rval


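The split in `train_valid_test` is purely positional: the three sets are consecutive, non-overlapping slices of the rows returned by `head`, with no shuffling. A minimal sketch of the same index arithmetic follows. The names `split_bounds` and `split` are hypothetical and not part of pylearn, and a plain list stands in for the data.

```python
def split_bounds(ntrain, nvalid, ntest):
    """Return (start, stop) index pairs for the train, valid and test slices."""
    train = (0, ntrain)
    valid = (ntrain, ntrain + nvalid)
    test = (ntrain + nvalid, ntrain + nvalid + ntest)
    return train, valid, test


def split(examples, ntrain, nvalid, ntest):
    """Cut a sequence into three consecutive pieces, in order."""
    return tuple(examples[a:b] for a, b in split_bounds(ntrain, nvalid, ntest))


# Thirty stand-in examples, split the way first_10() sizes its sets.
train, valid, test = split(list(range(30)), ntrain=10, nvalid=10, ntest=10)
```

With the default sizes the bounds are (0, 50000), (50000, 60000) and (60000, 70000), so the defaults ask `head` for 70000 rows.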
def full():
    return train_valid_test()

# Useful for tests; keep it.
def first_10():
    return train_valid_test(ntrain=10, nvalid=10, ntest=10)

# Useful for tests; keep it.
def first_100():
    return train_valid_test(ntrain=100, nvalid=100, ntest=100)

def first_1k():
    return train_valid_test(ntrain=1000, nvalid=200, ntest=200)

def first_10k():
    return train_valid_test(ntrain=10000, nvalid=2000, ntest=2000)

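The convenience loaders above differ only in the sizes they pass to `train_valid_test`. As a rough illustration, the sizes can be collected in one table to see how many rows each loader requests from the file. `PRESETS` and `rows_needed` are hypothetical names used only for this sketch; the module itself defines separate functions.

```python
# (ntrain, nvalid, ntest) for each loader, copied from the functions above.
PRESETS = {
    "full": (50000, 10000, 10000),
    "first_10": (10, 10, 10),
    "first_100": (100, 100, 100),
    "first_1k": (1000, 200, 200),
    "first_10k": (10000, 2000, 2000),
}


def rows_needed(name):
    """Total number of rows the named loader asks head() to read."""
    return sum(PRESETS[name])


for name in sorted(PRESETS):
    print("%-10s reads %5d rows" % (name, rows_needed(name)))
```

Note that `first_1k` and `first_10k` name the size of the training set only; each reads 1.4 times that many rows once the validation and test sets are included.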