annotate datasets/dsetiter.py @ 613:5e481b224117
fix the reading of PNIST dataset following Dumi compression of the data.
field | value |
---|---|
author | Frederic Bastien <nouiz@nouiz.org> |
date | Thu, 06 Jan 2011 13:57:05 -0500 |
parents | 1adfafdc3d57 |
children | |
rev | line source |
---|---|
180
76bc047df5ee
Add dtype conversion and rescaling to the read path.
Arnaud Bergeron <abergeron@gmail.com>
parents:
179
diff
changeset
|
1 import numpy |
163
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
2 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
3 class DummyFile(object): |
302
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
4 def __init__(self, size, shape=()): |
163
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
5 self.size = size |
302
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
6 self.shape = shape |
163
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
7 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
8 def read(self, num): |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
9 if num > self.size: |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
10 num = self.size |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
11 self.size -= num |
302
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
12 return numpy.zeros((num,)+self.shape) |
163
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
13 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
14 class DataIterator(object): |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
15 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
16 def __init__(self, files, batchsize, bufsize=None): |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
17 r""" |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
18 Makes an iterator which will read examples from `files` |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
19 and return them in `batchsize` lots. |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
20 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
21 Parameters: |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
22 files -- list of numpy readers |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
23 batchsize -- (int) the size of returned batches |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
24 bufsize -- (int, default=None) internal read buffer size. |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
25 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
26 Tests: |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
27 >>> d = DataIterator([DummyFile(930)], 10, 100) |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
28 >>> d.batchsize |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
29 10 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
30 >>> d.bufsize |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
31 100 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
32 >>> d = DataIterator([DummyFile(1)], 10) |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
33 >>> d.batchsize |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
34 10 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
35 >>> d.bufsize |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
36 10000 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
37 >>> d = DataIterator([DummyFile(1)], 99) |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
38 >>> d.batchsize |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
39 99 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
40 >>> d.bufsize |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
41 9999 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
42 >>> d = DataIterator([DummyFile(1)], 10, 121) |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
43 >>> d.batchsize |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
44 10 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
45 >>> d.bufsize |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
46 120 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
47 >>> d = DataIterator([DummyFile(1)], 10, 1) |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
48 >>> d.batchsize |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
49 10 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
50 >>> d.bufsize |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
51 10 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
52 >>> d = DataIterator([DummyFile(1)], 2000) |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
53 >>> d.batchsize |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
54 2000 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
55 >>> d.bufsize |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
56 20000 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
57 >>> d = DataIterator([DummyFile(1)], 2000, 31254) |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
58 >>> d.batchsize |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
59 2000 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
60 >>> d.bufsize |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
61 30000 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
62 >>> d = DataIterator([DummyFile(1)], 2000, 10) |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
63 >>> d.batchsize |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
64 2000 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
65 >>> d.bufsize |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
66 2000 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
67 """ |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
68 self.batchsize = batchsize |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
69 if bufsize is None: |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
70 self.bufsize = max(10*batchsize, 10000) |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
71 else: |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
72 self.bufsize = bufsize |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
73 self.bufsize -= self.bufsize % self.batchsize |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
74 if self.bufsize < self.batchsize: |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
75 self.bufsize = self.batchsize |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
76 self.files = iter(files) |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
77 self.curfile = self.files.next() |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
78 self.empty = False |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
79 self._fill_buf() |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
80 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
81 def _fill_buf(self): |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
82 r""" |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
83 Fill the internal buffer. |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
84 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
85 Will fill across files in case the current one runs out. |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
86 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
87 Test: |
302
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
88 >>> d = DataIterator([DummyFile(20, (3,2))], 10, 10) |
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
89 >>> d._fill_buf() |
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
90 >>> d.curpos |
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
91 0 |
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
92 >>> len(d.buffer) |
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
93 10 |
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
94 >>> d = DataIterator([DummyFile(11, (3,2)), DummyFile(9, (3,2))], 10, 10) |
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
95 >>> d._fill_buf() |
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
96 >>> len(d.buffer) |
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
97 10 |
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
98 >>> d._fill_buf() |
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
99 Traceback (most recent call last): |
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
100 ... |
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
101 StopIteration |
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
102 >>> d = DataIterator([DummyFile(10, (3,2)), DummyFile(9, (3,2))], 10, 10) |
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
103 >>> d._fill_buf() |
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
104 >>> len(d.buffer) |
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
105 9 |
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
106 >>> d._fill_buf() |
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
107 Traceback (most recent call last): |
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
108 ... |
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
109 StopIteration |
163
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
110 >>> d = DataIterator([DummyFile(20)], 10, 10) |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
111 >>> d._fill_buf() |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
112 >>> d.curpos |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
113 0 |
179
defd388aba0c
Do not yield theano shared variables. They can only be used by theano.function().
Arnaud Bergeron <abergeron@gmail.com>
parents:
178
diff
changeset
|
114 >>> len(d.buffer) |
163
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
115 10 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
116 >>> d = DataIterator([DummyFile(11), DummyFile(9)], 10, 10) |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
117 >>> d._fill_buf() |
179
defd388aba0c
Do not yield theano shared variables. They can only be used by theano.function().
Arnaud Bergeron <abergeron@gmail.com>
parents:
178
diff
changeset
|
118 >>> len(d.buffer) |
163
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
119 10 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
120 >>> d._fill_buf() |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
121 Traceback (most recent call last): |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
122 ... |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
123 StopIteration |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
124 >>> d = DataIterator([DummyFile(10), DummyFile(9)], 10, 10) |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
125 >>> d._fill_buf() |
179
defd388aba0c
Do not yield theano shared variables. They can only be used by theano.function().
Arnaud Bergeron <abergeron@gmail.com>
parents:
178
diff
changeset
|
126 >>> len(d.buffer) |
163
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
127 9 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
128 >>> d._fill_buf() |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
129 Traceback (most recent call last): |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
130 ... |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
131 StopIteration |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
132 """ |
178
938bd350dbf0
Make the datasets iterators return theano shared slices with the appropriate types.
Arnaud Bergeron <abergeron@gmail.com>
parents:
163
diff
changeset
|
133 self.buffer = None |
163
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
134 if self.empty: |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
135 raise StopIteration |
178
938bd350dbf0
Make the datasets iterators return theano shared slices with the appropriate types.
Arnaud Bergeron <abergeron@gmail.com>
parents:
163
diff
changeset
|
136 buf = self.curfile.read(self.bufsize) |
163
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
137 |
178
938bd350dbf0
Make the datasets iterators return theano shared slices with the appropriate types.
Arnaud Bergeron <abergeron@gmail.com>
parents:
163
diff
changeset
|
138 while len(buf) < self.bufsize: |
163
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
139 try: |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
140 self.curfile = self.files.next() |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
141 except StopIteration: |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
142 self.empty = True |
178
938bd350dbf0
Make the datasets iterators return theano shared slices with the appropriate types.
Arnaud Bergeron <abergeron@gmail.com>
parents:
163
diff
changeset
|
143 if len(buf) == 0: |
938bd350dbf0
Make the datasets iterators return theano shared slices with the appropriate types.
Arnaud Bergeron <abergeron@gmail.com>
parents:
163
diff
changeset
|
144 raise |
938bd350dbf0
Make the datasets iterators return theano shared slices with the appropriate types.
Arnaud Bergeron <abergeron@gmail.com>
parents:
163
diff
changeset
|
145 break |
938bd350dbf0
Make the datasets iterators return theano shared slices with the appropriate types.
Arnaud Bergeron <abergeron@gmail.com>
parents:
163
diff
changeset
|
146 tmpbuf = self.curfile.read(self.bufsize - len(buf)) |
302
1adfafdc3d57
Fix concatenation of 1-dim datasets (such as int target vectors).
Arnaud Bergeron <abergeron@gmail.com>
parents:
189
diff
changeset
|
147 buf = numpy.concatenate([buf, tmpbuf], axis=0) |
178
938bd350dbf0
Make the datasets iterators return theano shared slices with the appropriate types.
Arnaud Bergeron <abergeron@gmail.com>
parents:
163
diff
changeset
|
148 |
189
0d0677773533
Fix bug where there would be a bunch of 0-length batches at the end under certain circumstances.
Arnaud Bergeron <abergeron@gmail.com>
parents:
180
diff
changeset
|
149 self.cursize = len(buf) |
180
76bc047df5ee
Add dtype conversion and rescaling to the read path.
Arnaud Bergeron <abergeron@gmail.com>
parents:
179
diff
changeset
|
150 self.buffer = buf |
163
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
151 self.curpos = 0 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
152 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
153 def __next__(self): |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
154 r""" |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
155 Returns the next portion of the dataset. |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
156 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
157 Test: |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
158 >>> d = DataIterator([DummyFile(20)], 10, 20) |
179
defd388aba0c
Do not yield theano shared variables. They can only be used by theano.function().
Arnaud Bergeron <abergeron@gmail.com>
parents:
178
diff
changeset
|
159 >>> len(d.next()) |
defd388aba0c
Do not yield theano shared variables. They can only be used by theano.function().
Arnaud Bergeron <abergeron@gmail.com>
parents:
178
diff
changeset
|
160 10 |
defd388aba0c
Do not yield theano shared variables. They can only be used by theano.function().
Arnaud Bergeron <abergeron@gmail.com>
parents:
178
diff
changeset
|
161 >>> len(d.next()) |
defd388aba0c
Do not yield theano shared variables. They can only be used by theano.function().
Arnaud Bergeron <abergeron@gmail.com>
parents:
178
diff
changeset
|
162 10 |
163
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
163 >>> d.next() |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
164 Traceback (most recent call last): |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
165 ... |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
166 StopIteration |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
167 >>> d.next() |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
168 Traceback (most recent call last): |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
169 ... |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
170 StopIteration |
189
0d0677773533
Fix bug where there would be a bunch of 0-length batches at the end under certain circumstances.
Arnaud Bergeron <abergeron@gmail.com>
parents:
180
diff
changeset
|
171 >>> d = DataIterator([DummyFile(13)], 10, 50) |
0d0677773533
Fix bug where there would be a bunch of 0-length batches at the end under certain circumstances.
Arnaud Bergeron <abergeron@gmail.com>
parents:
180
diff
changeset
|
172 >>> len(d.next()) |
0d0677773533
Fix bug where there would be a bunch of 0-length batches at the end under certain circumstances.
Arnaud Bergeron <abergeron@gmail.com>
parents:
180
diff
changeset
|
173 10 |
0d0677773533
Fix bug where there would be a bunch of 0-length batches at the end under certain circumstances.
Arnaud Bergeron <abergeron@gmail.com>
parents:
180
diff
changeset
|
174 >>> len(d.next()) |
0d0677773533
Fix bug where there would be a bunch of 0-length batches at the end under certain circumstances.
Arnaud Bergeron <abergeron@gmail.com>
parents:
180
diff
changeset
|
175 3 |
0d0677773533
Fix bug where there would be a bunch of 0-length batches at the end under certain circumstances.
Arnaud Bergeron <abergeron@gmail.com>
parents:
180
diff
changeset
|
176 >>> d.next() |
0d0677773533
Fix bug where there would be a bunch of 0-length batches at the end under certain circumstances.
Arnaud Bergeron <abergeron@gmail.com>
parents:
180
diff
changeset
|
177 Traceback (most recent call last): |
0d0677773533
Fix bug where there would be a bunch of 0-length batches at the end under certain circumstances.
Arnaud Bergeron <abergeron@gmail.com>
parents:
180
diff
changeset
|
178 ... |
0d0677773533
Fix bug where there would be a bunch of 0-length batches at the end under certain circumstances.
Arnaud Bergeron <abergeron@gmail.com>
parents:
180
diff
changeset
|
179 StopIteration |
163
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
180 """ |
189
0d0677773533
Fix bug where there would be a bunch of 0-length batches at the end under certain circumstances.
Arnaud Bergeron <abergeron@gmail.com>
parents:
180
diff
changeset
|
181 if self.curpos >= self.cursize: |
163
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
182 self._fill_buf() |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
183 res = self.buffer[self.curpos:self.curpos+self.batchsize] |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
184 self.curpos += self.batchsize |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
185 return res |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
186 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
187 next = __next__ |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
188 |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
189 def __iter__(self): |
4b28d7382dbf
Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff
changeset
|
190 return self |
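
The buffer-size rule in `DataIterator.__init__` (source lines 68-75) can be restated as a standalone function. This is a minimal sketch, not part of the annotated file: the name `normalize_bufsize` is hypothetical, and the expected values in the comments are copied from the constructor's own doctests.

```python
def normalize_bufsize(batchsize, bufsize=None):
    """Return the buffer size DataIterator.__init__ would settle on.

    Hypothetical helper: it mirrors the constructor's arithmetic only
    and does not touch any files.
    """
    if bufsize is None:
        # Default: ten batches or 10000 examples, whichever is larger
        # (before the rounding step below).
        bufsize = max(10 * batchsize, 10000)
    # Round down to a whole number of batches.
    bufsize -= bufsize % batchsize
    # A buffer must hold at least one batch.
    if bufsize < batchsize:
        bufsize = batchsize
    return bufsize


# Values taken from the doctests in DataIterator.__init__.
print(normalize_bufsize(10, 121))      # 120
print(normalize_bufsize(99))           # 9999
print(normalize_bufsize(2000, 31254))  # 30000
print(normalize_bufsize(2000, 10))     # 2000
```

Keeping the buffer a whole multiple of `batchsize` means a full buffer always splits into full batches. Assuming each reader returns as many examples as requested until it is exhausted, only the last buffer of a dataset can produce a short batch.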