annotate datasets/dsetiter.py @ 244:39421555993f

small change in gimp_script to avoid blurring in PNIST
author Xavier Glorot <glorotxa@iro.umontreal.ca>
date Tue, 16 Mar 2010 12:12:20 -0400
parents 0d0677773533
children 1adfafdc3d57
rev   line source
180
76bc047df5ee Add dtype conversion and rescaling to the read path.
Arnaud Bergeron <abergeron@gmail.com>
parents: 179
diff changeset
1 import numpy
163
4b28d7382dbf Add inital implementation of datasets.
Arnaud Bergeron <abergeron@gmail.com>
parents:
diff changeset
2
3 class DummyFile(object):
4     def __init__(self, size):
5         self.size = size
6
7     def read(self, num):
8         if num > self.size:
9             num = self.size
10         self.size -= num
11         return numpy.zeros((num, 3, 2))
12
13 class DataIterator(object):
14
15     def __init__(self, files, batchsize, bufsize=None):
16         r"""
17         Makes an iterator which will read examples from `files`
18         and return them in `batchsize` lots.
19
20         Parameters:
21             files -- list of numpy readers
22             batchsize -- (int) the size of returned batches
23             bufsize -- (int, default=None) internal read buffer size.
24
25         Tests:
26             >>> d = DataIterator([DummyFile(930)], 10, 100)
27             >>> d.batchsize
28             10
29             >>> d.bufsize
30             100
31             >>> d = DataIterator([DummyFile(1)], 10)
32             >>> d.batchsize
33             10
34             >>> d.bufsize
35             10000
36             >>> d = DataIterator([DummyFile(1)], 99)
37             >>> d.batchsize
38             99
39             >>> d.bufsize
40             9999
41             >>> d = DataIterator([DummyFile(1)], 10, 121)
42             >>> d.batchsize
43             10
44             >>> d.bufsize
45             120
46             >>> d = DataIterator([DummyFile(1)], 10, 1)
47             >>> d.batchsize
48             10
49             >>> d.bufsize
50             10
51             >>> d = DataIterator([DummyFile(1)], 2000)
52             >>> d.batchsize
53             2000
54             >>> d.bufsize
55             20000
56             >>> d = DataIterator([DummyFile(1)], 2000, 31254)
57             >>> d.batchsize
58             2000
59             >>> d.bufsize
60             30000
61             >>> d = DataIterator([DummyFile(1)], 2000, 10)
62             >>> d.batchsize
63             2000
64             >>> d.bufsize
65             2000
66         """
67         self.batchsize = batchsize
68         if bufsize is None:
69             self.bufsize = max(10*batchsize, 10000)
70         else:
71             self.bufsize = bufsize
72         self.bufsize -= self.bufsize % self.batchsize  # round down to a multiple of batchsize
73         if self.bufsize < self.batchsize:
74             self.bufsize = self.batchsize
75         self.files = iter(files)
76         self.curfile = next(self.files)  # built-in next() works on Python 2.6+ and 3
77         self.empty = False
78         self._fill_buf()
79
80     def _fill_buf(self):
81         r"""
82         Fill the internal buffer.
83
84         Will fill across files in case the current one runs out.
85
86         Test:
87             >>> d = DataIterator([DummyFile(20)], 10, 10)
88             >>> d._fill_buf()
89             >>> d.curpos
90             0
179
defd388aba0c Do not yield theano shared variables. They can only be used by theano.function().
Arnaud Bergeron <abergeron@gmail.com>
parents: 178
diff changeset
91             >>> len(d.buffer)
163
92             10
93             >>> d = DataIterator([DummyFile(11), DummyFile(9)], 10, 10)
94             >>> d._fill_buf()
179
95             >>> len(d.buffer)
163
96             10
97             >>> d._fill_buf()
98             Traceback (most recent call last):
99                 ...
100             StopIteration
101             >>> d = DataIterator([DummyFile(10), DummyFile(9)], 10, 10)
102             >>> d._fill_buf()
179
103             >>> len(d.buffer)
163
104             9
105             >>> d._fill_buf()
106             Traceback (most recent call last):
107                 ...
108             StopIteration
109         """
178
938bd350dbf0 Make the datasets iterators return theano shared slices with the appropriate types.
Arnaud Bergeron <abergeron@gmail.com>
parents: 163
diff changeset
110         self.buffer = None
163
111         if self.empty:
112             raise StopIteration
178
113         buf = self.curfile.read(self.bufsize)
163
114
178
115         while len(buf) < self.bufsize:
163
116             try:
117                 self.curfile = next(self.files)  # built-in next() works on Python 2.6+ and 3
118             except StopIteration:
119                 self.empty = True
178
120                 if len(buf) == 0:
121                     raise
122                 break
123             tmpbuf = self.curfile.read(self.bufsize - len(buf))
124             buf = numpy.vstack((buf, tmpbuf))  # vstack is the canonical name for row_stack
125
189
0d0677773533 Fix bug where there would be a bunch of 0-length batches at the end under certain circumstances.
Arnaud Bergeron <abergeron@gmail.com>
parents: 180
diff changeset
126         self.cursize = len(buf)
180
127         self.buffer = buf
163
128         self.curpos = 0
129
130     def __next__(self):
131         r"""
132         Returns the next portion of the dataset.
133
134         Test:
135             >>> d = DataIterator([DummyFile(20)], 10, 20)
179
136             >>> len(d.next())
137             10
138             >>> len(d.next())
139             10
163
140             >>> d.next()
141             Traceback (most recent call last):
142                 ...
143             StopIteration
144             >>> d.next()
145             Traceback (most recent call last):
146                 ...
147             StopIteration
189
148             >>> d = DataIterator([DummyFile(13)], 10, 50)
149             >>> len(d.next())
150             10
151             >>> len(d.next())
152             3
153             >>> d.next()
154             Traceback (most recent call last):
155                 ...
156             StopIteration
163
157         """
189
158         if self.curpos >= self.cursize:
163
159             self._fill_buf()
160         res = self.buffer[self.curpos:self.curpos+self.batchsize]
161         self.curpos += self.batchsize
162         return res
163
164     next = __next__
165
166     def __iter__(self):
167         return self
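
The buffering rules in the listing above can be illustrated with a small Python 3 sketch. This is not part of the annotated file: `normalize_bufsize`, `ArrayReader`, and `iter_batches` are hypothetical names introduced here, and readers are assumed to expose `read(num)` returning a NumPy array with at most `num` rows. It mirrors the bufsize rounding in `DataIterator.__init__` and the fill-across-files loop in `_fill_buf`, but as a generator instead of a class.

```python
import numpy


def normalize_bufsize(batchsize, bufsize=None):
    """Mirror the buffer-size rules of DataIterator.__init__ (hypothetical helper)."""
    if bufsize is None:
        bufsize = max(10 * batchsize, 10000)
    # Round down to a multiple of batchsize, but never below one batch.
    bufsize -= bufsize % batchsize
    return max(bufsize, batchsize)


class ArrayReader(object):
    """Hypothetical reader: serves rows of an in-memory array through read(num)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def read(self, num):
        chunk = self.data[self.pos:self.pos + num]
        self.pos += len(chunk)
        return chunk


def iter_batches(readers, batchsize, bufsize=None):
    """Yield batches of up to `batchsize` rows, refilling a buffer across readers."""
    bufsize = normalize_bufsize(batchsize, bufsize)
    readers = iter(readers)
    current = next(readers, None)
    while current is not None:
        buf = current.read(bufsize)
        # Top up from the following readers when the current one runs short.
        while len(buf) < bufsize:
            current = next(readers, None)
            if current is None:
                break
            buf = numpy.vstack((buf, current.read(bufsize - len(buf))))
        # An empty buffer produces no batches, so no 0-length batch is yielded.
        for start in range(0, len(buf), batchsize):
            yield buf[start:start + batchsize]
```

With a single 13-row reader, a batch size of 10, and a requested buffer of 50, this sketch yields batches of 10 and 3 rows and then stops, which matches the last doctest in `__next__`. A generator avoids tracking `curpos`, `cursize`, and `empty` by hand; the class in the listing keeps them as attributes, which makes the buffer state inspectable from the doctests.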