annotate datasets/dsetiter.py @ 595:da46a62ce402

submitted JMLR pdf
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Tue, 05 Oct 2010 15:07:33 -0400
parents 1adfafdc3d57

import numpy

class DummyFile(object):
    def __init__(self, size, shape=()):
        self.size = size
        self.shape = shape

    def read(self, num):
        if num > self.size:
            num = self.size
        self.size -= num
        return numpy.zeros((num,)+self.shape)

class DataIterator(object):

    def __init__(self, files, batchsize, bufsize=None):
        r"""
        Makes an iterator that reads examples from `files`
        and returns them in batches of `batchsize` examples.

        Parameters:
        files -- list of numpy readers; each must have a read(num)
                 method returning at most num examples as an array
        batchsize -- (int) the size of returned batches
        bufsize -- (int, default=None) internal read buffer size, in
                   examples.  It is rounded down to a multiple of
                   batchsize, but never below batchsize.  If None,
                   max(10*batchsize, 10000) is used.

        Tests:
        >>> d = DataIterator([DummyFile(930)], 10, 100)
        >>> d.batchsize
        10
        >>> d.bufsize
        100
        >>> d = DataIterator([DummyFile(1)], 10)
        >>> d.batchsize
        10
        >>> d.bufsize
        10000
        >>> d = DataIterator([DummyFile(1)], 99)
        >>> d.batchsize
        99
        >>> d.bufsize
        9999
        >>> d = DataIterator([DummyFile(1)], 10, 121)
        >>> d.batchsize
        10
        >>> d.bufsize
        120
        >>> d = DataIterator([DummyFile(1)], 10, 1)
        >>> d.batchsize
        10
        >>> d.bufsize
        10
        >>> d = DataIterator([DummyFile(1)], 2000)
        >>> d.batchsize
        2000
        >>> d.bufsize
        20000
        >>> d = DataIterator([DummyFile(1)], 2000, 31254)
        >>> d.batchsize
        2000
        >>> d.bufsize
        30000
        >>> d = DataIterator([DummyFile(1)], 2000, 10)
        >>> d.batchsize
        2000
        >>> d.bufsize
        2000
        """
        self.batchsize = batchsize
        if bufsize is None:
            # Default: room for ten batches, but at least 10000 examples.
            self.bufsize = max(10*batchsize, 10000)
        else:
            self.bufsize = bufsize
        # Round down to a whole number of batches, keeping at least one.
        self.bufsize -= self.bufsize % self.batchsize
        if self.bufsize < self.batchsize:
            self.bufsize = self.batchsize
        self.files = iter(files)
        # Open the first reader and pre-fill the buffer.
        self.curfile = next(self.files)
        self.empty = False
        self._fill_buf()

    def _fill_buf(self):
        r"""
        Fill the internal buffer.

        If the current file runs out before the buffer is full, reading
        continues with the next file.  Raises StopIteration when no
        examples remain.

        Tests:
        >>> d = DataIterator([DummyFile(20, (3,2))], 10, 10)
        >>> d._fill_buf()
        >>> d.curpos
        0
        >>> len(d.buffer)
        10
        >>> d = DataIterator([DummyFile(11, (3,2)), DummyFile(9, (3,2))], 10, 10)
        >>> d._fill_buf()
        >>> len(d.buffer)
        10
        >>> d._fill_buf()
        Traceback (most recent call last):
        ...
        StopIteration
        >>> d = DataIterator([DummyFile(10, (3,2)), DummyFile(9, (3,2))], 10, 10)
        >>> d._fill_buf()
        >>> len(d.buffer)
        9
        >>> d._fill_buf()
        Traceback (most recent call last):
        ...
        StopIteration
        >>> d = DataIterator([DummyFile(20)], 10, 10)
        >>> d._fill_buf()
        >>> d.curpos
        0
        >>> len(d.buffer)
        10
        >>> d = DataIterator([DummyFile(11), DummyFile(9)], 10, 10)
        >>> d._fill_buf()
        >>> len(d.buffer)
        10
        >>> d._fill_buf()
        Traceback (most recent call last):
        ...
        StopIteration
        >>> d = DataIterator([DummyFile(10), DummyFile(9)], 10, 10)
        >>> d._fill_buf()
        >>> len(d.buffer)
        9
        >>> d._fill_buf()
        Traceback (most recent call last):
        ...
        StopIteration
        """
        self.buffer = None
        if self.empty:
            raise StopIteration
        buf = self.curfile.read(self.bufsize)

        # Top the buffer up from the following files if this one ran short.
        while len(buf) < self.bufsize:
            try:
                self.curfile = next(self.files)
            except StopIteration:
                self.empty = True
                # Nothing at all left to return: propagate StopIteration.
                if len(buf) == 0:
                    raise
                break
            tmpbuf = self.curfile.read(self.bufsize - len(buf))
            buf = numpy.concatenate([buf, tmpbuf], axis=0)

        self.cursize = len(buf)
        self.buffer = buf
        self.curpos = 0

    def __next__(self):
        r"""
        Returns the next batch of the dataset.  Every batch holds
        `batchsize` examples except possibly the last one, which may be
        shorter.  Raises StopIteration when no examples remain.

        Tests:
        >>> d = DataIterator([DummyFile(20)], 10, 20)
        >>> len(d.next())
        10
        >>> len(d.next())
        10
        >>> d.next()
        Traceback (most recent call last):
        ...
        StopIteration
        >>> d.next()
        Traceback (most recent call last):
        ...
        StopIteration
        >>> d = DataIterator([DummyFile(13)], 10, 50)
        >>> len(d.next())
        10
        >>> len(d.next())
        3
        >>> d.next()
        Traceback (most recent call last):
        ...
        StopIteration
        """
        # Buffer used up: refill it (raises StopIteration when no data remains).
        if self.curpos >= self.cursize:
            self._fill_buf()
        res = self.buffer[self.curpos:self.curpos+self.batchsize]
        self.curpos += self.batchsize
        return res

    # Alias so the class also follows the Python 2 iterator protocol.
    next = __next__

    def __iter__(self):
        return self
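For comparison, the same buffering scheme can be sketched as a Python 3 generator. This is a hypothetical illustration, not part of dsetiter.py: `iter_batches` and `ArrayReader` are names introduced here, and readers are assumed, as `_fill_buf` assumes, to return fewer rows than requested only once they are exhausted. It applies the same bufsize rule as `DataIterator.__init__` and lets one buffer span several readers.

```python
import numpy as np


class ArrayReader:
    """Hypothetical reader: serves successive rows of an in-memory array."""

    def __init__(self, data):
        self.data = np.asarray(data)
        self.pos = 0

    def read(self, num):
        # Return at most `num` rows; an empty slice once exhausted.
        out = self.data[self.pos:self.pos + num]
        self.pos += len(out)
        return out


def iter_batches(readers, batchsize, bufsize=None):
    """Hypothetical helper: yield batches of `batchsize` rows.

    Sketch only: mirrors the bufsize rule of DataIterator.__init__ and
    assumes a short read means the reader is exhausted.
    """
    if bufsize is None:
        bufsize = max(10 * batchsize, 10000)
    # Round down to a whole number of batches, but keep at least one.
    bufsize = max(bufsize - bufsize % batchsize, batchsize)

    readers = iter(readers)
    current = next(readers, None)
    while current is not None:
        buf = current.read(bufsize)
        # Top the buffer up from the following readers if this one ran short.
        while len(buf) < bufsize:
            current = next(readers, None)
            if current is None:
                break
            buf = np.concatenate([buf, current.read(bufsize - len(buf))], axis=0)
        # Slice the buffer into batches; only the final one can be short.
        for start in range(0, len(buf), batchsize):
            yield buf[start:start + batchsize]


readers = [ArrayReader(np.arange(0, 11)), ArrayReader(np.arange(11, 20))]
for batch in iter_batches(readers, batchsize=4, bufsize=8):
    # prints [0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11],
    # [12, 13, 14, 15], [16, 17, 18, 19] (one per line)
    print(batch.tolist())
```

Because the buffer is refilled across readers, the third batch mixes the last rows of the first reader with the first row of the second, which is the behaviour the `DummyFile(11)`/`DummyFile(9)` doctests exercise. The generator form avoids tracking `curpos` and `cursize` by hand, at the cost of not exposing the buffer for inspection the way `_fill_buf` does.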