Mercurial > pylearn
annotate dataset.py @ 20:266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
author | bengioy@bengiomac.local |
---|---|
date | Mon, 07 Apr 2008 09:48:39 -0400 |
parents | 57f4015e2e09 |
children | b6b36f65664f |
rev | line source |
---|---|
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
1 |
12
ff4e551490f1
Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents:
11
diff
changeset
|
2 from lookup_list import LookupList |
ff4e551490f1
Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents:
11
diff
changeset
|
3 Example = LookupList |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
4 |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
5 class AbstractFunction (Exception): """Derived class must override this function""" |
12
ff4e551490f1
Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents:
11
diff
changeset
|
6 |
1
2cd82666b9a7
Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents:
0
diff
changeset
|
7 class DataSet(object): |
16 | 8 """A virtual base class for datasets. |
9 | |
10 A DataSet is a generator of iterators; these iterators can run through the | |
11 examples in a variety of ways. A DataSet need not necessarily have a finite | |
12 or known length, so this class can be used to interface to a 'stream' which | |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
13 feeds on-line learning. |
16 | 14 |
15 To iterate over examples, there are several possibilities: | |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
16 - for example in dataset.zip([field1, field2,field3, ...]) |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
17 - for val1,val2,val3 in dataset.zip([field1, field2,field3]) |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
18 - for minibatch in dataset.minibatches([field1, field2, ...],minibatch_size=N) |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
19 - for example in dataset |
16 | 20 Each of these is documented below. |
21 | |
22 Note: For a dataset of fixed and known length, which can implement item | |
23 random-access efficiently (e.g. indexing and slicing), and which can profit | |
24 from the FiniteDataSetIterator, consider using base class FiniteDataSet. | |
25 | |
26 Note: Fields are not mutually exclusive, i.e. two fields can overlap in their actual content. | |
27 | |
28 Note: The content of a field can be of any type. | |
29 | |
2
3fddb1c8f955
Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents:
1
diff
changeset
|
30 """ |
1
2cd82666b9a7
Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents:
0
diff
changeset
|
31 |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
32 def __init__(self): |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
33 pass |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
34 |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
35 def __iter__(self): |
16 | 36 """Supports the syntax "for i in dataset: ..." |
1
2cd82666b9a7
Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents:
0
diff
changeset
|
37 |
16 | 38 Using this syntax, "i" will be an Example instance (or equivalent) with |
39 all the fields of DataSet self. Every field of "i" will give access to | |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
40 a field of a single example. Fields should be accessible via |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
41 i["fielname"] or i[3] (in the fieldNames() order), but the derived class is free |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
42 to accept any type of identifier, and add extra functionality to the iterator. |
6
d5738b79089a
Removed MinibatchIterator and instead made minibatch_size a field of all DataSets,
bengioy@bengiomac.local
parents:
5
diff
changeset
|
43 """ |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
44 return self.zip(*self.fieldNames()) |
16 | 45 |
46 def zip(self, *fieldnames): | |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
47 """ |
16 | 48 Supports two forms of syntax: |
49 | |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
50 for i in dataset.zip([f1, f2, f3]): ... |
16 | 51 |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
52 for i1, i2, i3 in dataset.zip([f1, f2, f3]): ... |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
53 |
16 | 54 Using the first syntax, "i" will be an indexable object, such as a list, |
55 tuple, or Example instance, such that on every iteration, i[0] is the f1 | |
56 field of the current example, i[1] is the f2 field, and so on. | |
57 | |
58 Using the second syntax, i1, i2, i3 will contain the the contents of the | |
59 f1, f2, and f3 fields of a single example on each loop iteration. | |
60 | |
61 The derived class may accept fieldname arguments of any type. | |
62 | |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
63 """ |
19 | 64 class Iter(LookupList): |
65 def __init__(self, ll): | |
66 LookupList.__init__(self, ll.keys(), ll.values()) | |
67 self.ll = ll | |
68 def __iter__(self): #makes for loop work | |
69 return self | |
70 def next(self): | |
71 self.ll.next() | |
72 self._values = [v[0] for v in self.ll._values] | |
73 return self | |
74 return Iter(self.minibatches(fieldnames, minibatch_size = 1)) | |
2
3fddb1c8f955
Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents:
1
diff
changeset
|
75 |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
76 minibatches_fieldnames = None |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
77 minibatches_minibatch_size = 1 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
78 minibatches_n_batches = None |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
79 def minibatches(self, |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
80 fieldnames = minibatches_fieldnames, |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
81 minibatch_size = minibatches_minibatch_size, |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
82 n_batches = minibatches_n_batches): |
6
d5738b79089a
Removed MinibatchIterator and instead made minibatch_size a field of all DataSets,
bengioy@bengiomac.local
parents:
5
diff
changeset
|
83 """ |
16 | 84 Supports two forms of syntax: |
85 | |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
86 for i in dataset.minibatches([f1, f2, f3],**kwargs): ... |
16 | 87 |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
88 for i1, i2, i3 in dataset.minibatches([f1, f2, f3],**kwargs): ... |
16 | 89 |
90 Using the first syntax, "i" will be an indexable object, such as a list, | |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
91 tuple, or Example instance, such that on every iteration, i[0] is a |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
92 list-like container of the f1 field of a batch current examples, i[1] is |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
93 a list-like container of the f2 field, etc. |
2
3fddb1c8f955
Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents:
1
diff
changeset
|
94 |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
95 Using the second syntax, i1, i2, i3 will be list-like containers of the |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
96 f1, f2, and f3 fields of a batch of examples on each loop iteration. |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
97 |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
98 PARAMETERS |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
99 - fieldnames (list of any type, default None): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
100 The loop variables i1, i2, i3 (in the example above) should contain the |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
101 f1, f2, and f3 fields of the current batch of examples. If None, the |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
102 derived class can choose a default, e.g. all fields. |
16 | 103 |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
104 - minibatch_size (integer, default 1) |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
105 On every iteration, the variables i1, i2, i3 will have |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
106 exactly minibatch_size elements. e.g. len(i1) == minibatch_size |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
107 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
108 - n_batches (integer, default None) |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
109 The iterator will loop exactly this many times, and then stop. If None, |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
110 the derived class can choose a default. If (-1), then the returned |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
111 iterator should support looping indefinitely. |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
112 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
113 Note: A list-like container is something like a tuple, list, numpy.ndarray or |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
114 any other object that supports integer indexing and slicing. |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
115 |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
116 """ |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
117 raise AbstractFunction() |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
118 |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
119 def fieldNames(self): |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
120 #Yoshua- |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
121 # This list may not be finite; what would make sense in the use you have |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
122 # in mind? |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
123 # -JB |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
124 #James- |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
125 # You are right. I had put this to be able to iterate over the fields |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
126 # but maybe an iterator mechanism (over fields rather than examples) |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
127 # would be more appropriate. Fieldnames are needed in general |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
128 # by the iterators over examples or minibatches, to construct |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
129 # examples or minibatches with the corresponding names as attributes. |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
130 # -YB |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
131 """ |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
132 Return an iterator (an object with an __iter__ method) that |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
133 iterates over the names of the fields. As a special cases, |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
134 a list or a tuple of field names can be returned. |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
135 """" |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
136 # Note that some datasets |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
137 # may have virtual fields and support a virtually infinite number |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
138 # of possible field names. In that case, fieldNames() should |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
139 # either raise an error or iterate over a particular set of |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
140 # names as appropriate. Another option would be to iterate |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
141 # over the sub-datasets comprising a single field at a time. |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
142 # I am not sure yet what is most appropriate. |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
143 # -YB |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
144 """ |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
145 raise AbstractFunction() |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
146 |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
147 def rename(*new_field_specifications): |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
148 #Yoshua- |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
149 # Do you mean for this to be a virtual method? |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
150 # Wouldn't this functionality be easier to provide via a |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
151 # RenamingDataSet, such as the one I've written below? |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
152 # -JB |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
153 # You are right. Whichever implementation, however, we need a generic way to |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
154 # 'concatenate' fields, to handle the ([old_field1, old_field2, ...], new_field) semantics. |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
155 # -YB |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
156 """ |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
157 Return a new dataset that maps old fields (of self) to new fields (of the returned |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
158 dataset). The minimal syntax that should be supported is the following: |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
159 new_field_specifications = [new_field_spec1, new_field_spec2, ...] |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
160 new_field_spec = ([old_field1, old_field2, ...], new_field) |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
161 In general both old_field and new_field should be strings, but some datasets may also |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
162 support additional indexing schemes within each field (e.g. column slice |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
163 of a matrix-like field). |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
164 """ |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
165 raise AbstractFunction() |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
166 |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
167 |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
168 def apply_function(function, input_fields, output_fields, copy_inputs=True, accept_minibatches=True, cache=True): |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
169 """ |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
170 Return a dataset that contains as fields the results of applying |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
171 the given function (example-wise) to the specified input_fields. The |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
172 function should return a sequence whose elements will be stored in |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
173 fields whose names are given in the output_fields list. If copy_inputs |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
174 is True then the resulting dataset will also contain the fields of self. |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
175 If accept_minibatches, then the function may be called |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
176 with minibatches as arguments (what is returned by the minibatches |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
177 iterator). In any case, the computations may be delayed until the examples |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
178 of the resulting dataset are requested. If cache is True, then |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
179 once the output fields for some examples have been computed, then |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
180 are cached (to avoid recomputation if the same examples are again |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
181 requested). |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
182 """ |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
183 return ApplyFunctionDataSet(function, input_fields, output_fields, copy_inputs, accept_minibatches, cache) |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
184 |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
185 class RenamingDataSet(DataSet): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
186 """A DataSet that wraps another one, and makes it look like the field names |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
187 are different |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
188 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
189 Renaming is done by a dictionary that maps new names to the old ones used in |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
190 self.src. |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
191 """ |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
192 def __init__(self, src, rename_dct): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
193 DataSet.__init__(self) |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
194 self.src = src |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
195 self.rename_dct = copy.copy(rename_dct) |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
196 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
197 def minibatches(self, |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
198 fieldnames = DataSet.minibatches_fieldnames, |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
199 minibatch_size = DataSet.minibatches_minibatch_size, |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
200 n_batches = DataSet.minibatches_n_batches): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
201 dct = self.rename_dct |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
202 new_fieldnames = [dct.get(f, f) for f in fieldnames] |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
203 return self.src.minibatches(new_fieldnames, minibatches_size, n_batches) |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
204 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
205 def fieldNames(self): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
206 return [dct.get(f, f) for f in self.src.fieldNames()] |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
207 |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
208 |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
209 class FiniteDataSet(DataSet): |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
210 """ |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
211 Virtual interface, a subclass of DataSet for datasets which have a finite, known length. |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
212 Examples are indexed by an integer between 0 and self.length()-1, |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
213 and a subdataset can be obtained by slicing. This may not be appropriate in general |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
214 but only for datasets which can be thought of like ones that access rows AND fields |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
215 in an efficient random access way. Users are encouraged to expect only the generic dataset |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
216 interface in general. A FiniteDataSet is mainly useful when one has to obtain |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
217 a subset of examples (e.g. for splitting a dataset into training and test sets). |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
218 """ |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
219 |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
220 class FiniteDataSetIterator(object): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
221 """ |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
222 If the fieldnames list is empty, it means that we want to see ALL the fields. |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
223 """ |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
224 def __init__(self,dataset,minibatch_size=1,fieldnames=[]): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
225 self.dataset=dataset |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
226 self.minibatch_size=minibatch_size |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
227 assert minibatch_size>=1 and minibatch_size<=len(dataset) |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
228 self.current = -self.minibatch_size |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
229 self.fieldnames = fieldnames |
19 | 230 if len(dataset) % minibatch_size: |
231 raise NotImplementedError() | |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
232 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
233 def __iter__(self): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
234 return self |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
235 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
236 def next(self): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
237 self.current+=self.minibatch_size |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
238 if self.current>=len(self.dataset): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
239 self.current=-self.minibatch_size |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
240 raise StopIteration |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
241 if self.minibatch_size==1: |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
242 complete_example=self.dataset[self.current] |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
243 else: |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
244 complete_example=self.dataset[self.current:self.current+self.minibatch_size] |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
245 if self.fieldnames: |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
246 return Example(self.fieldnames,list(complete_example)) |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
247 else: |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
248 return complete_example |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
249 |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
250 def __init__(self): |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
251 pass |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
252 |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
253 def minibatches(self, |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
254 fieldnames = DataSet.minibatches_fieldnames, |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
255 minibatch_size = DataSet.minibatches_minibatch_size, |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
256 n_batches = DataSet.minibatches_n_batches): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
257 """ |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
258 If the fieldnames list is empty, it means that we want to see ALL the fields. |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
259 |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
260 If the n_batches is empty, we want to see all the examples possible |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
261 for the give minibatch_size. |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
262 """ |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
263 # substitute the defaults: |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
264 if fieldnames is None: fieldnames = self.fieldNames() |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
265 if n_batches is None: n_batches = len(self) / minibatch_size |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
266 return DataSet.Iterator(self, fieldnames, minibatch_size, n_batches) |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
267 |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
268 def __getattr__(self,fieldname): |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
269 """Return an that can iterate over the values of the field in this dataset.""" |
2
3fddb1c8f955
Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents:
1
diff
changeset
|
270 return self(fieldname) |
3fddb1c8f955
Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents:
1
diff
changeset
|
271 |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
272 def __call__(self,*fieldnames): |
16 | 273 """Return a sub-dataset containing only the given fieldnames as fields. |
274 | |
275 The return value's default iterator will iterate only over the given | |
276 fields. | |
277 """ | |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
278 raise AbstractFunction() |
1
2cd82666b9a7
Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents:
0
diff
changeset
|
279 |
2
3fddb1c8f955
Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents:
1
diff
changeset
|
280 def __len__(self): |
3fddb1c8f955
Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents:
1
diff
changeset
|
281 """len(dataset) returns the number of examples in the dataset.""" |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
282 raise AbstractFunction() |
2
3fddb1c8f955
Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents:
1
diff
changeset
|
283 |
3fddb1c8f955
Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents:
1
diff
changeset
|
284 def __getitem__(self,i): |
3fddb1c8f955
Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents:
1
diff
changeset
|
285 """dataset[i] returns the (i+1)-th example of the dataset.""" |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
286 raise AbstractFunction() |
1
2cd82666b9a7
Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents:
0
diff
changeset
|
287 |
2
3fddb1c8f955
Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents:
1
diff
changeset
|
288 def __getslice__(self,*slice_args): |
3fddb1c8f955
Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents:
1
diff
changeset
|
289 """dataset[i:j] returns the subdataset with examples i,i+1,...,j-1.""" |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
290 raise AbstractFunction() |
7
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
291 |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
292 # we may want ArrayDataSet defined in another python file |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
293 |
4 | 294 import numpy |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
295 |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
296 def as_array_dataset(dataset): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
297 # Generally datasets can be efficient by making data fields overlap, but |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
298 # this function doesn't know which fields overlap. So, it should check if |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
299 # dataset supports an as_array_dataset member function, and return that if |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
300 # possible. |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
301 if hasattr(dataset, 'as_array_dataset'): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
302 return dataset.as_array_dataset() |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
303 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
304 raise NotImplementedError() |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
305 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
306 # Make ONE big minibatch with all the examples, to separate the fields. |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
307 n_examples = len(dataset) |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
308 batch = dataset.minibatches( minibatch_size = len(dataset)).next() |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
309 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
310 # Each field of the underlying dataset must be convertible to a numpy array of the same type |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
311 # currently just double, but should use the smallest compatible dtype |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
312 n_fields = len(batch) |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
313 fieldnames = batch.fields.keys() |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
314 total_width = 0 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
315 type = None |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
316 fields = LookupList() |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
317 for i in xrange(n_fields): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
318 field = array(batch[i]) |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
319 assert field.shape[0]==n_examples |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
320 width = field.shape[1] |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
321 start=total_width |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
322 total_width += width |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
323 fields[fieldnames[i]]=slice(start,total_width,1) |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
324 # many complicated things remain to be done: |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
325 # - find common dtype |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
326 # - decide what to do with extra dimensions if not the same in all fields |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
327 # - try to see if we can avoid the copy? |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
328 |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
329 class ArrayDataSet(FiniteDataSet): |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
330 """ |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
331 An ArrayDataSet behaves like a numpy array but adds the notion of named fields |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
332 from DataSet (and the ability to view the values of multiple fields as an 'Example'). |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
333 It is a fixed-length and fixed-width dataset |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
334 in which each element is a fixed dimension numpy array or a number, hence the whole |
9
de616c423dbd
Improving comments in dataset.py
bengioy@esprit.iro.umontreal.ca
parents:
8
diff
changeset
|
335 dataset corresponds to a numpy array. Fields |
de616c423dbd
Improving comments in dataset.py
bengioy@esprit.iro.umontreal.ca
parents:
8
diff
changeset
|
336 must correspond to a slice of array columns. If the dataset has fields, |
6
d5738b79089a
Removed MinibatchIterator and instead made minibatch_size a field of all DataSets,
bengioy@bengiomac.local
parents:
5
diff
changeset
|
337 each 'example' is just a one-row ArrayDataSet, otherwise it is a numpy array. |
9
de616c423dbd
Improving comments in dataset.py
bengioy@esprit.iro.umontreal.ca
parents:
8
diff
changeset
|
338 Any dataset can also be converted to a numpy array (losing the notion of fields |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
339 by the numpy.array(dataset) call. |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
340 """ |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
341 |
19 | 342 class Iterator(LookupList): |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
343 """An iterator over a finite dataset that implements wrap-around""" |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
344 def __init__(self, dataset, fieldnames, minibatch_size, next_max): |
19 | 345 LookupList.__init__(self, fieldnames, [0] * len(fieldnames)) |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
346 self.dataset=dataset |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
347 self.minibatch_size=minibatch_size |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
348 self.next_count = 0 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
349 self.next_max = next_max |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
350 self.current = -self.minibatch_size |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
351 assert minibatch_size > 0 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
352 if minibatch_size >= len(dataset): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
353 raise NotImplementedError() |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
354 |
19 | 355 def __iter__(self): #makes for loop work |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
356 return self |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
357 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
358 @staticmethod |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
359 def matcat(a, b): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
360 a0, a1 = a.shape |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
361 b0, b1 = b.shape |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
362 assert a1 == b1 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
363 assert a.dtype is b.dtype |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
364 rval = numpy.empty( (a0 + b0, a1), dtype=a.dtype) |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
365 rval[:a0,:] = a |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
366 rval[a0:,:] = b |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
367 return rval |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
368 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
369 def next(self): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
370 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
371 #check for end-of-loop |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
372 self.next_count += 1 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
373 if self.next_count == self.next_max: |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
374 raise StopIteration |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
375 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
376 #determine the first and last elements of the slice we'll return |
19 | 377 rows = self.dataset.data.shape[0] |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
378 self.current += self.minibatch_size |
19 | 379 if self.current >= rows: |
380 self.current -= rows | |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
381 upper = self.current + self.minibatch_size |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
382 |
19 | 383 data = self.dataset.data |
384 | |
385 if upper <= rows: | |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
386 #this is the easy case, we only need once slice |
19 | 387 dataview = data[self.current:upper] |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
388 else: |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
389 # the minibatch wraps around the end of the dataset |
19 | 390 dataview = data[self.current:] |
391 upper -= rows | |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
392 assert upper > 0 |
19 | 393 dataview = self.matcat(dataview, data[:upper]) |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
394 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
395 |
19 | 396 self._values = [dataview[:, self.dataset.fields[f]]\ |
397 for f in self._names] | |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
398 |
19 | 399 return self |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
400 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
401 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
402 def __init__(self, data, fields=None): |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
403 """ |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
404 There are two ways to construct an ArrayDataSet: (1) from an |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
405 existing dataset (which may result in a copy of the data in a numpy array), |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
406 or (2) from a numpy.array (the data argument), along with an optional description |
12
ff4e551490f1
Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents:
11
diff
changeset
|
407 of the fields (a LookupList of column slices indexed by field names). |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
408 """ |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
409 self.data=data |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
410 self.fields=fields |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
411 rows, cols = data.shape |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
412 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
413 if fields: |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
414 for fieldname,fieldslice in fields.items(): |
4 | 415 # make sure fieldslice.start and fieldslice.step are defined |
416 start=fieldslice.start | |
417 step=fieldslice.step | |
418 if not start: | |
419 start=0 | |
420 if not step: | |
421 step=1 | |
422 if not fieldslice.start or not fieldslice.step: | |
7
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
423 fields[fieldname] = fieldslice = slice(start,fieldslice.stop,step) |
4 | 424 # and coherent with the data array |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
425 assert fieldslice.start >= 0 and fieldslice.stop <= cols |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
426 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
427 def minibatches(self, |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
428 fieldnames = DataSet.minibatches_fieldnames, |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
429 minibatch_size = DataSet.minibatches_minibatch_size, |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
430 n_batches = DataSet.minibatches_n_batches): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
431 """ |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
432 If the fieldnames list is empty, it means that we want to see ALL the fields. |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
433 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
434 If the n_batches is empty, we want to see all the examples possible |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
435 for the give minibatch_size. |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
436 """ |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
437 # substitute the defaults: |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
438 if fieldnames is None: fieldnames = self.fieldNames() |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
439 if n_batches is None: n_batches = len(self) / minibatch_size |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
440 return ArrayDataSet.Iterator(self, fieldnames, minibatch_size, n_batches) |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
441 |
7
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
442 def __getattr__(self,fieldname): |
4 | 443 """ |
7
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
444 Return a numpy array with the content associated with the given field name. |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
445 If this is a one-example dataset, then a row, i.e., numpy array (of one less dimension |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
446 than the dataset itself) is returned. |
7
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
447 """ |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
448 if len(self.data)==1: |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
449 return self.data[0,self.fields[fieldname]] |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
450 return self.data[:,self.fields[fieldname]] |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
451 |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
452 def __call__(self,*fieldnames): |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
453 """Return a sub-dataset containing only the given fieldnames as fields.""" |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
454 min_col=self.data.shape[1] |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
455 max_col=0 |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
456 for field_slice in self.fields.values(): |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
457 min_col=min(min_col,field_slice.start) |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
458 max_col=max(max_col,field_slice.stop) |
12
ff4e551490f1
Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents:
11
diff
changeset
|
459 new_fields=LookupList() |
ff4e551490f1
Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents:
11
diff
changeset
|
460 for fieldname,fieldslice in self.fields.items(): |
ff4e551490f1
Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents:
11
diff
changeset
|
461 new_fields[fieldname]=slice(fieldslice.start-min_col,fieldslice.stop-min_col,fieldslice.step) |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
462 return ArrayDataSet(self.data[:,min_col:max_col],fields=new_fields) |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
463 |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
464 def fieldNames(self): |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
465 """Return the list of field names that are supported by getattr and getFields.""" |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
466 return self.fields.keys() |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
467 |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
468 def __len__(self): |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
469 """len(dataset) returns the number of examples in the dataset.""" |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
470 return len(self.data) |
1
2cd82666b9a7
Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents:
0
diff
changeset
|
471 |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
472 def __getitem__(self,i): |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
473 """ |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
474 dataset[i] returns the (i+1)-th Example of the dataset. If there are no fields |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
475 the result is just a numpy array (for the i-th row of the dataset data matrix). |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
476 """ |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
477 if self.fields: |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
478 fieldnames,fieldslices=zip(*self.fields.items()) |
12
ff4e551490f1
Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents:
11
diff
changeset
|
479 return Example(self.fields.keys(),[self.data[i,fieldslice] for fieldslice in self.fields.values()]) |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
480 else: |
7
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
481 return self.data[i] |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
482 |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
483 def __getslice__(self,*args): |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
484 """dataset[i:j] returns the subdataset with examples i,i+1,...,j-1.""" |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
485 return ArrayDataSet(self.data.__getslice__(*args), fields=self.fields) |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
486 |
8
d1c394486037
Replaced asarray() method by __array__ method which gets called automatically when
bengioy@bengiomac.local
parents:
7
diff
changeset
|
487 def __array__(self): |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
488 """Return a view of this dataset which is an numpy.ndarray (i.e. losing |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
489 the identity and name of fields within the dataset). |
15 | 490 |
491 Numpy uses this special function name to retrieve an ndarray view for | |
492 function such as numpy.sum, numpy.dot, numpy.asarray, etc. | |
493 | |
494 If this dataset has no fields, then we simply return self.data, | |
495 otherwise things are complicated. | |
496 - why do we want this behaviour when there are fields? (JB) | |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
497 - for convenience and completeness (but maybe it would make |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
498 more sense to implement this through a 'field-merging' |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
499 dataset). (YB) |
15 | 500 """ |
7
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
501 if not self.fields: |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
502 return self.data |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
503 # else, select subsets of columns mapped by the fields |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
504 columns_used = numpy.zeros((self.data.shape[1]),dtype=bool) |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
505 for field_slice in self.fields.values(): |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
506 for c in xrange(field_slice.start,field_slice.stop,field_slice.step): |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
507 columns_used[c]=True |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
508 # try to figure out if we can map all the slices into one slice: |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
509 mappable_to_one_slice = True |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
510 start=0 |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
511 while start<len(columns_used) and not columns_used[start]: |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
512 start+=1 |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
513 stop=len(columns_used) |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
514 while stop>0 and not columns_used[stop-1]: |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
515 stop-=1 |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
516 step=0 |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
517 i=start |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
518 while i<stop: |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
519 j=i+1 |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
520 while j<stop and not columns_used[j]: |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
521 j+=1 |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
522 if step: |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
523 if step!=j-i: |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
524 mappable_to_one_slice = False |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
525 break |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
526 else: |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
527 step = j-i |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
528 i=j |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
529 if mappable_to_one_slice: |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
530 return self.data[:,slice(start,stop,step)] |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
531 # else make contiguous copy |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
532 n_columns = sum(columns_used) |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
533 result = zeros((len(self.data),n_columns)+self.data.shape[2:],self.data.dtype) |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
534 print result.shape |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
535 c=0 |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
536 for field_slice in self.fields.values(): |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
537 slice_width=field_slice.stop-field_slice.start/field_slice.step |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
538 # copy the field here |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
539 result[:,slice(c,slice_width)]=self.data[:,field_slice] |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
540 c+=slice_width |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
541 return result |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
542 |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
543 class ApplyFunctionDataset(DataSet): |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
544 """ |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
545 A dataset that contains as fields the results of applying |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
546 a given function (example-wise) to specified input_fields of a source |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
547 dataset. The function should return a sequence whose elements will be stored in |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
548 fields whose names are given in the output_fields list. If copy_inputs |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
549 is True then the resulting dataset will also contain the fields of the source. |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
550 dataset. If accept_minibatches, then the function expects |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
551 minibatches as arguments (what is returned by the minibatches |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
552 iterator). In any case, the computations may be delayed until the examples |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
553 of self are requested. If cache is True, then |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
554 once the output fields for some examples have been computed, then |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
555 are cached (to avoid recomputation if the same examples are again requested). |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
556 """ |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
557 def __init__(src,function, input_fields, output_fields, copy_inputs=True, accept_minibatches=True, cache=True): |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
558 DataSet.__init__(self) |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
559 self.src=src |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
560 self.function=function |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
561 self.input_fields=input_fields |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
562 self.output_fields=output_fields |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
563 self.copy_inputs=copy_inputs |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
564 self.accept_minibatches=accept_minibatches |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
565 src_fieldnames = src.fieldNames() |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
566 if copy_inputs: |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
567 for src_field in src_fieldnames: |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
568 assert src_field not in output_fields |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
569 self.fieldnames=src_fieldnames+output_fields |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
570 else: |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
571 self.fieldnames=output_fields |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
572 for input_field in input_fields: |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
573 assert input_field in src_fieldnames |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
574 self.cache=cache |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
575 if cache: |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
576 # maybe a fixed-size array kind of structure would be more efficient than a list |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
577 # in the case where src is FiniteDataSet. -YB |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
578 self.cached_examples = [] |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
579 |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
580 def fieldNames(self): return self.fieldnames |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
581 |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
582 def minibatches(self, |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
583 fieldnames = DataSet.minibatches_fieldnames, |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
584 minibatch_size = DataSet.minibatches_minibatch_size, |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
585 n_batches = DataSet.minibatches_n_batches): |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
586 |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
587 class Iterator(LookupList): |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
588 |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
589 def __init__(self,dataset): |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
590 LookupList.__init__(self, fieldnames, [0]*len(fieldnames)) |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
591 self.dataset=dataset |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
592 if dataset.copy_inputs: |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
593 src_fields=dataset.fieldNames() |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
594 else: |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
595 src_fields=dataset.input_fields |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
596 self.src_iterator=self.src.minibatches(src_fields,minibatch_size,n_batches) |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
597 |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
598 def __iter__(self): |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
599 return self |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
600 |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
601 def next(self): |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
602 src_examples = self.src_iterator.next() |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
603 if self.dataset.copy_inputs: |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
604 function_inputs = src_examples |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
605 else: |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
606 function_inputs = |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
607 [src_examples[field_name] for field_name in self.dataset.input_fields]) |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
608 return self.dataset.function(*function_inputs) |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
609 |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
610 for fieldname in fieldnames: |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
611 assert fieldname in self.input_fields |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
612 return Iterator(self) |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
613 |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
614 |