annotate dataset.py @ 26:672fe4b23032

Fixed dataset errors so that _test_dataset.py works again.
author bengioy@grenat.iro.umontreal.ca
date Fri, 11 Apr 2008 11:14:54 -0400
parents 526e192b0699
children 541a273bc89f
rev   line source
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
1
12
ff4e551490f1 Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents: 11
diff changeset
2 from lookup_list import LookupList
ff4e551490f1 Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents: 11
diff changeset
3 Example = LookupList
26
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
4 import copy
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
5
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
6 class AbstractFunction (Exception): """Derived class must override this function"""
12
ff4e551490f1 Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents: 11
diff changeset
7
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
8 class DataSet(object):
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
9 """A virtual base class for datasets.
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
10
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
11 A DataSet is a generator of iterators; these iterators can run through the
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
12 examples in a variety of ways. A DataSet need not necessarily have a finite
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
13 or known length, so this class can be used to interface to a 'stream' which
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
14 feeds on-line learning.
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
15
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
16 To iterate over examples, there are several possibilities:
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
17 - for example in dataset.zip([field1, field2,field3, ...])
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
18 - for val1,val2,val3 in dataset.zip([field1, field2,field3])
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
19 - for minibatch in dataset.minibatches([field1, field2, ...],minibatch_size=N)
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
20 - for example in dataset
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
21 Each of these is documented below. All of these iterators are expected
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
22 to provide, in addition to the usual 'next()' method, a 'next_index()' method
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
23 which returns a non-negative integer pointing to the position of the next
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
24 example that will be returned by 'next()' (or of the first example in the
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
25 next minibatch returned). This is important because these iterators
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
26 can wrap around the dataset in order to do multiple passes through it,
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
27 in possibly unregular ways if the minibatch size is not a divisor of the
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
28 dataset length.
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
29
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
30 Note: Fields are not mutually exclusive, i.e. two fields can overlap in their actual content.
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
31
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
32 Note: The content of a field can be of any type.
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
33
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
34 Note: A dataset can recognize a potentially infinite number of field names (i.e. the field
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
35 values can be computed on-demand, when particular field names are used in one of the
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
36 iterators).
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
37
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
38 Datasets of finite length should be sub-classes of FiniteLengthDataSet.
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
39
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
40 Datasets whose elements can be indexed and sub-datasets of consecutive
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
41 examples (i.e. slices) can be extracted from should be sub-classes of
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
42 SliceableDataSet.
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
43
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
44 Datasets with a finite number of fields should be sub-classes of
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
45 FiniteWidthDataSet.
2
3fddb1c8f955 Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents: 1
diff changeset
46 """
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
47
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
48 def __init__(self):
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
49 pass
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
50
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
51 class Iterator(LookupList):
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
52 def __init__(self, ll):
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
53 LookupList.__init__(self, ll.keys(), ll.values())
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
54 self.ll = ll
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
55 def __iter__(self): #makes for loop work
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
56 return self
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
57 def next(self):
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
58 self.ll.next()
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
59 self._values = [v[0] for v in self.ll._values]
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
60 return self
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
61 def next_index(self):
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
62 return self.ll.next_index()
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
63
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
64 def __iter__(self):
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
65 """Supports the syntax "for i in dataset: ..."
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
66
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
67 Using this syntax, "i" will be an Example instance (or equivalent) with
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
68 all the fields of DataSet self. Every field of "i" will give access to
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
69 a field of a single example. Fields should be accessible via
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
70 i["fielname"] or i[3] (in the order defined by the elements of the
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
71 Example returned by this iterator), but the derived class is free
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
72 to accept any type of identifier, and add extra functionality to the iterator.
6
d5738b79089a Removed MinibatchIterator and instead made minibatch_size a field of all DataSets,
bengioy@bengiomac.local
parents: 5
diff changeset
73 """
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
74 return DataSet.Iterator(self.minibatches(None, minibatch_size = 1))
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
75
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
76 def zip(self, *fieldnames):
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
77 """
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
78 Supports two forms of syntax:
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
79
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
80 for i in dataset.zip([f1, f2, f3]): ...
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
81
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
82 for i1, i2, i3 in dataset.zip([f1, f2, f3]): ...
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
83
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
84 Using the first syntax, "i" will be an indexable object, such as a list,
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
85 tuple, or Example instance, such that on every iteration, i[0] is the f1
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
86 field of the current example, i[1] is the f2 field, and so on.
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
87
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
88 Using the second syntax, i1, i2, i3 will contain the the contents of the
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
89 f1, f2, and f3 fields of a single example on each loop iteration.
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
90
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
91 The derived class may accept fieldname arguments of any type.
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
92
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
93 """
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
94 return DataSet.Iterator(self.minibatches(fieldnames, minibatch_size = 1))
2
3fddb1c8f955 Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents: 1
diff changeset
95
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
96 minibatches_fieldnames = None
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
97 minibatches_minibatch_size = 1
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
98 minibatches_n_batches = None
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
99 def minibatches(self,
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
100 fieldnames = minibatches_fieldnames,
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
101 minibatch_size = minibatches_minibatch_size,
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
102 n_batches = minibatches_n_batches):
6
d5738b79089a Removed MinibatchIterator and instead made minibatch_size a field of all DataSets,
bengioy@bengiomac.local
parents: 5
diff changeset
103 """
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
104 Supports three forms of syntax:
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
105
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
106 for i in dataset.minibatches(None,**kwargs): ...
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
107
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
108 for i in dataset.minibatches([f1, f2, f3],**kwargs): ...
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
109
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
110 for i1, i2, i3 in dataset.minibatches([f1, f2, f3],**kwargs): ...
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
111
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
112 Using the first two syntaxes, "i" will be an indexable object, such as a list,
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
113 tuple, or Example instance. In both cases, i[k] is a list-like container
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
114 of a batch of current examples. In the second case, i[0] is
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
115 list-like container of the f1 field of a batch current examples, i[1] is
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
116 a list-like container of the f2 field, etc.
2
3fddb1c8f955 Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents: 1
diff changeset
117
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
118 Using the first syntax, all the fields will be returned in "i".
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
119 Beware that some datasets may not support this syntax, if the number
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
120 of fields is infinite (i.e. field values may be computed "on demand").
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
121
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
122 Using the third syntax, i1, i2, i3 will be list-like containers of the
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
123 f1, f2, and f3 fields of a batch of examples on each loop iteration.
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
124
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
125 PARAMETERS
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
126 - fieldnames (list of any type, default None):
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
127 The loop variables i1, i2, i3 (in the example above) should contain the
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
128 f1, f2, and f3 fields of the current batch of examples. If None, the
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
129 derived class can choose a default, e.g. all fields.
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
130
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
131 - minibatch_size (integer, default 1)
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
132 On every iteration, the variables i1, i2, i3 will have
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
133 exactly minibatch_size elements. e.g. len(i1) == minibatch_size
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
134
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
135 - n_batches (integer, default None)
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
136 The iterator will loop exactly this many times, and then stop. If None,
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
137 the derived class can choose a default. If (-1), then the returned
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
138 iterator should support looping indefinitely.
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
139
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
140 Note: A list-like container is something like a tuple, list, numpy.ndarray or
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
141 any other object that supports integer indexing and slicing.
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
142
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
143 """
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
144 raise AbstractFunction()
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
145
26
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
146 def hasFields(self,*fieldnames):
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
147 """
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
148 Return true if the given field name (or field names, if multiple arguments are
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
149 given) is recognized by the DataSet (i.e. can be used as a field name in one
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
150 of the iterators).
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
151 """
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
152 raise AbstractFunction()
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
153
26
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
154 def merge_fields(self,*specifications):
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
155 """
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
156 Return a new dataset that maps old fields (of self) to new fields (of the returned
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
157 dataset). The minimal syntax that should be supported is the following:
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
158 new_field_specifications = [new_field_spec1, new_field_spec2, ...]
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
159 new_field_spec = ([old_field1, old_field2, ...], new_field)
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
160 In general both old_field and new_field should be strings, but some datasets may also
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
161 support additional indexing schemes within each field (e.g. column slice
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
162 of a matrix-like field).
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
163 """
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
164 raise AbstractFunction()
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
165
26
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
166 def merge_field_values(self,*field_value_pairs):
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
167 """
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
168 Return the value that corresponds to merging the values of several fields,
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
169 given as arguments (field_name, field_value) pairs with self.hasField(field_name).
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
170 This may be used by implementations of merge_fields.
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
171 Raise a ValueError if the operation is not possible.
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
172 """
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
173 fieldnames,fieldvalues = zip(*field_value_pairs)
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
174 raise ValueError("Unable to merge values of these fields:"+repr(fieldnames))
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
175
26
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
176 def examples2minibatch(self,examples):
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
177 """
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
178 Combine a list of Examples into a minibatch. A minibatch is an Example whose fields
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
179 are iterable over the examples of the minibatch.
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
180 """
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
181 raise AbstractFunction()
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
182
26
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
183 def rename(self,rename_dict):
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
184 """
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
185 Return a new dataset that renames fields, using a dictionnary that maps old field
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
186 names to new field names. The only fields visible by the returned dataset are those
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
187 whose names are keys of the rename_dict.
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
188 """
26
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
189 self_class = self.__class__
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
190 class SelfRenamingDataSet(RenamingDataSet,self_class):
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
191 pass
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
192 self.__class__ = SelfRenamingDataSet
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
193 # set the rename_dict and src fields
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
194 SelfRenamingDataSet.__init__(self,self,rename_dict)
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
195 return self
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
196
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
197 def applyFunction(self,function, input_fields, output_fields, copy_inputs=True, accept_minibatches=True, cache=True):
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
198 """
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
199 Return a dataset that contains as fields the results of applying
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
200 the given function (example-wise) to the specified input_fields. The
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
201 function should return a sequence whose elements will be stored in
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
202 fields whose names are given in the output_fields list. If copy_inputs
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
203 is True then the resulting dataset will also contain the fields of self.
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
204 If accept_minibatches, then the function may be called
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
205 with minibatches as arguments (what is returned by the minibatches
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
206 iterator). In any case, the computations may be delayed until the examples
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
207 of the resulting dataset are requested. If cache is True, then
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
208 once the output fields for some examples have been computed, then
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
209 are cached (to avoid recomputation if the same examples are again
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
210 requested).
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
211 """
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
212 return ApplyFunctionDataSet(function, input_fields, output_fields, copy_inputs, accept_minibatches, cache)
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
213
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
214
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
215 class FiniteLengthDataSet(DataSet):
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
216 """
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
217 Virtual interface for datasets that have a finite length (number of examples),
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
218 and thus recognize a len(dataset) call.
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
219 """
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
220 def __init__(self):
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
221 DataSet.__init__(self)
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
222
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
223 def __len__(self):
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
224 """len(dataset) returns the number of examples in the dataset."""
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
225 raise AbstractFunction()
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
226
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
227
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
228 class SliceableDataSet(DataSet):
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
229 """
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
230 Virtual interface, a subclass of DataSet for datasets which are sliceable
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
231 and whose individual elements can be accessed, generally respecting the
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
232 python semantics for [spec], where spec is either a non-negative integer
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
233 (for selecting one example), or a python slice (for selecting a sub-dataset
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
234 comprising the specified examples). This is useful for obtaining
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
235 sub-datasets, e.g. for splitting a dataset into training and test sets.
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
236 """
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
237 def __init__(self):
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
238 DataSet.__init__(self)
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
239
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
240 def minibatches(self,
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
241 fieldnames = DataSet.minibatches_fieldnames,
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
242 minibatch_size = DataSet.minibatches_minibatch_size,
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
243 n_batches = DataSet.minibatches_n_batches):
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
244 """
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
245 If the n_batches is empty, we want to see all the examples possible
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
246 for the given minibatch_size (possibly missing a few at the end of the dataset).
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
247 """
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
248 # substitute the defaults:
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
249 if n_batches is None: n_batches = len(self) / minibatch_size
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
250 return DataSet.Iterator(self, fieldnames, minibatch_size, n_batches)
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
251
2
3fddb1c8f955 Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents: 1
diff changeset
252 def __getitem__(self,i):
3fddb1c8f955 Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents: 1
diff changeset
253 """dataset[i] returns the (i+1)-th example of the dataset."""
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
254 raise AbstractFunction()
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
255
2
3fddb1c8f955 Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents: 1
diff changeset
256 def __getslice__(self,*slice_args):
3fddb1c8f955 Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents: 1
diff changeset
257 """dataset[i:j] returns the subdataset with examples i,i+1,...,j-1."""
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
258 raise AbstractFunction()
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
259
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
260
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
261 class FiniteWidthDataSet(DataSet):
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
262 """
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
263 Virtual interface for datasets that have a finite width (number of fields),
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
264 and thus return a list of fieldNames.
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
265 """
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
266 def __init__(self):
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
267 DataSet.__init__(self)
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
268
26
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
269 def hasFields(self,*fields):
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
270 has_fields=True
26
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
271 fieldnames = self.fieldNames()
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
272 for name in fields:
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
273 if name not in fieldnames:
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
274 has_fields=False
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
275 return has_fields
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
276
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
277 def fieldNames(self):
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
278 """Return the list of field names that are supported by the iterators,
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
279 and for which hasFields(fieldname) would return True."""
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
280 raise AbstractFunction()
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
281
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
282
26
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
283 class RenamingDataSet(FiniteWidthDataSet):
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
284 """A DataSet that wraps another one, and makes it look like the field names
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
285 are different
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
286
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
287 Renaming is done by a dictionary that maps new names to the old ones used in
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
288 self.src.
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
289 """
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
290 def __init__(self, src, rename_dct):
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
291 DataSet.__init__(self)
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
292 self.src = src
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
293 self.rename_dct = copy.copy(rename_dct)
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
294
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
295 def fieldNames(self):
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
296 return self.rename_dct.keys()
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
297
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
298 def minibatches(self,
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
299 fieldnames = DataSet.minibatches_fieldnames,
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
300 minibatch_size = DataSet.minibatches_minibatch_size,
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
301 n_batches = DataSet.minibatches_n_batches):
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
302 dct = self.rename_dct
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
303 new_fieldnames = [dct.get(f, f) for f in fieldnames]
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
304 return self.src.minibatches(new_fieldnames, minibatches_size, n_batches)
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
305
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
306
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
307 # we may want ArrayDataSet defined in another python file
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
308
4
f7dcfb5f9d5b Added test for dataset.
bengioy@bengiomac.local
parents: 3
diff changeset
309 import numpy
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
310
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
311 def as_array_dataset(dataset):
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
312 # Generally datasets can be efficient by making data fields overlap, but
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
313 # this function doesn't know which fields overlap. So, it should check if
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
314 # dataset supports an as_array_dataset member function, and return that if
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
315 # possible.
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
316 if hasattr(dataset, 'as_array_dataset'):
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
317 return dataset.as_array_dataset()
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
318
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
319 raise NotImplementedError
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
320
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
321 # Make ONE big minibatch with all the examples, to separate the fields.
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
322 n_examples = len(dataset)
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
323 batch = dataset.minibatches( minibatch_size = len(dataset)).next()
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
324
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
325 # Each field of the underlying dataset must be convertible to a numpy array of the same type
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
326 # currently just double, but should use the smallest compatible dtype
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
327 n_fields = len(batch)
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
328 fieldnames = batch.fields.keys()
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
329 total_width = 0
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
330 type = None
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
331 fields = LookupList()
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
332 for i in xrange(n_fields):
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
333 field = array(batch[i])
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
334 assert field.shape[0]==n_examples
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
335 width = field.shape[1]
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
336 start=total_width
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
337 total_width += width
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
338 fields[fieldnames[i]]=slice(start,total_width,1)
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
339 # many complicated things remain to be done:
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
340 # - find common dtype
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
341 # - decide what to do with extra dimensions if not the same in all fields
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
342 # - try to see if we can avoid the copy?
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
343
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
344 class ArrayDataSet(FiniteLengthDataSet,FiniteWidthDataSet,SliceableDataSet):
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
345 """
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
346 An ArrayDataSet behaves like a numpy array but adds the notion of named fields
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
347 from DataSet (and the ability to view the values of multiple fields as an 'Example').
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
348 It is a fixed-length and fixed-width dataset
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
349 in which each element is a fixed dimension numpy array or a number, hence the whole
9
de616c423dbd Improving comments in dataset.py
bengioy@esprit.iro.umontreal.ca
parents: 8
diff changeset
350 dataset corresponds to a numpy array. Fields
de616c423dbd Improving comments in dataset.py
bengioy@esprit.iro.umontreal.ca
parents: 8
diff changeset
351 must correspond to a slice of array columns. If the dataset has fields,
6
d5738b79089a Removed MinibatchIterator and instead made minibatch_size a field of all DataSets,
bengioy@bengiomac.local
parents: 5
diff changeset
352 each 'example' is just a one-row ArrayDataSet, otherwise it is a numpy array.
9
de616c423dbd Improving comments in dataset.py
bengioy@esprit.iro.umontreal.ca
parents: 8
diff changeset
353 Any dataset can also be converted to a numpy array (losing the notion of fields
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
354 by the numpy.array(dataset) call.
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
355 """
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
356
19
57f4015e2e09 Iterators extend LookupList
bergstrj@iro.umontreal.ca
parents: 17
diff changeset
357 class Iterator(LookupList):
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
358 """An iterator over a finite dataset that implements wrap-around"""
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
359 def __init__(self, dataset, fieldnames, minibatch_size, next_max):
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
360 if fieldnames is None: fieldnames = dataset.fieldNames()
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
361 LookupList.__init__(self, fieldnames, [0]*len(fieldnames))
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
362 self.dataset=dataset
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
363 self.minibatch_size=minibatch_size
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
364 self.next_count = 0
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
365 self.next_max = next_max
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
366 self.current = -self.minibatch_size
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
367 assert minibatch_size > 0
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
368 if minibatch_size >= len(dataset):
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
369 raise NotImplementedError()
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
370
19
57f4015e2e09 Iterators extend LookupList
bergstrj@iro.umontreal.ca
parents: 17
diff changeset
371 def __iter__(self): #makes for loop work
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
372 return self
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
373
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
374 @staticmethod
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
375 def matcat(a, b):
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
376 a0, a1 = a.shape
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
377 b0, b1 = b.shape
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
378 assert a1 == b1
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
379 assert a.dtype is b.dtype
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
380 rval = numpy.empty( (a0 + b0, a1), dtype=a.dtype)
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
381 rval[:a0,:] = a
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
382 rval[a0:,:] = b
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
383 return rval
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
384
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
385 def next_index(self):
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
386 n_rows = self.dataset.data.shape[0]
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
387 next_i = self.current+self.minibatch_size
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
388 if next_i >= n_rows:
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
389 next_i -= n_rows
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
390 return next_i
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
391
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
392 def next(self):
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
393
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
394 #check for end-of-loop
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
395 self.next_count += 1
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
396 if self.next_count == self.next_max:
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
397 raise StopIteration
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
398
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
399 #determine the first and last elements of the slice we'll return
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
400 n_rows = self.dataset.data.shape[0]
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
401 self.current = self.next_index()
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
402 upper = self.current + self.minibatch_size
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
403
19
57f4015e2e09 Iterators extend LookupList
bergstrj@iro.umontreal.ca
parents: 17
diff changeset
404 data = self.dataset.data
57f4015e2e09 Iterators extend LookupList
bergstrj@iro.umontreal.ca
parents: 17
diff changeset
405
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
406 if upper <= n_rows:
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
407 #this is the easy case, we only need once slice
19
57f4015e2e09 Iterators extend LookupList
bergstrj@iro.umontreal.ca
parents: 17
diff changeset
408 dataview = data[self.current:upper]
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
409 else:
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
410 # the minibatch wraps around the end of the dataset
19
57f4015e2e09 Iterators extend LookupList
bergstrj@iro.umontreal.ca
parents: 17
diff changeset
411 dataview = data[self.current:]
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
412 upper -= n_rows
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
413 assert upper > 0
19
57f4015e2e09 Iterators extend LookupList
bergstrj@iro.umontreal.ca
parents: 17
diff changeset
414 dataview = self.matcat(dataview, data[:upper])
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
415
19
57f4015e2e09 Iterators extend LookupList
bergstrj@iro.umontreal.ca
parents: 17
diff changeset
416 self._values = [dataview[:, self.dataset.fields[f]]\
57f4015e2e09 Iterators extend LookupList
bergstrj@iro.umontreal.ca
parents: 17
diff changeset
417 for f in self._names]
57f4015e2e09 Iterators extend LookupList
bergstrj@iro.umontreal.ca
parents: 17
diff changeset
418 return self
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
419
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
420
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
421 def __init__(self, data, fields=None):
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
422 """
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
423 There are two ways to construct an ArrayDataSet: (1) from an
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
424 existing dataset (which may result in a copy of the data in a numpy array),
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
425 or (2) from a numpy.array (the data argument), along with an optional description
12
ff4e551490f1 Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents: 11
diff changeset
426 of the fields (a LookupList of column slices indexed by field names).
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
427 """
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
428 self.data=data
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
429 self.fields=fields
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
430 rows, cols = data.shape
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
431
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
432 if fields:
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
433 for fieldname,fieldslice in fields.items():
4
f7dcfb5f9d5b Added test for dataset.
bengioy@bengiomac.local
parents: 3
diff changeset
434 # make sure fieldslice.start and fieldslice.step are defined
f7dcfb5f9d5b Added test for dataset.
bengioy@bengiomac.local
parents: 3
diff changeset
435 start=fieldslice.start
f7dcfb5f9d5b Added test for dataset.
bengioy@bengiomac.local
parents: 3
diff changeset
436 step=fieldslice.step
f7dcfb5f9d5b Added test for dataset.
bengioy@bengiomac.local
parents: 3
diff changeset
437 if not start:
f7dcfb5f9d5b Added test for dataset.
bengioy@bengiomac.local
parents: 3
diff changeset
438 start=0
f7dcfb5f9d5b Added test for dataset.
bengioy@bengiomac.local
parents: 3
diff changeset
439 if not step:
f7dcfb5f9d5b Added test for dataset.
bengioy@bengiomac.local
parents: 3
diff changeset
440 step=1
f7dcfb5f9d5b Added test for dataset.
bengioy@bengiomac.local
parents: 3
diff changeset
441 if not fieldslice.start or not fieldslice.step:
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
442 fields[fieldname] = fieldslice = slice(start,fieldslice.stop,step)
4
f7dcfb5f9d5b Added test for dataset.
bengioy@bengiomac.local
parents: 3
diff changeset
443 # and coherent with the data array
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
444 assert fieldslice.start >= 0 and fieldslice.stop <= cols
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
445
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
446 def minibatches(self,
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
447 fieldnames = DataSet.minibatches_fieldnames,
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
448 minibatch_size = DataSet.minibatches_minibatch_size,
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
449 n_batches = DataSet.minibatches_n_batches):
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
450 """
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
451 If the fieldnames list is None, it means that we want to see ALL the fields.
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
452
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
453 If the n_batches is None, we want to see all the examples possible
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
454 for the given minibatch_size (possibly missing some near the end).
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
455 """
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
456 # substitute the defaults:
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
457 if n_batches is None: n_batches = len(self) / minibatch_size
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
458 return ArrayDataSet.Iterator(self, fieldnames, minibatch_size, n_batches)
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
459
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
460 def __getattr__(self,fieldname):
4
f7dcfb5f9d5b Added test for dataset.
bengioy@bengiomac.local
parents: 3
diff changeset
461 """
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
462 Return a numpy array with the content associated with the given field name.
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
463 If this is a one-example dataset, then a row, i.e., numpy array (of one less dimension
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
464 than the dataset itself) is returned.
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
465 """
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
466 if len(self.data)==1:
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
467 return self.data[0,self.fields[fieldname]]
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
468 return self.data[:,self.fields[fieldname]]
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
469
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
470 def __call__(self,*fieldnames):
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
471 """Return a sub-dataset containing only the given fieldnames as fields."""
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
472 min_col=self.data.shape[1]
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
473 max_col=0
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
474 for field_slice in self.fields.values():
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
475 min_col=min(min_col,field_slice.start)
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
476 max_col=max(max_col,field_slice.stop)
12
ff4e551490f1 Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents: 11
diff changeset
477 new_fields=LookupList()
ff4e551490f1 Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents: 11
diff changeset
478 for fieldname,fieldslice in self.fields.items():
ff4e551490f1 Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents: 11
diff changeset
479 new_fields[fieldname]=slice(fieldslice.start-min_col,fieldslice.stop-min_col,fieldslice.step)
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
480 return ArrayDataSet(self.data[:,min_col:max_col],fields=new_fields)
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
481
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
482 def fieldNames(self):
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
483 """Return the list of field names that are supported by getattr and hasField."""
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
484 return self.fields.keys()
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
485
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
486 def __len__(self):
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
487 """len(dataset) returns the number of examples in the dataset."""
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
488 return len(self.data)
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
489
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
490 def __getitem__(self,i):
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
491 """
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
492 dataset[i] returns the (i+1)-th Example of the dataset. If there are no fields
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
493 the result is just a numpy array (for the i-th row of the dataset data matrix).
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
494 """
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
495 if self.fields:
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
496 fieldnames,fieldslices=zip(*self.fields.items())
12
ff4e551490f1 Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents: 11
diff changeset
497 return Example(self.fields.keys(),[self.data[i,fieldslice] for fieldslice in self.fields.values()])
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
498 else:
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
499 return self.data[i]
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
500
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
501 def __getslice__(self,*args):
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
502 """dataset[i:j] returns the subdataset with examples i,i+1,...,j-1."""
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
503 return ArrayDataSet(self.data.__getslice__(*args), fields=self.fields)
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
504
8
d1c394486037 Replaced asarray() method by __array__ method which gets called automatically when
bengioy@bengiomac.local
parents: 7
diff changeset
505 def __array__(self):
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
506 """Return a view of this dataset which is an numpy.ndarray (i.e. losing
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
507 the identity and name of fields within the dataset).
15
88168361a5ab comment re: ArrayDataSet.__array__
bergstrj@iro.umontreal.ca
parents: 9
diff changeset
508
88168361a5ab comment re: ArrayDataSet.__array__
bergstrj@iro.umontreal.ca
parents: 9
diff changeset
509 Numpy uses this special function name to retrieve an ndarray view for
88168361a5ab comment re: ArrayDataSet.__array__
bergstrj@iro.umontreal.ca
parents: 9
diff changeset
510 function such as numpy.sum, numpy.dot, numpy.asarray, etc.
88168361a5ab comment re: ArrayDataSet.__array__
bergstrj@iro.umontreal.ca
parents: 9
diff changeset
511
88168361a5ab comment re: ArrayDataSet.__array__
bergstrj@iro.umontreal.ca
parents: 9
diff changeset
512 If this dataset has no fields, then we simply return self.data,
88168361a5ab comment re: ArrayDataSet.__array__
bergstrj@iro.umontreal.ca
parents: 9
diff changeset
513 otherwise things are complicated.
88168361a5ab comment re: ArrayDataSet.__array__
bergstrj@iro.umontreal.ca
parents: 9
diff changeset
514 - why do we want this behaviour when there are fields? (JB)
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
515 - for convenience and completeness (but maybe it would make
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
516 more sense to implement this through a 'field-merging'
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
517 dataset). (YB)
15
88168361a5ab comment re: ArrayDataSet.__array__
bergstrj@iro.umontreal.ca
parents: 9
diff changeset
518 """
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
519 if not self.fields:
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
520 return self.data
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
521 # else, select subsets of columns mapped by the fields
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
522 columns_used = numpy.zeros((self.data.shape[1]),dtype=bool)
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
523 overlapping_fields = False
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
524 n_columns = 0
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
525 for field_slice in self.fields.values():
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
526 for c in xrange(field_slice.start,field_slice.stop,field_slice.step):
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
527 n_columns += 1
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
528 if columns_used[c]: overlapping_fields=True
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
529 columns_used[c]=True
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
530 # try to figure out if we can map all the slices into one slice:
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
531 mappable_to_one_slice = not overlapping_fields
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
532 if not overlapping_fields:
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
533 start=0
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
534 while start<len(columns_used) and not columns_used[start]:
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
535 start+=1
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
536 stop=len(columns_used)
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
537 while stop>0 and not columns_used[stop-1]:
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
538 stop-=1
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
539 step=0
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
540 i=start
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
541 while i<stop:
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
542 j=i+1
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
543 while j<stop and not columns_used[j]:
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
544 j+=1
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
545 if step:
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
546 if step!=j-i:
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
547 mappable_to_one_slice = False
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
548 break
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
549 else:
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
550 step = j-i
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
551 i=j
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
552 if mappable_to_one_slice:
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
553 return self.data[:,slice(start,stop,step)]
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
554 # else make contiguous copy (copying the overlapping columns)
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
555 result = numpy.zeros((len(self.data),n_columns)+self.data.shape[2:],self.data.dtype)
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
556 c=0
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
557 for field_slice in self.fields.values():
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
558 slice_width=(field_slice.stop-field_slice.start)/field_slice.step
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
559 # copy the field here
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
560 result[:,slice(c,c+slice_width)]=self.data[:,field_slice]
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
561 c+=slice_width
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
562 return result
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
563
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
564 class ApplyFunctionDataSet(DataSet):
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
565 """
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
566 A dataset that contains as fields the results of applying
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
567 a given function (example-wise) to specified input_fields of a source
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
568 dataset. The function should return a sequence whose elements will be stored in
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
569 fields whose names are given in the output_fields list. If copy_inputs
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
570 is True then the resulting dataset will also contain the fields of the source.
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
571 dataset. If accept_minibatches, then the function expects
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
572 minibatches as arguments (what is returned by the minibatches
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
573 iterator). In any case, the computations may be delayed until the examples
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
574 of self are requested. If cache is True, then
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
575 once the output fields for some examples have been computed, then
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
576 are cached (to avoid recomputation if the same examples are again requested).
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
577 """
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
578 def __init__(src,function, input_fields, output_fields, copy_inputs=True, accept_minibatches=True, cache=True, compute_now=False):
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
579 DataSet.__init__(self)
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
580 self.src=src
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
581 self.function=function
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
582 assert src.hasFields(input_fields)
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
583 self.input_fields=input_fields
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
584 self.output_fields=output_fields
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
585 assert not (copy_inputs and compute_now and not hasattr(src,'fieldNames'))
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
586 self.copy_inputs=copy_inputs
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
587 self.accept_minibatches=accept_minibatches
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
588 self.cache=cache
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
589 self.compute_now=compute_now
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
590 if compute_now:
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
591 assert hasattr(src,'__len__') and len(src)>=0
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
592 fieldnames = output_fields
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
593 if copy_inputs: fieldnames = src.fieldNames() + output_fields
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
594 if accept_minibatches:
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
595 # make a single minibatch with all the inputs
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
596 inputs = src.minibatches(input_fields,len(src)).next()
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
597 # and apply the function to it, and transpose into a list of examples (field values, actually)
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
598 self.cached_examples = zip(*Example(output_fields,function(*inputs)))
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
599 else:
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
600 # compute a list with one tuple per example, with the function outputs
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
601 self.cached_examples = [ function(input) for input in src.zip(input_fields) ]
26
672fe4b23032 Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents: 23
diff changeset
602 elif cache:
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
603 # maybe a fixed-size array kind of structure would be more efficient than a list
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
604 # in the case where src is FiniteDataSet. -YB
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
605 self.cached_examples = []
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
606
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
607 def minibatches(self,
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
608 fieldnames = DataSet.minibatches_fieldnames,
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
609 minibatch_size = DataSet.minibatches_minibatch_size,
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
610 n_batches = DataSet.minibatches_n_batches):
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
611
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
612 class Iterator(LookupList):
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
613
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
614 def __init__(self,dataset):
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
615 if fieldnames is None:
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
616 assert hasattr(dataset,"fieldNames")
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
617 fieldnames = dataset.fieldNames()
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
618 self.example_index=0
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
619 LookupList.__init__(self, fieldnames, [0]*len(fieldnames))
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
620 self.dataset=dataset
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
621 self.src_iterator=self.src.minibatches(list(set.union(set(fieldnames),set(dataset.input_fields))),
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
622 minibatch_size,n_batches)
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
623 self.fieldnames_not_in_input = []
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
624 if self.copy_inputs:
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
625 self.fieldnames_not_in_input = filter(lambda x: not x in dataset.input_fields, fieldnames)
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
626
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
627 def __iter__(self):
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
628 return self
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
629
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
630 def next_index(self):
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
631 return self.src_iterator.next_index()
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
632
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
633 def next(self):
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
634 example_index = self.src_iterator.next_index()
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
635 src_examples = self.src_iterator.next()
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
636 if self.dataset.copy_inputs:
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
637 function_inputs = [src_examples[field_name] for field_name in self.dataset.input_fields]
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
638 else:
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
639 function_inputs = src_examples
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
640 if self.dataset.cached_examples:
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
641 cache_len=len(self.cached_examples)
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
642 if example_index<cache_len+minibatch_size:
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
643 outputs_list = self.cached_examples[example_index:example_index+minibatch_size]
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
644 # convert the minibatch list of examples
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
645 # into a list of fields each of which iterate over the minibatch
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
646 outputs = zip(*outputs_list)
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
647 else:
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
648 outputs = self.dataset.function(*function_inputs)
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
649 if self.dataset.cache:
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
650 # convert the list of fields, each of which can iterate over the minibatch
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
651 # into a list of examples in the minibatch (each of which is a list of field values)
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
652 outputs_list = zip(*outputs)
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
653 # copy the outputs_list into the cache
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
654 for i in xrange(cache_len,example_index):
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
655 self.cached_examples.append(None)
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
656 self.cached_examples += outputs_list
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
657 else:
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
658 outputs = self.dataset.function(*function_inputs)
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
659
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
660 return Example(self.fieldnames_not_in_input+self.dataset.output_fields,
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
661 [src_examples[field_name] for field_name in self.fieldnames_not_in_input]+outputs)
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
662
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
663
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
664 for fieldname in fieldnames:
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
665 assert fieldname in self.output_fields or self.src.hasFields(fieldname)
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
666 return Iterator(self)
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
667
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
668
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
669 def supervised_learning_dataset(src_dataset,input_fields,target_fields,weight_field=None):
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
670 """
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
671 Wraps an arbitrary DataSet into one for supervised learning tasks by forcing the
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
672 user to define a set of fields as the 'input' field and a set of fields
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
673 as the 'target' field. Optionally, a single weight_field can also be defined.
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
674 """
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
675 args = ((input_fields,'input'),(output_fields,'target'))
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
676 if weight_field: args+=(([weight_field],'weight'))
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
677 return src_dataset.rename(*args)
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
678
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
679
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
680
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
681