annotate dataset.py @ 23:526e192b0699

Working on ApplyFunctionDataSet, added constraint that DataSet iterators must have a next_index() method.
author bengioy@esprit.iro.umontreal.ca
date Wed, 09 Apr 2008 18:27:13 -0400
parents b6b36f65664f
children 672fe4b23032
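
Illustration of the next_index() constraint named in the commit message. This is a minimal sketch, not code from dataset.py: the class name WrappingMinibatchIterator and its attributes are hypothetical, and it is written for Python 3 (with a Python 2 style next alias). It assumes a finite list of examples and shows why next_index() is useful when minibatches wrap around a dataset whose length is not a multiple of the minibatch size.

```python
class WrappingMinibatchIterator:
    """Hypothetical iterator over a finite list of examples that wraps around."""

    def __init__(self, examples, minibatch_size=1, n_batches=-1):
        self.examples = examples
        self.minibatch_size = minibatch_size
        self.n_batches = n_batches  # -1 means loop indefinitely (assumed here)
        self.batches_done = 0
        self.position = 0  # index of the first example of the next minibatch

    def __iter__(self):
        return self

    def next_index(self):
        # Non-negative position of the first example the next call will return.
        return self.position

    def __next__(self):
        if self.n_batches >= 0 and self.batches_done >= self.n_batches:
            raise StopIteration
        n = len(self.examples)
        # Take minibatch_size examples, wrapping past the end of the list.
        batch = [self.examples[(self.position + k) % n]
                 for k in range(self.minibatch_size)]
        self.position = (self.position + self.minibatch_size) % n
        self.batches_done += 1
        return batch

    next = __next__  # Python 2 spelling of the same method


# Five examples, minibatches of two: the start index drifts on each pass.
it = WrappingMinibatchIterator(list("abcde"), minibatch_size=2, n_batches=4)
starts, batches = [], []
while True:
    start = it.next_index()
    try:
        batch = next(it)
    except StopIteration:
        break
    starts.append(start)
    batches.append(batch)
```

In this sketch the third minibatch straddles the end of the data, so a consumer that needs to know where each batch begins has to ask next_index() rather than count batches.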
rev   line source
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
1
12
ff4e551490f1 Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents: 11
diff changeset
2 from lookup_list import LookupList
ff4e551490f1 Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents: 11
diff changeset
3 Example = LookupList
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
4
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
5 class AbstractFunction (Exception): """Derived class must override this function"""
12
ff4e551490f1 Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents: 11
diff changeset
6
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
7 class DataSet(object):
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
8 """A virtual base class for datasets.
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
9
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
10 A DataSet is a generator of iterators; these iterators can run through the
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
11 examples in a variety of ways. A DataSet need not necessarily have a finite
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
12 or known length, so this class can be used to interface to a 'stream' which
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
13 feeds on-line learning.
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
14
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
15 To iterate over examples, there are several possibilities:
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
16 - for example in dataset.zip([field1, field2,field3, ...])
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
17 - for val1,val2,val3 in dataset.zip([field1, field2,field3])
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
18 - for minibatch in dataset.minibatches([field1, field2, ...],minibatch_size=N)
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
19 - for example in dataset
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
20 Each of these is documented below. All of these iterators are expected
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
21 to provide, in addition to the usual 'next()' method, a 'next_index()' method
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
22 which returns a non-negative integer pointing to the position of the next
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
23 example that will be returned by 'next()' (or of the first example in the
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
24 next minibatch returned). This is important because these iterators
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
25 can wrap around the dataset in order to do multiple passes through it,
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
26 in possibly irregular ways if the minibatch size is not a divisor of the
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
27 dataset length.
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
28
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
29 Note: Fields are not mutually exclusive, i.e. two fields can overlap in their actual content.
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
30
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
31 Note: The content of a field can be of any type.
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
32
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
33 Note: A dataset can recognize a potentially infinite number of field names (i.e. the field
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
34 values can be computed on-demand, when particular field names are used in one of the
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
35 iterators).
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
36
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
37 Datasets of finite length should be sub-classes of FiniteLengthDataSet.
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
38
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
39 Datasets whose elements can be indexed and from which sub-datasets of consecutive
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
40 examples (i.e. slices) can be extracted should be sub-classes of
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
41 SliceableDataSet.
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
42
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
43 Datasets with a finite number of fields should be sub-classes of
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
44 FiniteWidthDataSet.
2
3fddb1c8f955 Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents: 1
diff changeset
45 """
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
46
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
47 def __init__(self):
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
48 pass
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
49
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
50 class Iterator(LookupList):
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
51 def __init__(self, ll):
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
52 LookupList.__init__(self, ll.keys(), ll.values())
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
53 self.ll = ll
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
54 def __iter__(self): #makes for loop work
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
55 return self
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
56 def next(self):
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
57 self.ll.next()
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
58 self._values = [v[0] for v in self.ll._values]
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
59 return self
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
60 def next_index(self):
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
61 return self.ll.next_index()
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
62
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
63 def __iter__(self):
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
64 """Supports the syntax "for i in dataset: ..."
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
65
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
66 Using this syntax, "i" will be an Example instance (or equivalent) with
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
67 all the fields of DataSet self. Every field of "i" will give access to
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
68 a field of a single example. Fields should be accessible via
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
69 i["fieldname"] or i[3] (in the order defined by the elements of the
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
70 Example returned by this iterator), but the derived class is free
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
71 to accept any type of identifier, and add extra functionality to the iterator.
6
d5738b79089a Removed MinibatchIterator and instead made minibatch_size a field of all DataSets,
bengioy@bengiomac.local
parents: 5
diff changeset
72 """
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
73 return DataSet.Iterator(self.minibatches(None, minibatch_size = 1))
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
74
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
75 def zip(self, *fieldnames):
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
76 """
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
77 Supports two forms of syntax:
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
78
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
79 for i in dataset.zip([f1, f2, f3]): ...
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
80
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
81 for i1, i2, i3 in dataset.zip([f1, f2, f3]): ...
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
82
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
83 Using the first syntax, "i" will be an indexable object, such as a list,
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
84 tuple, or Example instance, such that on every iteration, i[0] is the f1
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
85 field of the current example, i[1] is the f2 field, and so on.
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
86
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
87 Using the second syntax, i1, i2, i3 will contain the contents of the
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
88 f1, f2, and f3 fields of a single example on each loop iteration.
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
89
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
90 The derived class may accept fieldname arguments of any type.
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
91
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
92 """
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
93 return DataSet.Iterator(self.minibatches(fieldnames, minibatch_size = 1))
2
3fddb1c8f955 Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents: 1
diff changeset
94
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
95 minibatches_fieldnames = None
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
96 minibatches_minibatch_size = 1
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
97 minibatches_n_batches = None
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
98 def minibatches(self,
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
99 fieldnames = minibatches_fieldnames,
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
100 minibatch_size = minibatches_minibatch_size,
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
101 n_batches = minibatches_n_batches):
6
d5738b79089a Removed MinibatchIterator and instead made minibatch_size a field of all DataSets,
bengioy@bengiomac.local
parents: 5
diff changeset
102 """
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
103 Supports three forms of syntax:
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
104
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
105 for i in dataset.minibatches(None,**kwargs): ...
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
106
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
107 for i in dataset.minibatches([f1, f2, f3],**kwargs): ...
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
108
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
109 for i1, i2, i3 in dataset.minibatches([f1, f2, f3],**kwargs): ...
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
110
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
111 Using the first two syntaxes, "i" will be an indexable object, such as a list,
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
112 tuple, or Example instance. In both cases, i[k] is a list-like container
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
113 of a batch of current examples. In the second case, i[0] is
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
114 a list-like container of the f1 field of a batch of current examples, i[1] is
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
115 a list-like container of the f2 field, etc.
2
3fddb1c8f955 Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents: 1
diff changeset
116
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
117 Using the first syntax, all the fields will be returned in "i".
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
118 Beware that some datasets may not support this syntax, if the number
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
119 of fields is infinite (i.e. field values may be computed "on demand").
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
120
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
121 Using the third syntax, i1, i2, i3 will be list-like containers of the
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
122 f1, f2, and f3 fields of a batch of examples on each loop iteration.
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
123
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
124 PARAMETERS
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
125 - fieldnames (list of any type, default None):
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
126 The loop variables i1, i2, i3 (in the example above) should contain the
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
127 f1, f2, and f3 fields of the current batch of examples. If None, the
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
128 derived class can choose a default, e.g. all fields.
16
813723310d75 commenting
bergstrj@iro.umontreal.ca
parents: 15 11
diff changeset
129
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
130 - minibatch_size (integer, default 1)
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
131 On every iteration, the variables i1, i2, i3 will have
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
132 exactly minibatch_size elements, e.g. len(i1) == minibatch_size
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
133
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
134 - n_batches (integer, default None)
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
135 The iterator will loop exactly this many times, and then stop. If None,
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
136 the derived class can choose a default. If (-1), then the returned
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
137 iterator should support looping indefinitely.
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
138
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
139 Note: A list-like container is something like a tuple, list, numpy.ndarray or
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
140 any other object that supports integer indexing and slicing.
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
141
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
142 """
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
143 raise AbstractFunction()
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
144
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
145 def hasFields(self, *fieldnames):
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
146 """
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
147 Return true if the given field name (or field names, if multiple arguments are
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
148 given) is recognized by the DataSet (i.e. can be used as a field name in one
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
149 of the iterators).
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
150 """
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
151 raise AbstractFunction()
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
152
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
153 def merge_fields(self, *specifications):
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
154 """
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
155 Return a new dataset that maps old fields (of self) to new fields (of the returned
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
156 dataset). The minimal syntax that should be supported is the following:
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
157 new_field_specifications = [new_field_spec1, new_field_spec2, ...]
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
158 new_field_spec = ([old_field1, old_field2, ...], new_field)
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
159 In general both old_field and new_field should be strings, but some datasets may also
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
160 support additional indexing schemes within each field (e.g. column slice
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
161 of a matrix-like field).
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
162 """
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
163 raise AbstractFunction()
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
164
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
165 def merge_field_values(self, *field_value_pairs):
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
166 """
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
167 Return the value that corresponds to merging the values of several fields,
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
168 given as (field_name, field_value) pair arguments for which self.hasFields(field_name) holds.
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
169 This may be used by implementations of merge_fields.
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
170 Raise a ValueError if the operation is not possible.
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
171 """
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
172 fieldnames,fieldvalues = zip(*field_value_pairs)
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
173 raise ValueError("Unable to merge values of these fields:"+repr(fieldnames))
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
174
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
175 def examples2minibatch(self, examples):
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
176 """
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
177 Combine a list of Examples into a minibatch. A minibatch is an Example whose fields
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
178 are iterable over the examples of the minibatch.
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
179 """
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
180 raise AbstractFunction()
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
181
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
182 def rename(self, rename_dict):
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
183 """
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
184 Return a new dataset that renames fields, using a dictionary that maps old field
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
185 names to new field names. The only fields visible by the returned dataset are those
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
186 whose names are keys of the rename_dict.
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
187 """
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
188 return RenamingDataSet(self,rename_dict)
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
189
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
190 def applyFunction(self, function, input_fields, output_fields, copy_inputs=True, accept_minibatches=True, cache=True):
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
191 """
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
192 Return a dataset that contains as fields the results of applying
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
193 the given function (example-wise) to the specified input_fields. The
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
194 function should return a sequence whose elements will be stored in
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
195 fields whose names are given in the output_fields list. If copy_inputs
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
196 is True then the resulting dataset will also contain the fields of self.
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
197 If accept_minibatches, then the function may be called
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
198 with minibatches as arguments (what is returned by the minibatches
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
199 iterator). In any case, the computations may be delayed until the examples
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
200 of the resulting dataset are requested. If cache is True, then
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
201 once the output fields for some examples have been computed, they
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
202 are cached (to avoid recomputation if the same examples are again
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
203 requested).
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
204 """
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
205 return ApplyFunctionDataSet(function, input_fields, output_fields, copy_inputs, accept_minibatches, cache)
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
206
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
207 class RenamingDataSet(DataSet):
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
208 """A DataSet that wraps another one, and makes it look like the field names
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
209 are different.
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
210
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
211 Renaming is done by a dictionary that maps new names to the old ones used in
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
212 self.src.
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
213 """
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
214 def __init__(self, src, rename_dct):
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
215 DataSet.__init__(self)
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
216 self.src = src
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
217 self.rename_dct = copy.copy(rename_dct)
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
218
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
219 def minibatches(self,
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
220 fieldnames = DataSet.minibatches_fieldnames,
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
221 minibatch_size = DataSet.minibatches_minibatch_size,
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
222 n_batches = DataSet.minibatches_n_batches):
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
223 dct = self.rename_dct
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
224 new_fieldnames = [dct.get(f, f) for f in fieldnames]
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
225 return self.src.minibatches(new_fieldnames, minibatch_size, n_batches)
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
226
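The name translation performed in RenamingDataSet.minibatches can be sketched in isolation. This is a minimal illustration, assuming the dictionary maps the names exposed by the wrapper to the names used by the wrapped dataset; the field names and the helper translate_fieldnames are hypothetical and not part of dataset.py.

```python
# Hypothetical mapping: names exposed by the wrapper -> names used by the source.
rename_dct = {"input": "x", "target": "y"}


def translate_fieldnames(fieldnames, rename_dct):
    """Map requested names to source names; unmapped names pass through unchanged."""
    return [rename_dct.get(name, name) for name in fieldnames]


# "weight" has no entry in the mapping, so it is forwarded as-is.
translated = translate_fieldnames(["input", "target", "weight"], rename_dct)
```

Because dict.get falls back to the requested name, fields that are absent from the mapping are still forwarded to the source dataset rather than hidden.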
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
227 class FiniteLengthDataSet(DataSet):
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
228 """
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
229 Virtual interface for datasets that have a finite length (number of examples),
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
230 and thus recognize a len(dataset) call.
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
231 """
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
232 def __init__(self):
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
233 DataSet.__init__(self)
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
234
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
235 def __len__(self):
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
236 """len(dataset) returns the number of examples in the dataset."""
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
237 raise AbstractFunction()
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
238
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
239
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
240 class SliceableDataSet(DataSet):
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
241 """
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
242 Virtual interface, a subclass of DataSet for datasets which are sliceable
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
243 and whose individual elements can be accessed, generally respecting the
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
244 python semantics for [spec], where spec is either a non-negative integer
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
245 (for selecting one example), or a python slice (for selecting a sub-dataset
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
246 comprising the specified examples). This is useful for obtaining
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
247 sub-datasets, e.g. for splitting a dataset into training and test sets.
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
248 """
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
249 def __init__(self):
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
250 DataSet.__init__(self)
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
251
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
252 def minibatches(self,
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
253 fieldnames = DataSet.minibatches_fieldnames,
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
254 minibatch_size = DataSet.minibatches_minibatch_size,
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
255 n_batches = DataSet.minibatches_n_batches):
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
256 """
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
257 If n_batches is None, we want to see all the examples possible
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
258 for the given minibatch_size (possibly missing a few at the end of the dataset).
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
259 """
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
260 # substitute the defaults:
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
261 if n_batches is None: n_batches = len(self) // minibatch_size
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
262 return DataSet.Iterator(self, fieldnames, minibatch_size, n_batches)
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
263
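The default for n_batches used above amounts to counting only full minibatches. The following is a rough sketch of that arithmetic; the helper name default_n_batches is hypothetical and the example sizes are made up for illustration.

```python
def default_n_batches(n_examples, minibatch_size):
    """Number of full minibatches; examples left over at the end are skipped."""
    if minibatch_size <= 0:
        raise ValueError("minibatch_size must be positive")
    return n_examples // minibatch_size


# With 10 examples and minibatches of 3, three full batches fit.
n_batches = default_n_batches(10, 3)
# Examples not covered by any full batch.
leftover = 10 - n_batches * 3
```

Floor division is what makes the docstring's "possibly missing a few at the end of the dataset" true: the remainder is never visited in a single pass.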
2
3fddb1c8f955 Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents: 1
diff changeset
264 def __getitem__(self,i):
3fddb1c8f955 Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents: 1
diff changeset
265 """dataset[i] returns the (i+1)-th example of the dataset."""
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
266 raise AbstractFunction()
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
267
2
3fddb1c8f955 Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents: 1
diff changeset
268 def __getslice__(self,*slice_args):
3fddb1c8f955 Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents: 1
diff changeset
269 """dataset[i:j] returns the subdataset with examples i,i+1,...,j-1."""
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
270 raise AbstractFunction()
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
271
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
272
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
273 class FiniteWidthDataSet(DataSet):
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
274 """
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
275 Virtual interface for datasets that have a finite width (number of fields),
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
276 and thus return a list of fieldNames.
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
277 """
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
278 def __init__(self):
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
279 DataSet.__init__(self)
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
280
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
281 def hasFields(self, *fieldnames):
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
282 has_fields=True
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
283 for fieldname in fieldnames:
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
284 if fieldname not in self.fields.keys():
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
285 has_fields=False
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
286 return has_fields
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
287
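The membership test in hasFields can also be written with all(). This is only a sketch of the same check outside the class, assuming the available fields are held in a dictionary-like object; has_fields and the sample fields are hypothetical names.

```python
def has_fields(available, *fieldnames):
    """Return True only if every requested name is among the available fields."""
    return all(name in available for name in fieldnames)


# Hypothetical field table: name -> column slice.
fields = {"input": slice(0, 3), "target": slice(3, 4)}

both_present = has_fields(fields, "input", "target")
one_missing = has_fields(fields, "input", "weight")
```

As with the loop version, calling it with no field names returns True.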
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
288 def fieldNames(self):
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
289 """Return the list of field names that are supported by the iterators,
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
290 and for which hasFields(fieldname) would return True."""
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
291 raise AbstractFunction()
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
292
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
293
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
294 # we may want ArrayDataSet defined in another python file
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
295
4
f7dcfb5f9d5b Added test for dataset.
bengioy@bengiomac.local
parents: 3
diff changeset
296 import numpy
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
297
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
298 def as_array_dataset(dataset):
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
299 # Generally datasets can be efficient by making data fields overlap, but
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
300 # this function doesn't know which fields overlap. So, it should check if
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
301 # dataset supports an as_array_dataset member function, and return that if
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
302 # possible.
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
303 if hasattr(dataset, 'as_array_dataset'):
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
304 return dataset.as_array_dataset()
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
305
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
306 raise NotImplementedError
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
307
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
308 # Make ONE big minibatch with all the examples, to separate the fields.
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
309 n_examples = len(dataset)
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
310 batch = dataset.minibatches( minibatch_size = len(dataset)).next()
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
311
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
312 # Each field of the underlying dataset must be convertible to a numpy array of the same type
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
313 # currently just double, but should use the smallest compatible dtype
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
314 n_fields = len(batch)
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
315 fieldnames = batch.fields.keys()
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
316 total_width = 0
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
317 type = None
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
318 fields = LookupList()
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
319 for i in xrange(n_fields):
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
320 field = numpy.array(batch[i])
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
321 assert field.shape[0]==n_examples
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
322 width = field.shape[1]
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
323 start=total_width
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
324 total_width += width
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
325 fields[fieldnames[i]]=slice(start,total_width,1)
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
326 # many complicated things remain to be done:
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
327 # - find common dtype
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
328 # - decide what to do with extra dimensions if not the same in all fields
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
329 # - try to see if we can avoid the copy?
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
330
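The column layout built by the loop in as_array_dataset can be illustrated without numpy. This is a minimal sketch, assuming each field contributes a known number of columns; the helper field_slices and the widths shown are hypothetical.

```python
def field_slices(widths):
    """Assign each field a contiguous column slice, in the order given.

    widths is a list of (fieldname, n_columns) pairs.
    Returns the name -> slice mapping and the total number of columns.
    """
    slices = {}
    total_width = 0
    for name, width in widths:
        start = total_width
        total_width += width
        slices[name] = slice(start, total_width, 1)
    return slices, total_width


# Hypothetical dataset with a 3-column input and a 1-column target.
slices, total_width = field_slices([("input", 3), ("target", 1)])
```

Each field then corresponds to the columns data[:, slices[name]] of a single two-dimensional array with total_width columns.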
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
331 class ArrayDataSet(FiniteLengthDataSet,FiniteWidthDataSet,SliceableDataSet):
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
332 """
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
333 An ArrayDataSet behaves like a numpy array but adds the notion of named fields
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
334 from DataSet (and the ability to view the values of multiple fields as an 'Example').
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
335 It is a fixed-length and fixed-width dataset
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
336 in which each element is a fixed dimension numpy array or a number, hence the whole
9
de616c423dbd Improving comments in dataset.py
bengioy@esprit.iro.umontreal.ca
parents: 8
diff changeset
337 dataset corresponds to a numpy array. Fields
de616c423dbd Improving comments in dataset.py
bengioy@esprit.iro.umontreal.ca
parents: 8
diff changeset
338 must correspond to a slice of array columns. If the dataset has fields,
6
d5738b79089a Removed MinibatchIterator and instead made minibatch_size a field of all DataSets,
bengioy@bengiomac.local
parents: 5
diff changeset
339 each 'example' is just a one-row ArrayDataSet, otherwise it is a numpy array.
9
de616c423dbd Improving comments in dataset.py
bengioy@esprit.iro.umontreal.ca
parents: 8
diff changeset
340 Any dataset can also be converted to a numpy array (losing the notion of fields)
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
341 by the numpy.array(dataset) call.
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
342 """
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
343
19
57f4015e2e09 Iterators extend LookupList
bergstrj@iro.umontreal.ca
parents: 17
diff changeset
344 class Iterator(LookupList):
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
345 """An iterator over a finite dataset that implements wrap-around"""
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
346 def __init__(self, dataset, fieldnames, minibatch_size, next_max):
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
347 if fieldnames is None: fieldnames = dataset.fieldNames()
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
348 LookupList.__init__(self, fieldnames, [0]*len(fieldnames))
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
349 self.dataset=dataset
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
350 self.minibatch_size=minibatch_size
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
351 self.next_count = 0
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
352 self.next_max = next_max
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
353 self.current = -self.minibatch_size
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
354 assert minibatch_size > 0
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
355 if minibatch_size >= len(dataset):
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
356 raise NotImplementedError()
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
357
19
57f4015e2e09 Iterators extend LookupList
bergstrj@iro.umontreal.ca
parents: 17
diff changeset
358 def __iter__(self): #makes for loop work
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
359 return self
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
360
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
361 @staticmethod
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
362 def matcat(a, b):
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
363 a0, a1 = a.shape
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
364 b0, b1 = b.shape
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
365 assert a1 == b1
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
366 assert a.dtype == b.dtype
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
367 rval = numpy.empty( (a0 + b0, a1), dtype=a.dtype)
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
368 rval[:a0,:] = a
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
369 rval[a0:,:] = b
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
370 return rval
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
371
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
372 def next_index(self):
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
373 n_rows = self.dataset.data.shape[0]
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
374 next_i = self.current+self.minibatch_size
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
375 if next_i >= n_rows:
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
376 next_i -= n_rows
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
377 return next_i
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
378
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
379 def next(self):
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
380
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
381 #check for end-of-loop
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
382 self.next_count += 1
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
383 if self.next_count > self.next_max:
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
384 raise StopIteration
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
385
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
386 #determine the first and last elements of the slice we'll return
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
387 n_rows = self.dataset.data.shape[0]
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
388 self.current = self.next_index()
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
389 upper = self.current + self.minibatch_size
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
390
19
57f4015e2e09 Iterators extend LookupList
bergstrj@iro.umontreal.ca
parents: 17
diff changeset
391 data = self.dataset.data
57f4015e2e09 Iterators extend LookupList
bergstrj@iro.umontreal.ca
parents: 17
diff changeset
392
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
393 if upper <= n_rows:
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
394 # this is the easy case: we only need one slice
19
57f4015e2e09 Iterators extend LookupList
bergstrj@iro.umontreal.ca
parents: 17
diff changeset
395 dataview = data[self.current:upper]
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
396 else:
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
397 # the minibatch wraps around the end of the dataset
19
57f4015e2e09 Iterators extend LookupList
bergstrj@iro.umontreal.ca
parents: 17
diff changeset
398 dataview = data[self.current:]
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
399 upper -= n_rows
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
400 assert upper > 0
19
57f4015e2e09 Iterators extend LookupList
bergstrj@iro.umontreal.ca
parents: 17
diff changeset
401 dataview = self.matcat(dataview, data[:upper])
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
402
19
57f4015e2e09 Iterators extend LookupList
bergstrj@iro.umontreal.ca
parents: 17
diff changeset
403 self._values = [dataview[:, self.dataset.fields[f]]\
57f4015e2e09 Iterators extend LookupList
bergstrj@iro.umontreal.ca
parents: 17
diff changeset
404 for f in self._names]
57f4015e2e09 Iterators extend LookupList
bergstrj@iro.umontreal.ca
parents: 17
diff changeset
405 return self
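The wrap-around behaviour of next_index() and next() can be sketched in isolation as follows. This is only an illustration under stated assumptions, not the class's implementation: the helper name `wraparound_minibatch` is hypothetical, plain Python lists of rows stand in for the numpy data matrix, and list concatenation stands in for matcat.

```python
def wraparound_minibatch(rows, current, minibatch_size):
    """Return (start, batch) for the minibatch that follows `current`.

    Hypothetical helper: `rows` is a list of examples and `current` is the
    start row of the previous minibatch (-minibatch_size before the first call).
    """
    n_rows = len(rows)
    assert 0 < minibatch_size < n_rows
    # advance the start index, wrapping past the end of the data
    start = current + minibatch_size
    if start >= n_rows:
        start -= n_rows
    upper = start + minibatch_size
    if upper <= n_rows:
        # easy case: a single slice is enough
        batch = rows[start:upper]
    else:
        # the minibatch wraps around: tail of the data, then its head
        batch = rows[start:] + rows[:upper - n_rows]
    return start, batch


rows = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]  # 5 examples, 2 columns
current = -2
batches = []
for _ in range(4):
    current, batch = wraparound_minibatch(rows, current, 2)
    batches.append(batch)
```

With five examples and a minibatch size of two, the third minibatch joins the last row to the first one, and the fourth starts at row 1 rather than row 0.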
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
406
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
407
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
408 def __init__(self, data, fields=None):
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
409 """
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
410 There are two ways to construct an ArrayDataSet: (1) from an
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
411 existing dataset (which may result in a copy of the data in a numpy array),
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
412 or (2) from a numpy.array (the data argument), along with an optional description
12
ff4e551490f1 Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents: 11
diff changeset
413 of the fields (a LookupList of column slices indexed by field names).
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
414 """
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
415 self.data=data
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
416 self.fields=fields
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
417 rows, cols = data.shape
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
418
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
419 if fields:
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
420 for fieldname,fieldslice in fields.items():
4
f7dcfb5f9d5b Added test for dataset.
bengioy@bengiomac.local
parents: 3
diff changeset
421 # make sure fieldslice.start and fieldslice.step are defined
f7dcfb5f9d5b Added test for dataset.
bengioy@bengiomac.local
parents: 3
diff changeset
422 start=fieldslice.start
f7dcfb5f9d5b Added test for dataset.
bengioy@bengiomac.local
parents: 3
diff changeset
423 step=fieldslice.step
f7dcfb5f9d5b Added test for dataset.
bengioy@bengiomac.local
parents: 3
diff changeset
424 if not start:
f7dcfb5f9d5b Added test for dataset.
bengioy@bengiomac.local
parents: 3
diff changeset
425 start=0
f7dcfb5f9d5b Added test for dataset.
bengioy@bengiomac.local
parents: 3
diff changeset
426 if not step:
f7dcfb5f9d5b Added test for dataset.
bengioy@bengiomac.local
parents: 3
diff changeset
427 step=1
f7dcfb5f9d5b Added test for dataset.
bengioy@bengiomac.local
parents: 3
diff changeset
428 if not fieldslice.start or not fieldslice.step:
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
429 fields[fieldname] = fieldslice = slice(start,fieldslice.stop,step)
4
f7dcfb5f9d5b Added test for dataset.
bengioy@bengiomac.local
parents: 3
diff changeset
430 # and coherent with the data array
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
431 assert fieldslice.start >= 0 and fieldslice.stop <= cols
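The normalization of field slices performed by the constructor can be sketched on its own. This is a minimal illustration, assuming every slice has a defined stop; the helper name `normalize_field_slice` is hypothetical, a plain dict stands in for the LookupList, and the sketch tests for `None` explicitly where the constructor relies on truthiness.

```python
def normalize_field_slice(field_slice, n_cols):
    """Hypothetical helper: fill in a missing start/step and check the bounds."""
    start = field_slice.start if field_slice.start is not None else 0
    step = field_slice.step if field_slice.step is not None else 1
    normalized = slice(start, field_slice.stop, step)
    # the slice must be coherent with the width of the data array
    assert normalized.start >= 0 and normalized.stop <= n_cols
    return normalized


# a plain dict stands in for the LookupList of column slices (assumption)
fields = {"input": slice(None, 3), "target": slice(3, 4)}
normalized = dict((name, normalize_field_slice(s, 4))
                  for name, s in fields.items())
```

After normalization every slice carries an explicit start and step, so later code can read `start`, `stop` and `step` without checking for `None`.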
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
432
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
433 def minibatches(self,
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
434 fieldnames = DataSet.minibatches_fieldnames,
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
435 minibatch_size = DataSet.minibatches_minibatch_size,
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
436 n_batches = DataSet.minibatches_n_batches):
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
437 """
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
438 If the fieldnames list is None, it means that we want to see ALL the fields.
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
439
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
440 If n_batches is None, we want to see all the examples possible
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
441 for the given minibatch_size (possibly missing some near the end).
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
442 """
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
443 # substitute the defaults:
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
444 if n_batches is None: n_batches = len(self) // minibatch_size
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
445 return ArrayDataSet.Iterator(self, fieldnames, minibatch_size, n_batches)
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
446
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
447 def __getattr__(self,fieldname):
4
f7dcfb5f9d5b Added test for dataset.
bengioy@bengiomac.local
parents: 3
diff changeset
448 """
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
449 Return a numpy array with the content associated with the given field name.
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
450 If this is a one-example dataset, then a row, i.e., a numpy array (of one less dimension
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
451 than the dataset itself) is returned.
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
452 """
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
453 if len(self.data)==1:
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
454 return self.data[0,self.fields[fieldname]]
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
455 return self.data[:,self.fields[fieldname]]
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
456
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
457 def __call__(self,*fieldnames):
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
458 """Return a sub-dataset containing only the given fieldnames as fields."""
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
459 min_col=self.data.shape[1]
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
460 max_col=0
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
461 for field_slice in [self.fields[f] for f in fieldnames]: # only the requested fields
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
462 min_col=min(min_col,field_slice.start)
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
463 max_col=max(max_col,field_slice.stop)
12
ff4e551490f1 Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents: 11
diff changeset
464 new_fields=LookupList()
ff4e551490f1 Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents: 11
diff changeset
465 for fieldname,fieldslice in [(f,self.fields[f]) for f in fieldnames]: # only the requested fields
ff4e551490f1 Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents: 11
diff changeset
466 new_fields[fieldname]=slice(fieldslice.start-min_col,fieldslice.stop-min_col,fieldslice.step)
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
467 return ArrayDataSet(self.data[:,min_col:max_col],fields=new_fields)
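Restricting a dataset to a subset of its fields amounts to finding the column range the chosen fields span and re-expressing their slices relative to that range. The sketch below illustrates the idea under stated assumptions: the helper name `rebase_fields` is hypothetical, a plain dict stands in for the LookupList, every slice has an explicit start and stop, and at least one field name is given.

```python
def rebase_fields(fields, fieldnames):
    """Hypothetical helper: return (min_col, max_col, new_fields).

    `new_fields` maps each requested field name to its slice expressed
    relative to column `min_col` of the original data.
    """
    selected = [(name, fields[name]) for name in fieldnames]
    # smallest column range covering all the requested fields
    min_col = min(s.start for _, s in selected)
    max_col = max(s.stop for _, s in selected)
    new_fields = dict(
        (name, slice(s.start - min_col, s.stop - min_col, s.step))
        for name, s in selected)
    return min_col, max_col, new_fields


fields = {"x": slice(0, 2, 1), "y": slice(2, 5, 1), "z": slice(5, 6, 1)}
min_col, max_col, new_fields = rebase_fields(fields, ["y", "z"])
```

The sub-dataset would then hold columns `min_col:max_col` of the original data together with `new_fields`.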
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
468
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
469 def fieldNames(self):
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
470 """Return the list of field names that are supported by getattr and hasField."""
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
471 return self.fields.keys()
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
472
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
473 def __len__(self):
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
474 """len(dataset) returns the number of examples in the dataset."""
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
475 return len(self.data)
1
2cd82666b9a7 Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents: 0
diff changeset
476
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
477 def __getitem__(self,i):
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
478 """
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
479 dataset[i] returns the (i+1)-th Example of the dataset. If there are no fields
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
480 the result is just a numpy array (row i of the dataset data matrix).
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
481 """
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
482 if self.fields:
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
483 fieldnames,fieldslices=zip(*self.fields.items())
12
ff4e551490f1 Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents: 11
diff changeset
484 return Example(self.fields.keys(),[self.data[i,fieldslice] for fieldslice in self.fields.values()])
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
485 else:
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
486 return self.data[i]
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
487
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
488 def __getslice__(self,*args):
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
489 """dataset[i:j] returns the subdataset with examples i,i+1,...,j-1."""
17
759d17112b23 more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
parents: 16 12
diff changeset
490 return ArrayDataSet(self.data.__getslice__(*args), fields=self.fields)
3
378b68d5c4ad Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents: 2
diff changeset
491
8
d1c394486037 Replaced asarray() method by __array__ method which gets called automatically when
bengioy@bengiomac.local
parents: 7
diff changeset
492 def __array__(self):
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
493 """Return a view of this dataset which is a numpy.ndarray (i.e. losing
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
494 the identity and name of fields within the dataset).
15
88168361a5ab comment re: ArrayDataSet.__array__
bergstrj@iro.umontreal.ca
parents: 9
diff changeset
495
88168361a5ab comment re: ArrayDataSet.__array__
bergstrj@iro.umontreal.ca
parents: 9
diff changeset
496 Numpy uses this special function name to retrieve an ndarray view for
88168361a5ab comment re: ArrayDataSet.__array__
bergstrj@iro.umontreal.ca
parents: 9
diff changeset
497 functions such as numpy.sum, numpy.dot, numpy.asarray, etc.
88168361a5ab comment re: ArrayDataSet.__array__
bergstrj@iro.umontreal.ca
parents: 9
diff changeset
498
88168361a5ab comment re: ArrayDataSet.__array__
bergstrj@iro.umontreal.ca
parents: 9
diff changeset
499 If this dataset has no fields, then we simply return self.data,
88168361a5ab comment re: ArrayDataSet.__array__
bergstrj@iro.umontreal.ca
parents: 9
diff changeset
500 otherwise things are complicated.
88168361a5ab comment re: ArrayDataSet.__array__
bergstrj@iro.umontreal.ca
parents: 9
diff changeset
501 - why do we want this behaviour when there are fields? (JB)
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
502 - for convenience and completeness (but maybe it would make
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
503 more sense to implement this through a 'field-merging'
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
504 dataset). (YB)
15
88168361a5ab comment re: ArrayDataSet.__array__
bergstrj@iro.umontreal.ca
parents: 9
diff changeset
505 """
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
506 if not self.fields:
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
507 return self.data
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
508 # else, select subsets of columns mapped by the fields
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
509 columns_used = numpy.zeros((self.data.shape[1]),dtype=bool)
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
510 overlapping_fields = False
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
511 n_columns = 0
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
512 for field_slice in self.fields.values():
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
513 for c in xrange(field_slice.start,field_slice.stop,field_slice.step):
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
514 n_columns += 1
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
515 if columns_used[c]: overlapping_fields=True
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
516 columns_used[c]=True
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
517 # try to figure out if we can map all the slices into one slice:
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
518 mappable_to_one_slice = not overlapping_fields
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
519 if not overlapping_fields:
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
520 start=0
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
521 while start<len(columns_used) and not columns_used[start]:
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
522 start+=1
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
523 stop=len(columns_used)
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
524 while stop>0 and not columns_used[stop-1]:
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
525 stop-=1
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
526 step=0
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
527 i=start
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
528 while i<stop:
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
529 j=i+1
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
530 while j<stop and not columns_used[j]:
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
531 j+=1
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
532 if step:
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
533 if step!=j-i:
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
534 mappable_to_one_slice = False
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
535 break
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
536 else:
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
537 step = j-i
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
538 i=j
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
539 if mappable_to_one_slice:
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
540 return self.data[:,slice(start,stop,step)]
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
541 # else make contiguous copy (copying the overlapping columns)
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
542 result = numpy.zeros((len(self.data),n_columns)+self.data.shape[2:],self.data.dtype)
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
543 c=0
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
544 for field_slice in self.fields.values():
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
545 slice_width=(field_slice.stop-field_slice.start)/field_slice.step
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
546 # copy the field here
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
547 result[:,slice(c,c+slice_width)]=self.data[:,field_slice]
7
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
548 c+=slice_width
6f8f338686db Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents: 6
diff changeset
549 return result
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
550
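The loop above appears to test whether the used columns are evenly spaced, so that one slice can select them without a copy. A minimal standalone sketch of that test follows; `columns_to_slice` is a hypothetical helper name, and it takes a plain list of booleans rather than the dataset's own structures.

```python
def columns_to_slice(columns_used):
    """Return a slice selecting exactly the True entries of columns_used,
    or None when no single slice can select them (hypothetical helper)."""
    used = [i for i, flag in enumerate(columns_used) if flag]
    if not used:
        return slice(0, 0, 1)                  # nothing selected
    if len(used) == 1:
        return slice(used[0], used[0] + 1, 1)  # a single column
    step = used[1] - used[0]
    # every gap between consecutive used columns must equal the first gap
    for a, b in zip(used, used[1:]):
        if b - a != step:
            return None
    return slice(used[0], used[-1] + 1, step)
```

When the helper returns None, the caller has to fall back to building a contiguous copy, as the listing does.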
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
551 def rename(self,*new_field_specifications):
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
552 """
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
553 Return a new dataset that maps old fields (of self) to new fields (of the returned
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
554 dataset). The minimal syntax that should be supported is the following:
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
555 new_field_specifications = [new_field_spec1, new_field_spec2, ...]
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
556 new_field_spec = ([old_field1, old_field2, ...], new_field)
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
557 In general both old_field and new_field should be strings, but some datasets may also
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
558 support additional indexing schemes within each field (e.g. column slice
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
559 of a matrix-like field).
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
560 """
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
561 # if all old fields of each spec are
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
562 raise NotImplementedError()
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
563
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
564 class ApplyFunctionDataSet(DataSet):
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
565 """
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
566 A dataset that contains as fields the results of applying
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
567 a given function (example-wise) to specified input_fields of a source
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
568 dataset. The function should return a sequence whose elements will be stored in
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
569 fields whose names are given in the output_fields list. If copy_inputs
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
570 is True then the resulting dataset will also contain the fields of the source
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
571 dataset. If accept_minibatches, then the function expects
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
572 minibatches as arguments (what is returned by the minibatches
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
573 iterator). In any case, the computations may be delayed until the examples
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
574 of self are requested. If cache is True, then
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
575 once the output fields for some examples have been computed, they
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
576 are cached (to avoid recomputation if the same examples are again requested).
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
577 """
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
578 def __init__(self,src,function, input_fields, output_fields, copy_inputs=True, accept_minibatches=True, cache=True, compute_now=False):
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
579 DataSet.__init__(self)
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
580 self.src=src
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
581 self.function=function
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
582 assert src.hasFields(input_fields)
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
583 self.input_fields=input_fields
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
584 self.output_fields=output_fields
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
585 assert not (copy_inputs and compute_now and not hasattr(src,'fieldNames'))
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
586 self.copy_inputs=copy_inputs
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
587 self.accept_minibatches=accept_minibatches
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
588 self.cache=cache
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
589 self.compute_now=compute_now
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
590 if compute_now:
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
591 assert hasattr(src,'__len__') and len(src)>=0
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
592 fieldnames = output_fields
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
593 if copy_inputs: fieldnames = src.fieldNames() + output_fields
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
594 if accept_minibatches:
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
595 # make a single minibatch with all the inputs
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
596 inputs = src.minibatches(input_fields,len(src)).next()
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
597 # and apply the function to it, and transpose into a list of examples (field values, actually)
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
598 self.cached_examples = zip(*Example(output_fields,function(*inputs)))
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
599 else:
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
600 # compute a list with one tuple per example, with the function outputs
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
601 self.cached_examples = [ function(*input) for input in src.zip(input_fields) ]
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
602 elif cache:
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
603 # maybe a fixed-size array kind of structure would be more efficient than a list
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
604 # in the case where src is FiniteDataSet. -YB
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
605 self.cached_examples = []
11
be128b9127c8 Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents: 9
diff changeset
606
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
607 def minibatches(self,
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
608 fieldnames = DataSet.minibatches_fieldnames,
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
609 minibatch_size = DataSet.minibatches_minibatch_size,
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
610 n_batches = DataSet.minibatches_n_batches):
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
611
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
612 class Iterator(LookupList):
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
613
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
614 def __init__(self,dataset):
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
615 if fieldnames is None:
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
616 assert hasattr(dataset,"fieldNames")
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
617 fieldnames = dataset.fieldNames()
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
618 self.example_index=0
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
619 LookupList.__init__(self, fieldnames, [0]*len(fieldnames))
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
620 self.dataset=dataset
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
621 self.src_iterator=dataset.src.minibatches(list(set.union(set(fieldnames),set(dataset.input_fields))),
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
622 minibatch_size,n_batches)
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
623 self.fieldnames_not_in_input = []
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
624 if dataset.copy_inputs:
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
625 self.fieldnames_not_in_input = filter(lambda x: not x in dataset.input_fields, fieldnames)
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
626
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
627 def __iter__(self):
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
628 return self
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
629
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
630 def next_index(self):
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
631 return self.src_iterator.next_index()
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
632
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
633 def next(self):
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
634 example_index = self.src_iterator.next_index()
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
635 src_examples = self.src_iterator.next()
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
636 if self.dataset.copy_inputs:
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
637 function_inputs = [src_examples[field_name] for field_name in self.dataset.input_fields]
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
638 else:
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
639 function_inputs = src_examples
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
640 if self.dataset.cached_examples:
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
641 cache_len=len(self.dataset.cached_examples)
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
642 if example_index+minibatch_size<=cache_len:
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
643 outputs_list = self.dataset.cached_examples[example_index:example_index+minibatch_size]
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
644 # convert the minibatch list of examples
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
645 # into a list of fields each of which iterate over the minibatch
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
646 outputs = zip(*outputs_list)
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
647 else:
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
648 outputs = self.dataset.function(*function_inputs)
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
649 if self.dataset.cache:
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
650 # convert the list of fields, each of which can iterate over the minibatch
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
651 # into a list of examples in the minibatch (each of which is a list of field values)
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
652 outputs_list = zip(*outputs)
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
653 # copy the outputs_list into the cache
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
654 for i in xrange(cache_len,example_index):
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
655 self.dataset.cached_examples.append(None)
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
656 self.dataset.cached_examples += outputs_list
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
657 else:
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
658 outputs = self.dataset.function(*function_inputs)
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
659
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
660 return Example(self.fieldnames_not_in_input+self.dataset.output_fields,
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
661 [src_examples[field_name] for field_name in self.fieldnames_not_in_input]+outputs)
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
662
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
663
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
664 for fieldname in fieldnames:
22
b6b36f65664f Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents: 20
diff changeset
665 assert fieldname in self.output_fields or self.src.hasFields(fieldname)
20
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
666 return Iterator(self)
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
667
266c68cb6136 Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents: 19
diff changeset
668
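The caching behaviour described in the ApplyFunctionDataSet docstring (compute outputs only when examples are requested, then reuse them) can be sketched in isolation. This is only an illustration under simplifying assumptions: `CachedApply`, `minibatch` and `n_calls` are hypothetical names, the source is a plain list of argument tuples, and the function is applied one example at a time.

```python
class CachedApply(object):
    """Hypothetical stand-in for the caching idea; not the ApplyFunctionDataSet API."""

    def __init__(self, examples, function, cache=True):
        self.examples = examples      # assumption: a list of argument tuples
        self.function = function
        self.cache = cache
        self.cached_examples = []     # outputs by example index; None = not computed yet
        self.n_calls = 0              # counts function calls, for illustration only

    def minibatch(self, start, size):
        stop = min(start + size, len(self.examples))
        cached = self.cached_examples[start:stop]
        # serve from the cache only if every requested example is already there
        if (self.cache and len(cached) == stop - start
                and all(out is not None for out in cached)):
            return cached
        outputs = []
        for args in self.examples[start:stop]:
            self.n_calls += 1
            outputs.append(self.function(*args))
        if self.cache:
            # pad with None so that list positions match example indices
            while len(self.cached_examples) < stop:
                self.cached_examples.append(None)
            self.cached_examples[start:stop] = outputs
        return outputs
```

Padding with None mirrors the listing's approach of keeping cache positions aligned with example indices when minibatches are requested out of order.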
23
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
669 def supervised_learning_dataset(src_dataset,input_fields,target_fields,weight_field=None):
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
670 """
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
671 Wraps an arbitrary DataSet into one for supervised learning tasks by forcing the
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
672 user to define a set of fields as the 'input' field and a set of fields
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
673 as the 'target' field. Optionally, a single weight_field can also be defined.
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
674 """
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
675 args = ((input_fields,'input'),(target_fields,'target'))
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
676 if weight_field: args+=(([weight_field],'weight'),)
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
677 return src_dataset.rename(*args)
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
678
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
679
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
680
526e192b0699 Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents: 22
diff changeset
681
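The field specifications passed to rename() have the shape ([old_field1, old_field2, ...], new_field). The sketch below builds such a tuple of specifications and applies it to a single example held in a dict. Both `build_rename_args` and `apply_rename` are hypothetical helpers; since rename() itself is not implemented in the listing, collecting the old values into a list is an assumption made purely for illustration.

```python
def build_rename_args(input_fields, target_fields, weight_field=None):
    """Build the tuple of ([old_fields], new_field) specifications."""
    args = ((input_fields, 'input'), (target_fields, 'target'))
    if weight_field:
        # the trailing comma makes a 1-tuple holding one specification;
        # without it, += would append the list and the string separately
        args += (([weight_field], 'weight'),)
    return args


def apply_rename(example, specs):
    """Map a dict of old fields to a dict of new fields.

    Assumption: each new field simply collects the old values in a list.
    """
    return dict((new, [example[old] for old in olds]) for olds, new in specs)
```

For instance, `build_rename_args(['x1', 'x2'], ['y'], 'w')` yields three specifications, and applying them to `{'x1': 1, 'x2': 2, 'y': 0, 'w': 0.5}` groups the values under 'input', 'target' and 'weight'.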