annotate dataset.py @ 26:672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
field | value |
---|---|
author | bengioy@grenat.iro.umontreal.ca |
date | Fri, 11 Apr 2008 11:14:54 -0400 |
parents | 526e192b0699 |
children | 541a273bc89f |

rev | line source |
---|---|
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
1 |
12
ff4e551490f1
Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents:
11
diff
changeset
|
2 from lookup_list import LookupList |
3 Example = LookupList |
26
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
4 import copy |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
5 |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
6 class AbstractFunction (Exception): """Derived class must override this function""" |
12
ff4e551490f1
Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents:
11
diff
changeset
|
7 |
1
2cd82666b9a7
Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents:
0
diff
changeset
|
8 class DataSet(object): |
16 | 9 """A virtual base class for datasets. |
10 | |
11 A DataSet is a generator of iterators; these iterators can run through the | |
12 examples in a variety of ways. A DataSet need not necessarily have a finite | |
13 or known length, so this class can be used to interface to a 'stream' which | |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
14 feeds on-line learning. |
16 | 15 |
16 To iterate over examples, there are several possibilities: | |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
17 - for example in dataset.zip([field1, field2,field3, ...]) |
18 - for val1,val2,val3 in dataset.zip([field1, field2,field3]) |
19 - for minibatch in dataset.minibatches([field1, field2, ...],minibatch_size=N) |
20 - for example in dataset |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
21 Each of these is documented below. All of these iterators are expected |
22 to provide, in addition to the usual 'next()' method, a 'next_index()' method |
23 which returns a non-negative integer pointing to the position of the next |
24 example that will be returned by 'next()' (or of the first example in the |
25 next minibatch returned). This is important because these iterators |
26 can wrap around the dataset in order to do multiple passes through it, |
27 in possibly irregular ways if the minibatch size is not a divisor of the |
28 dataset length. |
16 | 29 |
30 Note: Fields are not mutually exclusive, i.e. two fields can overlap in their actual content. | |
31 | |
32 Note: The content of a field can be of any type. | |
33 | |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
34 Note: A dataset can recognize a potentially infinite number of field names (i.e. the field |
35 values can be computed on-demand, when particular field names are used in one of the |
36 iterators). |
37 |
38 Datasets of finite length should be sub-classes of FiniteLengthDataSet. |
39 |
40 Datasets whose elements can be indexed and sub-datasets of consecutive |
41 examples (i.e. slices) can be extracted from should be sub-classes of |
42 SliceableDataSet. |
43 |
44 Datasets with a finite number of fields should be sub-classes of |
45 FiniteWidthDataSet. |
2
3fddb1c8f955
Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents:
1
diff
changeset
|
46 """ |
1
2cd82666b9a7
Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents:
0
diff
changeset
|
47 |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
48 def __init__(self): |
49 pass |
50 |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
51 class Iterator(LookupList): |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
52 def __init__(self, ll): |
53 LookupList.__init__(self, ll.keys(), ll.values()) |
54 self.ll = ll |
55 def __iter__(self): #makes for loop work |
56 return self |
57 def next(self): |
58 self.ll.next() |
59 self._values = [v[0] for v in self.ll._values] |
60 return self |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
61 def next_index(self): |
62 return self.ll.next_index() |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
63 |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
64 def __iter__(self): |
16 | 65 """Supports the syntax "for i in dataset: ..." |
1
2cd82666b9a7
Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents:
0
diff
changeset
|
66 |
16 | 67 Using this syntax, "i" will be an Example instance (or equivalent) with |
68 all the fields of DataSet self. Every field of "i" will give access to | |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
69 a field of a single example. Fields should be accessible via |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
70 i["fieldname"] or i[3] (in the order defined by the elements of the |
71 Example returned by this iterator), but the derived class is free |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
72 to accept any type of identifier, and add extra functionality to the iterator. |
6
d5738b79089a
Removed MinibatchIterator and instead made minibatch_size a field of all DataSets,
bengioy@bengiomac.local
parents:
5
diff
changeset
|
73 """ |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
74 return DataSet.Iterator(self.minibatches(None, minibatch_size = 1)) |
16 | 75 |
76 def zip(self, *fieldnames): | |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
77 """ |
16 | 78 Supports two forms of syntax: |
79 | |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
80 for i in dataset.zip([f1, f2, f3]): ... |
16 | 81 |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
82 for i1, i2, i3 in dataset.zip([f1, f2, f3]): ... |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
83 |
16 | 84 Using the first syntax, "i" will be an indexable object, such as a list, |
85 tuple, or Example instance, such that on every iteration, i[0] is the f1 | |
86 field of the current example, i[1] is the f2 field, and so on. | |
87 | |
88 Using the second syntax, i1, i2, i3 will contain the contents of the | |
89 f1, f2, and f3 fields of a single example on each loop iteration. | |
90 | |
91 The derived class may accept fieldname arguments of any type. | |
92 | |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
93 """ |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
94 return DataSet.Iterator(self.minibatches(fieldnames, minibatch_size = 1)) |
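Both `__iter__` and `zip` reuse `minibatches` with `minibatch_size = 1` and then unwrap each one-element batch into a single example. A minimal Python 3 sketch of that unwrapping step is given here, using plain dicts in place of `LookupList`; the helper name `single_examples` and the sample data are hypothetical and not part of the source.

```python
def single_examples(minibatch_iter):
    """Yield one example per size-1 minibatch by taking element 0 of each field."""
    for batch in minibatch_iter:
        yield {name: values[0] for name, values in batch.items()}


# Stand-in for what dataset.minibatches(("x", "y"), minibatch_size=1) might yield.
batches = [{"x": [1], "y": [10]}, {"x": [2], "y": [20]}]
examples = list(single_examples(batches))
```

Each yielded dict plays the role of an `Example`: its fields hold single values rather than one-element containers.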
2
3fddb1c8f955
Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents:
1
diff
changeset
|
95 |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
96 minibatches_fieldnames = None |
97 minibatches_minibatch_size = 1 |
98 minibatches_n_batches = None |
99 def minibatches(self, |
100 fieldnames = minibatches_fieldnames, |
101 minibatch_size = minibatches_minibatch_size, |
102 n_batches = minibatches_n_batches): |
6
d5738b79089a
Removed MinibatchIterator and instead made minibatch_size a field of all DataSets,
bengioy@bengiomac.local
parents:
5
diff
changeset
|
103 """ |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
104 Supports three forms of syntax: |
105 |
106 for i in dataset.minibatches(None,**kwargs): ... |
16 | 107 |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
108 for i in dataset.minibatches([f1, f2, f3],**kwargs): ... |
16 | 109 |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
110 for i1, i2, i3 in dataset.minibatches([f1, f2, f3],**kwargs): ... |
16 | 111 |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
112 Using the first two syntaxes, "i" will be an indexable object, such as a list, |
113 tuple, or Example instance. In both cases, i[k] is a list-like container |
114 of a batch of current examples. In the second case, i[0] is |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
115 a list-like container of the f1 field of a batch of current examples, i[1] is |
116 a list-like container of the f2 field, etc. |
2
3fddb1c8f955
Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents:
1
diff
changeset
|
117 |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
118 Using the first syntax, all the fields will be returned in "i". |
119 Beware that some datasets may not support this syntax, if the number |
120 of fields is infinite (i.e. field values may be computed "on demand"). |
121 |
122 Using the third syntax, i1, i2, i3 will be list-like containers of the |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
123 f1, f2, and f3 fields of a batch of examples on each loop iteration. |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
124 |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
125 PARAMETERS |
126 - fieldnames (list of any type, default None): |
127 The loop variables i1, i2, i3 (in the example above) should contain the |
128 f1, f2, and f3 fields of the current batch of examples. If None, the |
129 derived class can choose a default, e.g. all fields. |
16 | 130 |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
131 - minibatch_size (integer, default 1) |
132 On every iteration, the variables i1, i2, i3 will have |
133 exactly minibatch_size elements, e.g. len(i1) == minibatch_size. |
134 |
135 - n_batches (integer, default None) |
136 The iterator will loop exactly this many times, and then stop. If None, |
137 the derived class can choose a default. If (-1), then the returned |
138 iterator should support looping indefinitely. |
139 |
140 Note: A list-like container is something like a tuple, list, numpy.ndarray or |
141 any other object that supports integer indexing and slicing. |
142 |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
143 """ |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
144 raise AbstractFunction() |
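The base class leaves `minibatches` abstract, so the wrap-around behaviour and the role of `next_index()` are easiest to see in a toy implementation. The following is a minimal Python 3 sketch over a plain list of rows, not the library's implementation. The class name `WrappingMinibatchIterator` is hypothetical, and treating `n_batches=None` as "one pass, rounded up" is an assumption, since the docstring lets the derived class choose that default.

```python
class WrappingMinibatchIterator:
    """Hypothetical iterator over a non-empty list of rows, wrapping at the end."""

    def __init__(self, rows, minibatch_size=1, n_batches=None):
        self.rows = rows
        self.minibatch_size = minibatch_size
        # Assumption: None means one pass over the data, rounded up to whole batches.
        if n_batches is None:
            n_batches = -(-len(rows) // minibatch_size)
        self.n_batches = n_batches  # -1 means loop indefinitely
        self.batches_done = 0
        self.position = 0

    def __iter__(self):
        return self

    def next_index(self):
        """Position of the first example of the next minibatch."""
        return self.position

    def __next__(self):
        if self.n_batches != -1 and self.batches_done >= self.n_batches:
            raise StopIteration
        n = len(self.rows)
        # Indices are taken modulo the dataset length, so a batch may wrap around.
        batch = [self.rows[(self.position + k) % n]
                 for k in range(self.minibatch_size)]
        self.position = (self.position + self.minibatch_size) % n
        self.batches_done += 1
        return batch


it = WrappingMinibatchIterator(list(range(5)), minibatch_size=2, n_batches=4)
starts, batches = [], []
while True:
    start = it.next_index()
    try:
        batch = next(it)
    except StopIteration:
        break
    starts.append(start)
    batches.append(batch)
```

With five rows and a minibatch size of two, the third batch straddles the end of the data, which is why a caller needs `next_index()` to know where each batch starts.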
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
145 |
26
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
146 def hasFields(self,*fieldnames): |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
147 """ |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
148 Return True if the given field name (or field names, if multiple arguments are |
149 given) is recognized by the DataSet (i.e. can be used as a field name in one |
150 of the iterators). |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
151 """ |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
152 raise AbstractFunction() |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
153 |
26
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
154 def merge_fields(self,*specifications): |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
155 """ |
156 Return a new dataset that maps old fields (of self) to new fields (of the returned |
157 dataset). The minimal syntax that should be supported is the following: |
158 new_field_specifications = [new_field_spec1, new_field_spec2, ...] |
159 new_field_spec = ([old_field1, old_field2, ...], new_field) |
160 In general both old_field and new_field should be strings, but some datasets may also |
161 support additional indexing schemes within each field (e.g. column slice |
162 of a matrix-like field). |
163 """ |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
164 raise AbstractFunction() |
165 |
26
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
166 def merge_field_values(self,*field_value_pairs): |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
167 """ |
168 Return the value that corresponds to merging the values of several fields, |
169 given as arguments (field_name, field_value) pairs with self.hasFields(field_name). |
170 This may be used by implementations of merge_fields. |
171 Raise a ValueError if the operation is not possible. |
172 """ |
173 fieldnames,fieldvalues = zip(*field_value_pairs) |
174 raise ValueError("Unable to merge values of these fields: "+repr(fieldnames)) |
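The base implementation only unpacks the pairs and then refuses to merge, leaving the actual rule to derived classes. The `zip(*pairs)` idiom it uses, together with one possible override, is sketched below in Python 3. The functions `split_pairs` and `concatenate_values` and the concatenation rule itself are assumptions made for illustration; the source does not say how values should be combined.

```python
def split_pairs(field_value_pairs):
    """Turn ((name, value), ...) into a tuple of names and a tuple of values."""
    fieldnames, fieldvalues = zip(*field_value_pairs)
    return fieldnames, fieldvalues


def concatenate_values(*field_value_pairs):
    """Hypothetical merge rule: concatenate list-valued fields in argument order."""
    fieldnames, fieldvalues = split_pairs(field_value_pairs)
    if not all(isinstance(v, list) for v in fieldvalues):
        raise ValueError("Unable to merge values of these fields: " + repr(fieldnames))
    merged = []
    for v in fieldvalues:
        merged.extend(v)
    return merged


names, values = split_pairs([("x1", [1, 2]), ("x2", [3])])
merged = concatenate_values(("x1", [1, 2]), ("x2", [3]))
```

Raising `ValueError` for unsupported inputs mirrors the contract stated in the docstring, so an implementation of `merge_fields` could catch it and report which fields could not be combined.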
175 |
26
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
176 def examples2minibatch(self,examples): |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
177 """ |
178 Combine a list of Examples into a minibatch. A minibatch is an Example whose fields |
179 are iterable over the examples of the minibatch. |
180 """ |
181 raise AbstractFunction() |
182 |
26
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
183 def rename(self,rename_dict): |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
184 """ |
185 Return a new dataset that renames fields, using a dictionary that maps old field |
186 names to new field names. The only fields visible in the returned dataset are those |
187 whose names are keys of the rename_dict. |
188 """ |
26
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
189 self_class = self.__class__ |
190 class SelfRenamingDataSet(RenamingDataSet,self_class): |
191 pass |
192 self.__class__ = SelfRenamingDataSet |
193 # set the rename_dict and src fields |
194 SelfRenamingDataSet.__init__(self,self,rename_dict) |
195 return self |
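Note that the body of `rename` reassigns `self.__class__` and returns `self`, so the object the caller already holds is converted in place rather than copied. A self-contained Python 3 sketch of this dynamic-subclass technique follows. `Base`, `RenamingMixin`, `fieldnames` and `rename_in_place` are stand-ins invented for the example, because `RenamingDataSet` is not shown in this part of the listing.

```python
class Base:
    """Stand-in for a concrete dataset class."""

    def fieldnames(self):
        return ["x", "y", "z"]


class RenamingMixin:
    """Stand-in for RenamingDataSet: exposes only the renamed fields."""

    def fieldnames(self):
        old_names = super().fieldnames()
        return [self.rename_dict[n] for n in old_names if n in self.rename_dict]


def rename_in_place(dataset, rename_dict):
    original_class = dataset.__class__

    # Build a subclass on the fly that puts the mixin ahead of the original class.
    class SelfRenaming(RenamingMixin, original_class):
        pass

    dataset.__class__ = SelfRenaming  # mutates the caller's object
    dataset.rename_dict = rename_dict
    return dataset


d = Base()
r = rename_in_place(d, {"x": "input", "y": "target"})
```

After the call, `r` and `d` are the same object, and the field `z`, which has no entry in the dictionary, is no longer visible through it.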
196 |
197 def applyFunction(self,function, input_fields, output_fields, copy_inputs=True, accept_minibatches=True, cache=True): |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
198 """ |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
199 Return a dataset that contains as fields the results of applying |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
200 the given function (example-wise) to the specified input_fields. The |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
201 function should return a sequence whose elements will be stored in |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
202 fields whose names are given in the output_fields list. If copy_inputs |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
203 is True then the resulting dataset will also contain the fields of self. |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
204 If accept_minibatches, then the function may be called |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
205 with minibatches as arguments (what is returned by the minibatches |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
206 iterator). In any case, the computations may be delayed until the examples |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
207 of the resulting dataset are requested. If cache is True, then |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
208 once the output fields for some examples have been computed, they |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
209 are cached (to avoid recomputation if the same examples are again |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
210 requested). |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
211 """ |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
212 return ApplyFunctionDataSet(function, input_fields, output_fields, copy_inputs, accept_minibatches, cache) |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
213 |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
214 |
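The applyFunction docstring above describes delayed, optionally cached computation. The following is a minimal sketch of that idea only, not the ApplyFunctionDataSet implementation (which is not shown in this listing). The class name `LazyApply` and the `n_calls` counter are hypothetical, introduced purely for illustration.

```python
class LazyApply(object):
    """Hypothetical helper: applies `function` to examples only on request."""

    def __init__(self, examples, function, cache=True):
        self.examples = examples              # any indexable sequence of examples
        self.function = function              # called example-wise, returns a sequence
        self.cache = {} if cache else None    # index -> computed output
        self.n_calls = 0                      # counts real evaluations (illustration only)

    def __getitem__(self, i):
        # Serve a previously computed result when caching is enabled.
        if self.cache is not None and i in self.cache:
            return self.cache[i]
        self.n_calls += 1
        result = self.function(self.examples[i])
        if self.cache is not None:
            self.cache[i] = result
        return result


squares = LazyApply([1, 2, 3], lambda x: (x * x,))
first = squares[1]    # computed on first access
again = squares[1]    # served from the cache, the function is not called again

uncached = LazyApply([1, 2, 3], lambda x: (x * x,), cache=False)
uncached[1]
uncached[1]           # recomputed, because cache=False
```

Nothing is computed at construction time; the function runs only when an example is requested, which is the "delayed" behaviour the docstring allows.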
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
215 class FiniteLengthDataSet(DataSet): |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
216 """ |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
217 Virtual interface for datasets that have a finite length (number of examples), |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
218 and thus recognize a len(dataset) call. |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
219 """ |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
220 def __init__(self): |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
221 DataSet.__init__(self) |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
222 |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
223 def __len__(self): |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
224 """len(dataset) returns the number of examples in the dataset.""" |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
225 raise AbstractFunction() |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
226 |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
227 |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
228 class SliceableDataSet(DataSet): |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
229 """ |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
230 Virtual interface, a subclass of DataSet for datasets which are sliceable |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
231 and whose individual elements can be accessed, generally respecting the |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
232 python semantics for [spec], where spec is either a non-negative integer |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
233 (for selecting one example), or a python slice (for selecting a sub-dataset |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
234 comprising the specified examples). This is useful for obtaining |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
235 sub-datasets, e.g. for splitting a dataset into training and test sets. |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
236 """ |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
237 def __init__(self): |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
238 DataSet.__init__(self) |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
239 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
240 def minibatches(self, |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
241 fieldnames = DataSet.minibatches_fieldnames, |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
242 minibatch_size = DataSet.minibatches_minibatch_size, |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
243 n_batches = DataSet.minibatches_n_batches): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
244 """ |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
245 If n_batches is None, we want to see all the examples possible |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
246 for the given minibatch_size (possibly missing a few at the end of the dataset). |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
247 """ |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
248 # substitute the defaults: |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
249 if n_batches is None: n_batches = len(self) // minibatch_size |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
250 return DataSet.Iterator(self, fieldnames, minibatch_size, n_batches) |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
251 |
2
3fddb1c8f955
Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents:
1
diff
changeset
|
252 def __getitem__(self,i): |
3fddb1c8f955
Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents:
1
diff
changeset
|
253 """dataset[i] returns the (i+1)-th example of the dataset.""" |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
254 raise AbstractFunction() |
1
2cd82666b9a7
Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents:
0
diff
changeset
|
255 |
2
3fddb1c8f955
Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents:
1
diff
changeset
|
256 def __getslice__(self,*slice_args): |
3fddb1c8f955
Rewrote DataSet interface and created FiniteDataSet interface.
bengioy@bengiomac.local
parents:
1
diff
changeset
|
257 """dataset[i:j] returns the subdataset with examples i,i+1,...,j-1.""" |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
258 raise AbstractFunction() |
7
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
259 |
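The default batch count in SliceableDataSet.minibatches is the integer quotient of the dataset length by the minibatch size, so a few trailing examples may be left out. A small sketch of that arithmetic follows. The helper names `n_full_minibatches` and `minibatch_slices` are hypothetical, and the sketch does not reproduce DataSet.Iterator, whose behaviour is defined elsewhere in the file.

```python
def n_full_minibatches(n_examples, minibatch_size):
    """Number of complete minibatches; leftover examples are not counted."""
    assert minibatch_size > 0
    return n_examples // minibatch_size


def minibatch_slices(n_examples, minibatch_size):
    """Slices selecting each complete minibatch, in order."""
    n_batches = n_full_minibatches(n_examples, minibatch_size)
    return [slice(k * minibatch_size, (k + 1) * minibatch_size)
            for k in range(n_batches)]


data = list(range(10))
batches = [data[s] for s in minibatch_slices(len(data), 3)]
# Three batches of three examples each; the tenth example is left out.
```

Using floor division keeps the count an integer regardless of the Python version.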
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
260 |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
261 class FiniteWidthDataSet(DataSet): |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
262 """ |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
263 Virtual interface for datasets that have a finite width (number of fields), |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
264 and thus return a list of fieldNames. |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
265 """ |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
266 def __init__(self): |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
267 DataSet.__init__(self) |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
268 |
26
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
269 def hasFields(self,*fields): |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
270 has_fields=True |
26
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
271 fieldnames = self.fieldNames() |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
272 for name in fields: |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
273 if name not in fieldnames: |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
274 has_fields=False |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
275 return has_fields |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
276 |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
277 def fieldNames(self): |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
278 """Return the list of field names that are supported by the iterators, |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
279 and for which hasFields(fieldname) would return True.""" |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
280 raise AbstractFunction() |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
281 |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
282 |
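The hasFields method above returns True only when every requested name is among fieldNames(). An equivalent formulation, sketched here with a hypothetical stand-in class `FieldSet` (not part of dataset.py), uses `all()` and stops at the first missing name.

```python
class FieldSet(object):
    """Hypothetical stand-in for a finite-width dataset."""

    def __init__(self, names):
        self._names = list(names)

    def fieldNames(self):
        return list(self._names)

    def hasFields(self, *fields):
        # True only if every requested field name is known.
        known = set(self.fieldNames())
        return all(name in known for name in fields)


d = FieldSet(["x", "y"])
has_both = d.hasFields("x", "y")
has_missing = d.hasFields("x", "z")
```

As in the original loop, calling hasFields with no arguments yields True.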
26
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
283 class RenamingDataSet(FiniteWidthDataSet): |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
284 """A DataSet that wraps another one, and makes it look like the field names |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
285 are different. |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
286 |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
287 Renaming is done by a dictionary that maps new names to the old ones used in |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
288 self.src. |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
289 """ |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
290 def __init__(self, src, rename_dct): |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
291 DataSet.__init__(self) |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
292 self.src = src |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
293 self.rename_dct = copy.copy(rename_dct) |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
294 |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
295 def fieldNames(self): |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
296 return self.rename_dct.keys() |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
297 |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
298 def minibatches(self, |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
299 fieldnames = DataSet.minibatches_fieldnames, |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
300 minibatch_size = DataSet.minibatches_minibatch_size, |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
301 n_batches = DataSet.minibatches_n_batches): |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
302 dct = self.rename_dct |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
303 new_fieldnames = [dct.get(f, f) for f in fieldnames] |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
304 return self.src.minibatches(new_fieldnames, minibatch_size, n_batches) |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
305 |
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
306 |
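RenamingDataSet.minibatches translates the requested (new) field names into the names used by the wrapped dataset before delegating to it. The sketch below isolates that translation step. The function name `translate_fieldnames` and the sample field names are hypothetical; the pass-through of names absent from the dictionary mirrors the `dct.get(f, f)` call in the listing.

```python
def translate_fieldnames(requested, rename_dct):
    """Map new field names to source names; names not in the dict pass through."""
    return [rename_dct.get(name, name) for name in requested]


# Keys are the new names, values are the names used by the wrapped dataset.
rename_dct = {"input": "x", "target": "y"}
src_names = translate_fieldnames(["input", "target", "weight"], rename_dct)
```

The translated list is what would be handed to the source dataset's minibatches call.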
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
307 # we may want ArrayDataSet defined in another python file |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
308 |
4 | 309 import numpy |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
310 |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
311 def as_array_dataset(dataset): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
312 # Generally datasets can be efficient by making data fields overlap, but |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
313 # this function doesn't know which fields overlap. So, it should check if |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
314 # dataset supports an as_array_dataset member function, and return that if |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
315 # possible. |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
316 if hasattr(dataset, 'as_array_dataset'): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
317 return dataset.as_array_dataset() |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
318 |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
319 raise NotImplementedError |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
320 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
321 # Make ONE big minibatch with all the examples, to separate the fields. |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
322 n_examples = len(dataset) |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
323 batch = dataset.minibatches( minibatch_size = len(dataset)).next() |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
324 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
325 # Each field of the underlying dataset must be convertible to a numpy array of the same type |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
326 # currently just double, but should use the smallest compatible dtype |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
327 n_fields = len(batch) |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
328 fieldnames = batch.fields.keys() |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
329 total_width = 0 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
330 type = None |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
331 fields = LookupList() |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
332 for i in xrange(n_fields): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
333 field = numpy.array(batch[i]) |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
334 assert field.shape[0]==n_examples |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
335 width = field.shape[1] |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
336 start=total_width |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
337 total_width += width |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
338 fields[fieldnames[i]]=slice(start,total_width,1) |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
339 # many complicated things remain to be done: |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
340 # - find common dtype |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
341 # - decide what to do with extra dimensions if not the same in all fields |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
342 # - try to see if we can avoid the copy? |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
343 |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
344 class ArrayDataSet(FiniteLengthDataSet,FiniteWidthDataSet,SliceableDataSet): |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
345 """ |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
346 An ArrayDataSet behaves like a numpy array but adds the notion of named fields |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
347 from DataSet (and the ability to view the values of multiple fields as an 'Example'). |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
348 It is a fixed-length and fixed-width dataset |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
349 in which each element is a fixed dimension numpy array or a number, hence the whole |
9
de616c423dbd
Improving comments in dataset.py
bengioy@esprit.iro.umontreal.ca
parents:
8
diff
changeset
|
350 dataset corresponds to a numpy array. Fields |
de616c423dbd
Improving comments in dataset.py
bengioy@esprit.iro.umontreal.ca
parents:
8
diff
changeset
|
351 must correspond to a slice of array columns. If the dataset has fields, |
6
d5738b79089a
Removed MinibatchIterator and instead made minibatch_size a field of all DataSets,
bengioy@bengiomac.local
parents:
5
diff
changeset
|
352 each 'example' is just a one-row ArrayDataSet, otherwise it is a numpy array. |
9
de616c423dbd
Improving comments in dataset.py
bengioy@esprit.iro.umontreal.ca
parents:
8
diff
changeset
|
353 Any dataset can also be converted to a numpy array (losing the notion of fields) |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
354 by the numpy.array(dataset) call. |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
355 """ |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
356 |
19 | 357 class Iterator(LookupList): |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
358 """An iterator over a finite dataset that implements wrap-around""" |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
359 def __init__(self, dataset, fieldnames, minibatch_size, next_max): |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
360 if fieldnames is None: fieldnames = dataset.fieldNames() |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
361 LookupList.__init__(self, fieldnames, [0]*len(fieldnames)) |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
362 self.dataset=dataset |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
363 self.minibatch_size=minibatch_size |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
364 self.next_count = 0 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
365 self.next_max = next_max |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
366 self.current = -self.minibatch_size |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
367 assert minibatch_size > 0 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
368 if minibatch_size >= len(dataset): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
369 raise NotImplementedError() |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
370 |
19 | 371 def __iter__(self): #makes for loop work |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
372 return self |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
373 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
374 @staticmethod |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
375 def matcat(a, b): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
376 a0, a1 = a.shape |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
377 b0, b1 = b.shape |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
378 assert a1 == b1 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
379 assert a.dtype == b.dtype |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
380 rval = numpy.empty( (a0 + b0, a1), dtype=a.dtype) |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
381 rval[:a0,:] = a |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
382 rval[a0:,:] = b |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
383 return rval |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
384 |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
385 def next_index(self): |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
386 n_rows = self.dataset.data.shape[0] |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
387 next_i = self.current+self.minibatch_size |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
388 if next_i >= n_rows: |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
389 next_i -= n_rows |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
390 return next_i |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
391 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
392 def next(self): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
393 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
394 #check for end-of-loop |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
395 self.next_count += 1 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
396 if self.next_count == self.next_max: |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
397 raise StopIteration |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
398 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
399 #determine the first and last elements of the slice we'll return |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
400 n_rows = self.dataset.data.shape[0] |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
401 self.current = self.next_index() |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
402 upper = self.current + self.minibatch_size |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
403 |
19 | 404 data = self.dataset.data |
405 | |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
406 if upper <= n_rows: |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
407 #this is the easy case, we only need one slice |
19 | 408 dataview = data[self.current:upper] |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
409 else: |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
410 # the minibatch wraps around the end of the dataset |
19 | 411 dataview = data[self.current:] |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
412 upper -= n_rows |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
413 assert upper > 0 |
19 | 414 dataview = self.matcat(dataview, data[:upper]) |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
415 |
19 | 416 self._values = [dataview[:, self.dataset.fields[f]]\ |
417 for f in self._names] | |
418 return self | |
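The wrap-around indexing used by next_index() and next() can be sketched on a plain list, without the ArrayDataSet machinery. This is a minimal illustration rather than the iterator itself: the name looping_minibatches is hypothetical, it is written for Python 3, and it yields exactly n_batches batches, which is a choice of the sketch and not a statement about the counter logic above.

```python
def looping_minibatches(rows, minibatch_size, n_batches):
    """Yield n_batches minibatches of rows, wrapping around the end of rows.

    Hypothetical helper: a list-based sketch of the slice logic only.
    """
    n_rows = len(rows)
    # Mirrors the constraint in the listing: the batch must be smaller than the data.
    assert 0 < minibatch_size < n_rows
    current = -minibatch_size  # so that the first batch starts at row 0
    for _ in range(n_batches):
        # Same effect as next_index(): advance, then fold back into [0, n_rows).
        current = (current + minibatch_size) % n_rows
        upper = current + minibatch_size
        if upper <= n_rows:
            # Easy case: a single slice is enough.
            yield rows[current:upper]
        else:
            # The batch wraps: take the tail, then the head of the data.
            yield rows[current:] + rows[:upper - n_rows]


batches = list(looping_minibatches(list(range(5)), 2, 4))
```

With five rows and a minibatch size of two, the third batch is the one that wraps around the end of the data.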
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
419 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
420 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
421 def __init__(self, data, fields=None): |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
422 """ |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
423 There are two ways to construct an ArrayDataSet: (1) from an |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
424 existing dataset (which may result in a copy of the data in a numpy array), |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
425 or (2) from a numpy.array (the data argument), along with an optional description |
12
ff4e551490f1
Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents:
11
diff
changeset
|
426 of the fields (a LookupList of column slices indexed by field names). |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
427 """ |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
428 self.data=data |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
429 self.fields=fields |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
430 rows, cols = data.shape |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
431 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
432 if fields: |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
433 for fieldname,fieldslice in fields.items(): |
4 | 434 # make sure fieldslice.start and fieldslice.step are defined |
435 start=fieldslice.start | |
436 step=fieldslice.step | |
437 if not start: | |
438 start=0 | |
439 if not step: | |
440 step=1 | |
441 if not fieldslice.start or not fieldslice.step: | |
7
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
442 fields[fieldname] = fieldslice = slice(start,fieldslice.stop,step) |
4 | 443 # and coherent with the data array |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
444 assert fieldslice.start >= 0 and fieldslice.stop <= cols |
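The normalization of field slices done in the constructor can be sketched in isolation. This is a minimal version under stated assumptions: normalize_field_slice is a hypothetical name, a plain dict stands in for the LookupList, and the code targets Python 3.

```python
def normalize_field_slice(fieldslice, n_cols):
    """Return a copy of fieldslice with explicit start and step.

    Hypothetical helper: a missing start becomes 0 and a missing step
    becomes 1, then the result is checked against the number of columns.
    """
    start = fieldslice.start or 0
    step = fieldslice.step or 1
    normalized = slice(start, fieldslice.stop, step)
    assert normalized.start >= 0 and normalized.stop <= n_cols
    return normalized


# A dict stands in for the LookupList of column slices (assumption).
fields = {"input": slice(None, 3), "target": slice(3, 4)}
normalized = {name: normalize_field_slice(s, 4) for name, s in fields.items()}
```

As in the listing, the stop value must be given explicitly; a slice with stop set to None would fail the bounds check.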
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
445 |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
446 def minibatches(self, |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
447 fieldnames = DataSet.minibatches_fieldnames, |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
448 minibatch_size = DataSet.minibatches_minibatch_size, |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
449 n_batches = DataSet.minibatches_n_batches): |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
450 """ |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
451 If the fieldnames list is None, it means that we want to see ALL the fields. |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
452 |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
453 If n_batches is None, we want to see all the examples possible |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
454 for the given minibatch_size (possibly missing some near the end). |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
455 """ |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
456 # substitute the defaults: |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
457 if n_batches is None: n_batches = len(self) // minibatch_size |
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
458 return ArrayDataSet.Iterator(self, fieldnames, minibatch_size, n_batches) |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
459 |
7
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
460 def __getattr__(self,fieldname): |
4 | 461 """ |
7
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
462 Return a numpy array with the content associated with the given field name. |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
463 If this is a one-example dataset, then a row, i.e., numpy array (of one less dimension |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
464 than the dataset itself) is returned. |
7
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
465 """ |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
466 if len(self.data)==1: |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
467 return self.data[0,self.fields[fieldname]] |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
468 return self.data[:,self.fields[fieldname]] |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
469 |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
470 def __call__(self,*fieldnames): |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
471 """Return a sub-dataset containing only the given fieldnames as fields.""" |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
472 min_col=self.data.shape[1] |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
473 max_col=0 |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
474 for field_slice in self.fields.values(): |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
475 min_col=min(min_col,field_slice.start) |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
476 max_col=max(max_col,field_slice.stop) |
12
ff4e551490f1
Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents:
11
diff
changeset
|
477 new_fields=LookupList() |
ff4e551490f1
Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents:
11
diff
changeset
|
478 for fieldname,fieldslice in [(f,self.fields[f]) for f in (fieldnames or self.fields.keys())]: |
ff4e551490f1
Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents:
11
diff
changeset
|
479 new_fields[fieldname]=slice(fieldslice.start-min_col,fieldslice.stop-min_col,fieldslice.step) |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
480 return ArrayDataSet(self.data[:,min_col:max_col],fields=new_fields) |
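The column re-basing behind __call__ can be illustrated on its own: pick the requested fields, find the column range they span, and shift each slice so that it indexes into that narrower range. The sketch below is an assumption-laden illustration, not the method itself: rebase_fields is a hypothetical name, a dict stands in for the LookupList, and the code targets Python 3.

```python
def rebase_fields(fields, selected):
    """Return ((min_col, max_col), new_fields) for the selected field names.

    Hypothetical helper: new_fields is a list of (name, slice) pairs whose
    slices are relative to column min_col of the original data matrix.
    """
    chosen = [(name, fields[name]) for name in selected]
    min_col = min(s.start for _, s in chosen)
    max_col = max(s.stop for _, s in chosen)
    new_fields = [
        (name, slice(s.start - min_col, s.stop - min_col, s.step))
        for name, s in chosen
    ]
    return (min_col, max_col), new_fields


fields = {"x": slice(0, 3, 1), "y": slice(3, 5, 1), "z": slice(5, 6, 1)}
col_range, new_fields = rebase_fields(fields, ["y", "z"])
```

The sub-dataset would then be built from data[:, min_col:max_col] together with the shifted slices.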
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
481 |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
482 def fieldNames(self): |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
483 """Return the list of field names that are supported by getattr and hasField.""" |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
484 return self.fields.keys() |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
485 |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
486 def __len__(self): |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
487 """len(dataset) returns the number of examples in the dataset.""" |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
488 return len(self.data) |
1
2cd82666b9a7
Added statscollector and started writing dataset and learner.
bengioy@esprit.iro.umontreal.ca
parents:
0
diff
changeset
|
489 |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
490 def __getitem__(self,i): |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
491 """ |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
492 dataset[i] returns the (i+1)-th Example of the dataset. If there are no fields |
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
493 the result is just a numpy array (row i of the dataset data matrix). |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
494 """ |
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
495 if self.fields: |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
496 fieldnames,fieldslices=zip(*self.fields.items()) |
12
ff4e551490f1
Added LookupList type in lookup_list.py and used it to keep order
bengioy@esprit.iro.umontreal.ca
parents:
11
diff
changeset
|
497 return Example(self.fields.keys(),[self.data[i,fieldslice] for fieldslice in self.fields.values()]) |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
498 else: |
7
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
499 return self.data[i] |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
500 |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
501 def __getslice__(self,*args): |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
502 """dataset[i:j] returns the subdataset with examples i,i+1,...,j-1.""" |
17
759d17112b23
more comments, looping ArrayDataSet iterator, bugfixes to lookup_list, more tests
bergstrj@iro.umontreal.ca
diff
changeset
|
503 return ArrayDataSet(self.data.__getslice__(*args), fields=self.fields) |
3
378b68d5c4ad
Added first (untested) version of ArrayDataSet
bengioy@bengiomac.local
parents:
2
diff
changeset
|
504 |
8
d1c394486037
Replaced asarray() method by __array__ method which gets called automatically when
bengioy@bengiomac.local
parents:
7
diff
changeset
|
505 def __array__(self): |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
506 """Return a view of this dataset which is a numpy.ndarray (i.e. losing |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
507 the identity and name of fields within the dataset). |
15 | 508 |
509 Numpy uses this special function name to retrieve an ndarray view for | |
510 functions such as numpy.sum, numpy.dot, numpy.asarray, etc. | |
511 | |
512 If this dataset has no fields, then we simply return self.data, | |
513 otherwise things are complicated. | |
514 - why do we want this behaviour when there are fields? (JB) | |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
515 - for convenience and completeness (but maybe it would make |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
516 more sense to implement this through a 'field-merging' |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
517 dataset). (YB) |
15 | 518 """ |
7
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
519 if not self.fields: |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
520 return self.data |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
521 # else, select subsets of columns mapped by the fields |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
522 columns_used = numpy.zeros((self.data.shape[1]),dtype=bool) |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
523 overlapping_fields = False |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
524 n_columns = 0 |
7
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
525 for field_slice in self.fields.values(): |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
526 for c in xrange(field_slice.start,field_slice.stop,field_slice.step): |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
527 n_columns += 1 |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
528 if columns_used[c]: overlapping_fields=True |
7
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
529 columns_used[c]=True |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
530 # try to figure out if we can map all the slices into one slice: |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
531 mappable_to_one_slice = not overlapping_fields |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
532 if not overlapping_fields: |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
533 start=0 |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
534 while start<len(columns_used) and not columns_used[start]: |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
535 start+=1 |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
536 stop=len(columns_used) |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
537 while stop>0 and not columns_used[stop-1]: |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
538 stop-=1 |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
539 step=0 |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
540 i=start |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
541 while i<stop: |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
542 j=i+1 |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
543 while j<stop and not columns_used[j]: |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
544 j+=1 |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
545 if step: |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
546 if step!=j-i: |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
547 mappable_to_one_slice = False |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
548 break |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
549 else: |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
550 step = j-i |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
551 i=j |
7
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
552 if mappable_to_one_slice: |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
553 return self.data[:,slice(start,stop,step)] |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
554 # else make contiguous copy (copying the overlapping columns) |
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
555 result = numpy.zeros((len(self.data),n_columns)+self.data.shape[2:],self.data.dtype) |
7
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
556 c=0 |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
557 for field_slice in self.fields.values(): |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
558 slice_width=len(xrange(field_slice.start,field_slice.stop,field_slice.step)) |
7
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
559 # copy the field here |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
560 result[:,slice(c,c+slice_width)]=self.data[:,field_slice] |
7
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
561 c+=slice_width |
6f8f338686db
Moved iterating counter into a FiniteDataSetIterator to allow embedded iterations and multiple threads iterating at the same time on a dataset.
bengioy@bengiomac.local
parents:
6
diff
changeset
|
562 return result |
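Two pieces of arithmetic in __array__ are easy to get subtly wrong, so a stand-alone sketch may help. The first counts the columns a slice selects: len(range(...)) handles steps that do not divide the span evenly, whereas integer division of (stop - start) by step gives 2 for slice(0, 5, 2), which selects three columns. The second decides whether a boolean mask of used columns can be expressed as one slice. Both function names are hypothetical and the code targets Python 3.

```python
def slice_width(field_slice):
    """Number of columns selected by a slice with explicit start, stop and step."""
    return len(range(field_slice.start, field_slice.stop, field_slice.step))


def as_single_slice(columns_used):
    """Return one slice selecting exactly the True entries, or None.

    Hypothetical helper: None means the used columns are not evenly spaced,
    so a contiguous copy would be needed instead of a view.
    """
    used = [c for c, flag in enumerate(columns_used) if flag]
    if not used:
        return None
    if len(used) == 1:
        return slice(used[0], used[0] + 1, 1)
    step = used[1] - used[0]
    for prev, cur in zip(used, used[1:]):
        if cur - prev != step:
            return None
    return slice(used[0], used[-1] + 1, step)


width = slice_width(slice(0, 5, 2))
merged = as_single_slice([True, False, True, False, True])
```

Comparing only the gaps between consecutive used columns means the last used column does not need a special case.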
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
563 |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
564 class ApplyFunctionDataSet(DataSet): |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
565 """ |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
566 A dataset that contains as fields the results of applying |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
567 a given function (example-wise) to specified input_fields of a source |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
568 dataset. The function should return a sequence whose elements will be stored in |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
569 fields whose names are given in the output_fields list. If copy_inputs |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
570 is True then the resulting dataset will also contain the fields of the source |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
571 dataset. If accept_minibatches, then the function expects |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
572 minibatches as arguments (what is returned by the minibatches |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
573 iterator). In any case, the computations may be delayed until the examples |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
574 of self are requested. If cache is True, then |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
575 once the output fields for some examples have been computed, they |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
576 are cached (to avoid recomputation if the same examples are again requested). |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
577 """ |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
578 def __init__(self, src, function, input_fields, output_fields, copy_inputs=True, accept_minibatches=True, cache=True, compute_now=False): |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
579 DataSet.__init__(self) |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
580 self.src=src |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
581 self.function=function |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
582 assert src.hasFields(input_fields) |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
583 self.input_fields=input_fields |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
584 self.output_fields=output_fields |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
585 assert not (copy_inputs and compute_now and not hasattr(src,'fieldNames')) |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
586 self.copy_inputs=copy_inputs |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
587 self.accept_minibatches=accept_minibatches |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
588 self.cache=cache |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
589 self.compute_now=compute_now |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
590 if compute_now: |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
591 assert hasattr(src,'__len__') and len(src)>=0 |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
592 fieldnames = output_fields |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
593 if copy_inputs: fieldnames = src.fieldNames() + output_fields |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
594 if accept_minibatches: |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
595 # make a single minibatch with all the inputs |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
596 inputs = src.minibatches(input_fields,len(src)).next() |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
597 # and apply the function to it, and transpose into a list of examples (field values, actually) |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
598 self.cached_examples = zip(*Example(output_fields,function(*inputs))) |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
599 else: |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
600 # compute a list with one tuple per example, with the function outputs |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
601 self.cached_examples = [ function(*input) for input in src.zip(input_fields) ] |
26
672fe4b23032
Fixed dataset errors so that _test_dataset.py works again.
bengioy@grenat.iro.umontreal.ca
parents:
23
diff
changeset
|
602 elif cache: |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
603 # maybe a fixed-size array kind of structure would be more efficient than a list |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
604 # in the case where src is FiniteDataSet. -YB |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
605 self.cached_examples = [] |
11
be128b9127c8
Debugged (to the extent of my tests) the new version of dataset
bengioy@esprit.iro.umontreal.ca
parents:
9
diff
changeset
|
606 |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
607 def minibatches(self, |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
608 fieldnames = DataSet.minibatches_fieldnames, |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
609 minibatch_size = DataSet.minibatches_minibatch_size, |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
610 n_batches = DataSet.minibatches_n_batches): |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
611 |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
612 class Iterator(LookupList): |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
613 |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
614 def __init__(self,dataset): |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
615 if fieldnames is None: |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
616 assert hasattr(dataset,"fieldNames") |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
617 fieldnames = dataset.fieldNames() |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
618 self.example_index=0 |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
619 LookupList.__init__(self, fieldnames, [0]*len(fieldnames)) |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
620 self.dataset=dataset |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
621 self.src_iterator=dataset.src.minibatches(list(set.union(set(fieldnames),set(dataset.input_fields))), |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
622 minibatch_size,n_batches) |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
623 self.fieldnames_not_in_input = [] |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
624 if dataset.copy_inputs: |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
625 self.fieldnames_not_in_input = filter(lambda x: not x in dataset.input_fields, fieldnames) |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
626 |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
627 def __iter__(self): |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
628 return self |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
629 |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
630 def next_index(self): |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
631 return self.src_iterator.next_index() |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
632 |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
633 def next(self): |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
634 example_index = self.src_iterator.next_index() |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
635 src_examples = self.src_iterator.next() |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
636 if self.dataset.copy_inputs: |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
637 function_inputs = [src_examples[field_name] for field_name in self.dataset.input_fields] |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
638 else: |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
639 function_inputs = src_examples |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
640 if self.dataset.cached_examples: |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
641 cache_len=len(self.dataset.cached_examples) |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
642 if example_index<cache_len+minibatch_size: |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
643 outputs_list = self.dataset.cached_examples[example_index:example_index+minibatch_size] |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
644 # convert the minibatch list of examples |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
645 # into a list of fields each of which iterate over the minibatch |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
646 outputs = zip(*outputs_list) |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
647 else: |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
648 outputs = self.dataset.function(*function_inputs) |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
649 if self.dataset.cache: |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
650 # convert the list of fields, each of which can iterate over the minibatch |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
651 # into a list of examples in the minibatch (each of which is a list of field values) |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
652 outputs_list = zip(*outputs) |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
653 # copy the outputs_list into the cache |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
654 for i in xrange(cache_len,example_index): |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
655 self.dataset.cached_examples.append(None) |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
656 self.dataset.cached_examples += outputs_list |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
657 else: |
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
658 outputs = self.dataset.function(*function_inputs) |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
659 |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
660 return Example(self.fieldnames_not_in_input+self.dataset.output_fields, |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
661 [src_examples[field_name] for field_name in self.fieldnames_not_in_input]+outputs) |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
662 |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
663 |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
664 for fieldname in fieldnames: |
22
b6b36f65664f
Created virtual sub-classes of DataSet: {Finite{Length,Width},Sliceable}DataSet,
bengioy@esprit.iro.umontreal.ca
parents:
20
diff
changeset
|
665 assert fieldname in self.output_fields or self.src.hasFields(fieldname) |
20
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
666 return Iterator(self) |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
667 |
266c68cb6136
Minor editions, plus adding untested ApplyFunctionDataset for GradientLearner in the works.
bengioy@bengiomac.local
parents:
19
diff
changeset
|
668 |
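The caching behaviour described in the ApplyFunctionDataSet docstring can be illustrated in isolation. The following is a minimal sketch, not the pylearn implementation: `CachedApply`, its `source` argument, and the `n_calls` counter are hypothetical names introduced here, and the source is assumed to be a plain indexable sequence of input tuples rather than a DataSet. It only shows the idea of delaying the computation until an example is requested, and of padding the cache with `None` for examples that have not been computed yet.

```python
class CachedApply(object):
    """Hypothetical stand-in for the lazy-apply-with-cache idea."""

    def __init__(self, source, function, cache=True):
        self.source = source        # assumed: indexable sequence of input tuples
        self.function = function    # called as function(*inputs), returns a sequence
        self.cache = cache
        self.cached_examples = []   # entry i stays None until example i is computed
        self.n_calls = 0            # counts real function evaluations

    def __getitem__(self, i):
        cached = self.cached_examples
        if self.cache and i < len(cached) and cached[i] is not None:
            return cached[i]        # reuse the stored outputs
        outputs = tuple(self.function(*self.source[i]))
        self.n_calls += 1
        if self.cache:
            # pad with None so that index i exists, then store the outputs
            while len(cached) <= i:
                cached.append(None)
            cached[i] = outputs
        return outputs


source = [(1, 2), (3, 4), (5, 6)]
ds = CachedApply(source, lambda x, y: (x + y, x * y))
first = ds[2]    # computed on demand
second = ds[2]   # served from the cache, no second function call
```

With `cache=False` every request recomputes the outputs, which trades memory for time in the same way the `cache` flag above is described as doing.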
23
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
669 def supervised_learning_dataset(src_dataset,input_fields,target_fields,weight_field=None): |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
670 """ |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
671 Wraps an arbitrary DataSet into one for supervised learning tasks by forcing the |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
672 user to define a set of fields as the 'input' field and a set of fields |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
673 as the 'target' field. Optionally, a single weight_field can also be defined. |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
674 """ |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
675 args = ((input_fields,'input'),(target_fields,'target')) |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
676 if weight_field: args+=(([weight_field],'weight'),) |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
677 return src_dataset.rename(*args) |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
678 |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
679 |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
680 |
526e192b0699
Working on ApplyFunctionDataSet, added constraint that
bengioy@esprit.iro.umontreal.ca
parents:
22
diff
changeset
|
681 |
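supervised_learning_dataset builds a tuple of (fields, new_name) pairs and passes them to `rename`. A minimal sketch of how such a tuple can be assembled follows; `build_rename_args` is a hypothetical helper introduced only for illustration, and the exact argument format expected by `rename` is an assumption based on the code above. The detail worth noting is the trailing comma: `(pair)` is just `pair`, so extending the tuple by it would append the pair's two elements separately, whereas `(pair,)` appends the pair as one item.

```python
def build_rename_args(input_fields, target_fields, weight_field=None):
    """Hypothetical helper: return a tuple of (fields, new_name) pairs."""
    args = ((input_fields, 'input'), (target_fields, 'target'))
    if weight_field:
        # one-element tuple, so the pair is appended as a single item
        args += (([weight_field], 'weight'),)
    return args


pairs = build_rename_args(['x1', 'x2'], ['y'], weight_field='w')
```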