Mercurial > pylearn
comparison dataset.py @ 78:3499918faa9d
In the middle of designing TLearner
author | bengioy@bengiomac.local |
---|---|
date | Mon, 05 May 2008 09:35:30 -0400 |
parents | 1e2bb5bad636 |
children | 158653a9bc7c |
comparison
77:1e2bb5bad636 | 78:3499918faa9d |
---|---|
86 | 86 |
87 - dataset.<property> returns the value of a property associated with | 87 - dataset.<property> returns the value of a property associated with |
88 the name <property>. The following properties should be supported: | 88 the name <property>. The following properties should be supported: |
89 - 'description': a textual description or name for the dataset | 89 - 'description': a textual description or name for the dataset |
90 - 'fieldtypes': a list of types (one per field) | 90 - 'fieldtypes': a list of types (one per field) |
| 91 A DataSet may have other attributes that it makes visible to other objects. These are |
| 92 used to store information that is not example-wise but global to the dataset. |
| 93 The list of names of these attributes is given by the attribute_names() method. |
91 | 94 |
92 Datasets can be concatenated either vertically (increasing the length) or | 95 Datasets can be concatenated either vertically (increasing the length) or |
93 horizontally (augmenting the set of fields), if they are compatible, using | 96 horizontally (augmenting the set of fields), if they are compatible, using |
94 the following operations (with the same basic semantics as numpy.hstack | 97 the following operations (with the same basic semantics as numpy.hstack |
95 and numpy.vstack): | 98 and numpy.vstack): |
112 | 115 |
113 A dataset can hold arbitrary key-value pairs that may be used to access meta-data | 116 A dataset can hold arbitrary key-value pairs that may be used to access meta-data |
114 or other properties of the dataset or associated with the dataset or the result | 117 or other properties of the dataset or associated with the dataset or the result |
115 of a computation stored in a dataset. These can be accessed through the [key] syntax | 118 of a computation stored in a dataset. These can be accessed through the [key] syntax |
116 when key is a string (or more specifically, neither an integer, a slice, nor a list). | 119 when key is a string (or more specifically, neither an integer, a slice, nor a list). |
117 | 120 |
118 A DataSet sub-class should always redefine the following methods: | 121 A DataSet sub-class should always redefine the following methods: |
119 - __len__ if it is not a stream | 122 - __len__ if it is not a stream |
120 - fieldNames | 123 - fieldNames |
121 - minibatches_nowrap (called by DataSet.minibatches()) | 124 - minibatches_nowrap (called by DataSet.minibatches()) |
122 - valuesHStack | 125 - valuesHStack |
123 - valuesVStack | 126 - valuesVStack |
124 For efficiency of implementation, a sub-class might also want to redefine | 127 For efficiency of implementation, a sub-class might also want to redefine |
125 - hasFields | 128 - hasFields |
126 - __getitem__ may not be feasible with some streams | 129 - __getitem__ may not be feasible with some streams |
127 - __iter__ | 130 - __iter__ |
| 131 A sub-class should also append attributes to self._attribute_names |
| 132 (the default value returned by attributeNames()). |
| 133 By convention, attributes not in attributeNames() should have a name |
| 134 starting with an underscore. |
| 135 @todo enforce/test that convention! |
128 """ | 136 """ |
129 | 137 |
130 numpy_vstack = lambda fieldname,values: numpy.vstack(values) | 138 numpy_vstack = lambda fieldname,values: numpy.vstack(values) |
131 numpy_hstack = lambda fieldnames,values: numpy.hstack(values) | 139 numpy_hstack = lambda fieldnames,values: numpy.hstack(values) |
132 | 140 |
134 if description is None: | 142 if description is None: |
135 # by default return "<DataSetType>(<SuperClass1>,<SuperClass2>,...)" | 143 # by default return "<DataSetType>(<SuperClass1>,<SuperClass2>,...)" |
136 description = type(self).__name__ + " ( " + join([x.__name__ for x in type(self).__bases__]) + " )" | 144 description = type(self).__name__ + " ( " + join([x.__name__ for x in type(self).__bases__]) + " )" |
137 self.description=description | 145 self.description=description |
138 self.fieldtypes=fieldtypes | 146 self.fieldtypes=fieldtypes |
| 147 self._attribute_names = ["description"] |
| 148 if fieldtypes: |
| 149 self._attribute_names.append("fieldtypes") |
| 150 |
| 151 def attributeNames(self): return self._attribute_names |
| 152 |
| 153 def setAttributes(self,attribute_names,attribute_values): |
| 154 for name,value in zip(attribute_names,attribute_values): |
| 155 self.__setattr__(name,value) |
139 | 156 |
140 class MinibatchToSingleExampleIterator(object): | 157 class MinibatchToSingleExampleIterator(object): |
141 """ | 158 """ |
142 Converts the result of minibatch iterator with minibatch_size==1 into | 159 Converts the result of minibatch iterator with minibatch_size==1 into |
143 single-example values in the result. Therefore the result of | 160 single-example values in the result. Therefore the result of |