comparison dataset.py @ 78:3499918faa9d

In the middle of designing TLearner
author bengioy@bengiomac.local
date Mon, 05 May 2008 09:35:30 -0400
parents 1e2bb5bad636
children 158653a9bc7c
77:1e2bb5bad636 78:3499918faa9d
86 86
87 - dataset.<property> returns the value of a property associated with 87 - dataset.<property> returns the value of a property associated with
88 the name <property>. The following properties should be supported: 88 the name <property>. The following properties should be supported:
89 - 'description': a textual description or name for the dataset 89 - 'description': a textual description or name for the dataset
90 - 'fieldtypes': a list of types (one per field) 90 - 'fieldtypes': a list of types (one per field)
91 A DataSet may have other attributes that it makes visible to other objects. These are
92 used to store information that is not example-wise but global to the dataset.
93 The list of names of these attributes is given by the attributeNames() method.
91 94
92 Datasets can be concatenated either vertically (increasing the length) or 95 Datasets can be concatenated either vertically (increasing the length) or
93 horizontally (augmenting the set of fields), if they are compatible, using 96 horizontally (augmenting the set of fields), if they are compatible, using
94 the following operations (with the same basic semantics as numpy.hstack 97 the following operations (with the same basic semantics as numpy.hstack
95 and numpy.vstack): 98 and numpy.vstack):
112 115
113 A dataset can hold arbitrary key-value pairs that may be used to access meta-data 116 A dataset can hold arbitrary key-value pairs that may be used to access meta-data
114 or other properties of the dataset or associated with the dataset or the result 117 or other properties of the dataset or associated with the dataset or the result
115 of a computation stored in a dataset. These can be accessed through the [key] syntax 118 of a computation stored in a dataset. These can be accessed through the [key] syntax
116 when key is a string (or more specifically, neither an integer, a slice, nor a list). 119 when key is a string (or more specifically, neither an integer, a slice, nor a list).
117 120
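The key dispatch described in the docstring paragraph above can be sketched as follows. This is a minimal illustration only: the class `KeyedDataSetSketch` and its `_examples` and `_properties` members are hypothetical names, not part of dataset.py, and the actual DataSet may resolve keys differently.

```python
class KeyedDataSetSketch(object):
    """Hypothetical stand-in, not the DataSet class from this changeset."""

    def __init__(self, examples):
        self._examples = list(examples)  # example-wise data
        self._properties = {}            # dataset-wide key-value pairs

    def __setitem__(self, key, value):
        # Only non-index keys (typically strings) name a property.
        if isinstance(key, (int, slice, list)):
            raise TypeError("this sketch only stores properties under non-index keys")
        self._properties[key] = value

    def __getitem__(self, key):
        if isinstance(key, (int, slice)):
            # Integer or slice: plain example indexing.
            return self._examples[key]
        if isinstance(key, list):
            # List of indices: pick out the listed examples.
            return [self._examples[i] for i in key]
        # Anything else (e.g. a string) is treated as a property key.
        return self._properties[key]


ds = KeyedDataSetSketch([10, 20, 30])
ds["source"] = "toy data"
first_example = ds[0]       # example access
source_label = ds["source"] # property access
```

Keeping the properties in a separate dictionary is one possible design; it keeps string keys from colliding with example indexing.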
118 A DataSet sub-class should always redefine the following methods: 121 A DataSet sub-class should always redefine the following methods:
119 - __len__ if it is not a stream 122 - __len__ if it is not a stream
120 - fieldNames 123 - fieldNames
121 - minibatches_nowrap (called by DataSet.minibatches()) 124 - minibatches_nowrap (called by DataSet.minibatches())
122 - valuesHStack 125 - valuesHStack
123 - valuesVStack 126 - valuesVStack
124 For efficiency of implementation, a sub-class might also want to redefine 127 For efficiency of implementation, a sub-class might also want to redefine
125 - hasFields 128 - hasFields
126 - __getitem__ may not be feasible with some streams 129 - __getitem__ may not be feasible with some streams
127 - __iter__ 130 - __iter__
131 A sub-class should also append attributes to self._attribute_names
132 (the default value returned by attributeNames()).
133 By convention, attributes not in attributeNames() should have a name
134 starting with an underscore.
135 @todo enforce/test that convention!
128 """ 136 """
129 137
130 numpy_vstack = lambda fieldname,values: return numpy.vstack(values) 138 numpy_vstack = lambda fieldname,values: return numpy.vstack(values)
131 numpy_hstack = lambda fieldnames,values: return numpy.hstack(values) 139 numpy_hstack = lambda fieldnames,values: return numpy.hstack(values)
132 140
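Note on the two helper definitions above: a Python lambda body must be a single expression, so the `return` keyword recorded in both revisions is a syntax error. A minimal sketch of a form that parses is given below, assuming the intent is simply to forward `values` to numpy. The sample arrays are illustrative and do not come from the changeset.

```python
import numpy

# Same helpers with `return` dropped, since a lambda body is an expression.
# The field name arguments are accepted but unused, as in the recorded code.
numpy_vstack = lambda fieldname, values: numpy.vstack(values)
numpy_hstack = lambda fieldnames, values: numpy.hstack(values)

a = numpy.array([[1, 2], [3, 4]])
b = numpy.array([[5, 6]])
c = numpy.array([[7], [8]])

# Vertical stacking adds rows (more examples).
stacked_rows = numpy_vstack("x", [a, b])
# Horizontal stacking adds columns (more fields).
stacked_cols = numpy_hstack(["x", "y"], [a, c])
```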
134 if description is None: 142 if description is None:
135 # by default return "<DataSetType>(<SuperClass1>,<SuperClass2>,...)" 143 # by default return "<DataSetType>(<SuperClass1>,<SuperClass2>,...)"
136 description = type(self).__name__ + " ( " + join([x.__name__ for x in type(self).__bases__]) + " )" 144 description = type(self).__name__ + " ( " + join([x.__name__ for x in type(self).__bases__]) + " )"
137 self.description=description 145 self.description=description
138 self.fieldtypes=fieldtypes 146 self.fieldtypes=fieldtypes
147 self._attribute_names = ["description"]
148 if fieldtypes:
149 self._attribute_names.append("fieldtypes")
150
151 def attributeNames(self): return self._attribute_names
152
153 def setAttributes(self,attribute_names,attribute_values):
154 for name,value in zip(attribute_names,attribute_values):
155 self.__setattr__(name,value)
139 156
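The attribute mechanism added in this revision can be exercised with the sketch below. `AttributeSketch` mirrors the inserted lines under the assumption that they belong to the DataSet constructor. `NormalizedSketch`, `mean`, and `_cache` are hypothetical names used only to illustrate the sub-class convention from the docstring.

```python
class AttributeSketch(object):
    """Reduced stand-in for the attribute handling shown in this changeset."""

    def __init__(self, description=None, fieldtypes=None):
        self.description = description
        self.fieldtypes = fieldtypes
        self._attribute_names = ["description"]
        if fieldtypes:
            self._attribute_names.append("fieldtypes")

    def attributeNames(self):
        return self._attribute_names

    def setAttributes(self, attribute_names, attribute_values):
        # Assign each value under its name; names are not registered here.
        for name, value in zip(attribute_names, attribute_values):
            setattr(self, name, value)


class NormalizedSketch(AttributeSketch):
    """Hypothetical sub-class publishing one extra dataset-wide attribute."""

    def __init__(self, mean, description=None, fieldtypes=None):
        AttributeSketch.__init__(self, description, fieldtypes)
        self.mean = mean
        self._attribute_names.append("mean")  # visible to other objects
        self._cache = {}  # leading underscore: not listed in attributeNames()


ds = NormalizedSketch(0.5, description="toy")
ds.setAttributes(["mean", "description"], [1.5, "renamed"])
```

As in the recorded code, `setAttributes` only assigns values. A name that is not already in `_attribute_names` would not be reported by `attributeNames()` unless the sub-class appends it.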
140 class MinibatchToSingleExampleIterator(object): 157 class MinibatchToSingleExampleIterator(object):
141 """ 158 """
142 Converts the result of minibatch iterator with minibatch_size==1 into 159 Converts the result of minibatch iterator with minibatch_size==1 into
143 single-example values in the result. Therefore the result of 160 single-example values in the result. Therefore the result of