comparison dataset.py @ 269:fdce496c3b56

deprecating __getitem__[fieldname] syntax
author James Bergstra <bergstrj@iro.umontreal.ca>
date Wed, 04 Jun 2008 19:04:40 -0400
parents 3f1cd8897fda
children fa8abc813bd2
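
For readers skimming the comparison, the effect of this changeset can be sketched roughly as follows. This is a minimal illustration, not the library's code: ToyDataSet, its dict-of-lists storage, and the KeyError (used here in place of the original assert) are assumptions made for the example. Only the names hasFields, getitem_key and __getitem__, and the TypeError(i, type(i)) fallback, come from the diff itself.

```python
class ToyDataSet:
    """Hypothetical stand-in for DataSet, used only to illustrate the change."""

    def __init__(self, fields):
        # Assumed storage: field name -> list of per-example values.
        self._fields = fields

    def hasFields(self, *names):
        return all(name in self._fields for name in names)

    def getitem_key(self, fieldname):
        # Mirrors the relocated lookup: try a field first, then an attribute.
        if self.hasFields(fieldname):
            return list(self._fields[fieldname])
        if fieldname not in self.__dict__:
            # The changeset uses an assert here; KeyError is an assumption.
            raise KeyError(fieldname)
        return self.__dict__[fieldname]

    def __getitem__(self, i):
        if isinstance(i, int):
            # dataset[i] returns one example (here, a plain dict).
            return {name: values[i] for name, values in self._fields.items()}
        # After the change, a field name is no longer accepted as a key.
        raise TypeError(i, type(i))


toy = ToyDataSet({"x": [1, 2, 3], "y": [10, 20, 30]})
first_example = toy[0]            # integer indexing still works
x_values = toy.getitem_key("x")   # explicit replacement for toy["x"]
try:
    toy["x"]
    string_key_rejected = False
except TypeError:
    string_key_rejected = True
```

In this sketch, callers that previously wrote dataset[fieldname] would switch to an explicit call such as getitem_key(fieldname), while integer indexing is unaffected.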
268:3f1cd8897fda 269:fdce496c3b56
107 107
108 - dataset[i] returns an Example. 108 - dataset[i] returns an Example.
109 109
110 - dataset[[i1,i2,...in]] returns a dataset with examples i1,i2,...in. 110 - dataset[[i1,i2,...in]] returns a dataset with examples i1,i2,...in.
111 111
112 - dataset[fieldname] an iterable over the values of the field fieldname across
113 the dataset (the iterable is obtained by default by calling valuesVStack
114 over the values for individual examples).
115
116 - dataset.<property> returns the value of a property associated with 112 - dataset.<property> returns the value of a property associated with
117 the name <property>. The following properties should be supported: 113 the name <property>. The following properties should be supported:
118 - 'description': a textual description or name for the dataset 114 - 'description': a textual description or name for the dataset
119 - 'fieldtypes': a list of types (one per field) 115 - 'fieldtypes': a list of types (one per field)
120 A DataSet may have other attributes that it makes visible to other objects. These are 116 A DataSet may have other attributes that it makes visible to other objects. These are
149 145
150 A DataSet sub-class should always redefine the following methods: 146 A DataSet sub-class should always redefine the following methods:
151 - __len__ if it is not a stream 147 - __len__ if it is not a stream
152 - fieldNames 148 - fieldNames
153 - minibatches_nowrap (called by DataSet.minibatches()) 149 - minibatches_nowrap (called by DataSet.minibatches())
150 For efficiency of implementation, a sub-class might also want to redefine
154 - valuesHStack 151 - valuesHStack
155 - valuesVStack 152 - valuesVStack
156 For efficiency of implementation, a sub-class might also want to redefine
157 - hasFields 153 - hasFields
158 - __getitem__ may not be feasible with some streams 154 - __getitem__ may not be feasible with some streams
159 - __iter__ 155 - __iter__
160 A sub-class should also append attributes to self._attribute_names 156 A sub-class should also append attributes to self._attribute_names
161 (the default value returned by attributeNames()). 157 (the default value returned by attributeNames()).
410 """ 406 """
411 Return a DataSetFields object associated with this dataset. 407 Return a DataSetFields object associated with this dataset.
412 """ 408 """
413 return DataSetFields(self,fieldnames) 409 return DataSetFields(self,fieldnames)
414 410
411 def getitem_key(self, fieldname):
412 """A not-so-well thought-out place to put code that used to be in
413 getitem.
414 """
415 #removing as per discussion June 4. --JSB
416
417 i = fieldname
418 # else check for a fieldname
419 if self.hasFields(i):
420 return self.minibatches(fieldnames=[i],minibatch_size=len(self),n_batches=1,offset=0).next()[0]
421 # else we are trying to access a property of the dataset
422 assert i in self.__dict__ # else it means we are trying to access a non-existing property
423 return self.__dict__[i]
424
415 def __getitem__(self,i): 425 def __getitem__(self,i):
416 """ 426 """
417 dataset[i] returns the (i+1)-th example of the dataset. 427 dataset[i] returns the (i+1)-th example of the dataset.
418 dataset[i:j] returns the subdataset with examples i,i+1,...,j-1. 428 dataset[i:j] returns the subdataset with examples i,i+1,...,j-1.
419 dataset[i:j:s] returns the subdataset with examples i,i+2,i+4...,j-2. 429 dataset[i:j:s] returns the subdataset with examples i,i+2,i+4...,j-2.
458 return MinibatchDataSet( 468 return MinibatchDataSet(
459 Example(self.fieldNames(),[ self.valuesVStack(fieldname,field_values) 469 Example(self.fieldNames(),[ self.valuesVStack(fieldname,field_values)
460 for fieldname,field_values 470 for fieldname,field_values
461 in zip(self.fieldNames(),fields_values)]), 471 in zip(self.fieldNames(),fields_values)]),
462 self.valuesVStack,self.valuesHStack) 472 self.valuesVStack,self.valuesHStack)
463 # else check for a fieldname 473 raise TypeError(i, type(i))
464 if self.hasFields(i):
465 return self.minibatches(fieldnames=[i],minibatch_size=len(self),n_batches=1,offset=0).next()[0]
466 # else we are trying to access a property of the dataset
467 assert i in self.__dict__ # else it means we are trying to access a non-existing property
468 return self.__dict__[i]
469 474
470 def valuesHStack(self,fieldnames,fieldvalues): 475 def valuesHStack(self,fieldnames,fieldvalues):
471 """ 476 """
472 Return a value that corresponds to concatenating (horizontally) several field values. 477 Return a value that corresponds to concatenating (horizontally) several field values.
473 This can be useful to merge some fields. The implementation of this operation is likely 478 This can be useful to merge some fields. The implementation of this operation is likely