comparison dataset.py @ 57:1aabd2e2bb5f

Added empty classes with doc: CachedDataSet and ApplyFunctionDataSet
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Tue, 29 Apr 2008 17:45:16 -0400
parents 1729ad44f175
children 9165d86855ab
comparison 56:1729ad44f175 -> 57:1aabd2e2bb5f
 
 * dataset[i] returns an Example.
 
 * dataset[[i1,i2,...in]] returns a dataset with examples i1,i2,...in.
 
-* dataset['key'] returns a property associated with the given 'key' string.
-  If 'key' is a fieldname, then the VStacked field values (iterable over
-  field values) for that field is returned. Other keys may be supported
-  by different dataset subclasses. The following key names should be supported:
+* dataset[fieldname] returns an iterable over the values of the field
+  fieldname across the dataset (the iterable is obtained by default by
+  calling valuesVStack over the values for individual examples).
+
+* dataset.<property> returns the value of a property associated with
+  the name <property>. The following properties should be supported:
   - 'description': a textual description or name for the dataset
-  - '<fieldname>.type': a type name or value for a given <fieldname>
+  - 'fieldtypes': a list of types (one per field)
 
 Datasets can be concatenated either vertically (increasing the length) or
 horizontally (augmenting the set of fields), if they are compatible, using
 the following operations (with the same basic semantics as numpy.hstack
 and numpy.vstack):
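The indexing conventions above can be sketched with a toy in-memory dataset. This is a hypothetical stand-in for illustration only, not the DataSet class in this module (the name ToyDataSet and the dict-based Example are made up):

```python
class ToyDataSet:
    """Hypothetical stand-in illustrating the indexing conventions above."""
    def __init__(self, rows, fieldnames):
        self.rows = [list(r) for r in rows]     # one row per example
        self.fieldnames = list(fieldnames)      # one name per field

    def __getitem__(self, key):
        if isinstance(key, str):
            # dataset[fieldname]: iterable over that field's values
            # across the whole dataset (cf. valuesVStack)
            j = self.fieldnames.index(key)
            return [row[j] for row in self.rows]
        if isinstance(key, list):
            # dataset[[i1,i2,...,in]]: a dataset with examples i1,i2,...,in
            return ToyDataSet([self.rows[i] for i in key], self.fieldnames)
        # dataset[i]: a single Example, here a fieldname -> value mapping
        return dict(zip(self.fieldnames, self.rows[key]))

d = ToyDataSet([[1, 2], [3, 4], [5, 6]], ["x", "y"])
```

With this sketch, `d[1]` yields one example, `d["y"]` all values of field `y`, and `d[[0, 2]]` a two-example sub-dataset.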
 * hasFields
 * __getitem__ may not be feasible with some streams
 * __iter__
 """
 
-    def __init__(self,description=None,field_types=None):
+    def __init__(self,description=None,fieldtypes=None):
         if description is None:
             # by default return "<DataSetType>(<SuperClass1>,<SuperClass2>,...)"
             description = type(self).__name__ + " ( " + " ".join([x.__name__ for x in type(self).__bases__]) + " )"
         self.description=description
-        self.field_types=field_types
+        self.fieldtypes=fieldtypes
 
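Standalone, the default-description logic reads as below. Note the original's bare join(...) needs `from string import join` on Python 2; `" ".join` as used here is the portable spelling. The class names are invented for the demonstration:

```python
class Base1: pass
class Base2: pass

class MyDataSet(Base1, Base2):
    def __init__(self, description=None, fieldtypes=None):
        if description is None:
            # default: "<DataSetType> ( <SuperClass1> <SuperClass2> ... )"
            description = (type(self).__name__ + " ( "
                           + " ".join(x.__name__ for x in type(self).__bases__)
                           + " )")
        self.description = description
        self.fieldtypes = fieldtypes

default_desc = MyDataSet().description   # "MyDataSet ( Base1 Base2 )"
```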
 class MinibatchToSingleExampleIterator(object):
     """
     Converts the result of minibatch iterator with minibatch_size==1 into
     single-example values in the result. Therefore the result of
...
             self.minibatch._values = [sub_data[:,self.dataset.fields_columns[f]] for f in self.minibatch._names]
             self.current+=self.minibatch_size
             return self.minibatch
 
         return ArrayDataSetIterator(self,fieldnames,minibatch_size,n_batches,offset)
 
+
+class CachedDataSet(DataSet):
+    """
+    Wrap a dataset whose values are computationally expensive to obtain
+    (e.g. because they involve some computation, or disk access),
+    so that repeated accesses to the same example are done cheaply,
+    by caching every example value that has been accessed at least once.
+
+    Optionally, for a finite-length dataset, all the values can be computed
+    (and cached) upon construction of the CachedDataSet, rather than at the
+    first access.
+    """
+
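The caching behaviour the docstring describes can be sketched as below. This is a hypothetical wrapper, not the eventual CachedDataSet implementation; CachingWrapper and CountingList are invented names:

```python
class CachingWrapper:
    """Hypothetical sketch of the caching behaviour, not the real class."""
    def __init__(self, dataset, precompute=False):
        self.dataset = dataset
        self.cache = {}
        if precompute:
            # finite-length dataset: compute everything at construction
            for i in range(len(dataset)):
                self.cache[i] = dataset[i]

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, i):
        if i not in self.cache:
            # the expensive access happens at most once per example
            self.cache[i] = self.dataset[i]
        return self.cache[i]

class CountingList(list):
    """Counts raw accesses, to make the caching visible."""
    accesses = 0
    def __getitem__(self, i):
        CountingList.accesses += 1
        return list.__getitem__(self, i)

cached = CachingWrapper(CountingList([10, 20, 30]))
first, second = cached[1], cached[1]   # underlying data read only once
```

The second lookup is served from the cache, so the wrapped dataset sees a single access.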
+class ApplyFunctionDataSet(DataSet):
+    """
+    A dataset that contains as fields the results of applying a given function
+    example-wise or minibatch-wise to all the fields of an input dataset.
+    The output of the function should be an iterable (e.g. a list or a LookupList)
+    over the resulting values. In minibatch mode, the function is expected
+    to work on minibatches (it takes a minibatch as input and returns a
+    minibatch as output).
+
+    The function is applied each time an example or a minibatch is accessed.
+    To avoid redoing the computation, wrap this dataset inside a CachedDataSet.
+    """
+
+
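A minimal sketch of the example-wise case, under the assumption of a simple indexable dataset (ApplyFunctionWrapper is a hypothetical name, not this module's class):

```python
class ApplyFunctionWrapper:
    """Hypothetical sketch: apply fn to each example at every access."""
    def __init__(self, dataset, fn):
        self.dataset = dataset
        self.fn = fn

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, i):
        # recomputed on every access; wrap in a caching dataset to avoid that
        return self.fn(self.dataset[i])

    def __iter__(self):
        return (self.fn(example) for example in self.dataset)

doubled = ApplyFunctionWrapper([1, 2, 3], lambda x: 2 * x)
```

Because the function is reapplied on each access, composing this with a caching wrapper (as the docstring suggests) trades memory for repeated computation.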
 def supervised_learning_dataset(src_dataset,input_fields,target_fields,weight_field=None):
     """
     Wraps an arbitrary DataSet into one for supervised learning tasks by forcing the
     user to define a set of fields as the 'input' field and a set of fields
     as the 'target' field. Optionally, a single weight_field can also be defined.
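The field split this wrapper enforces can be sketched per-example as follows. This hypothetical helper (as_supervised is an invented name) only illustrates the input/target/weight partition; the real supervised_learning_dataset wraps a whole DataSet:

```python
def as_supervised(example, input_fields, target_fields, weight_field=None):
    """Hypothetical helper: split one example's fields into input,
    target, and optional weight, as described in the docstring above."""
    inputs = {f: example[f] for f in input_fields}
    targets = {f: example[f] for f in target_fields}
    weight = example[weight_field] if weight_field is not None else None
    return inputs, targets, weight

example = {"x1": 1.0, "x2": 2.0, "y": 0, "w": 0.5}
inputs, targets, weight = as_supervised(example, ["x1", "x2"], ["y"], "w")
```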