dataset.py @ 72:2b6656b2ef52

Changed docs slightly
author Joseph Turian <turian@iro.umontreal.ca>
date Fri, 02 May 2008 18:36:47 -0400
parents dde1fb1b63ba
children 69f97aad3faf
or known length, so this class can be used to interface to a 'stream' which
feeds on-line learning (however, as noted below, some operations are not
feasible or not recommended on streams).

To iterate over examples, there are several possibilities:

- for example in dataset([field1, field2, field3, ...]):
- for val1,val2,val3 in dataset([field1, field2, field3]):
- for minibatch in dataset.minibatches([field1, field2, ...], minibatch_size=N):
- for mini1,mini2,mini3 in dataset.minibatches([field1, field2, field3], minibatch_size=N):
- for example in dataset::

    print example['x']

- for x,y,z in dataset:

Each of these is documented below. All of these iterators are expected
to provide, in addition to the usual 'next()' method, a 'next_index()' method
which returns a non-negative integer pointing to the position of the next
example that will be returned by 'next()' (or of the first example in the
next minibatch returned). This is important because these iterators
can wrap around the dataset in order to do multiple passes through it,
in possibly irregular ways if the minibatch size is not a divisor of the
dataset length.
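The iteration styles above can be sketched with a minimal in-memory stand-in (ToyDataSet is hypothetical, not the real DataSet class; the wraparound and next_index() machinery are omitted for brevity):

```python
# Toy sketch of example-wise and minibatch iteration (hypothetical class,
# not the real DataSet API).
class ToyDataSet:
    def __init__(self, fields):
        self.fields = fields                      # {fieldname: list of values}
        self.length = len(next(iter(fields.values())))

    def __iter__(self):                           # one example (dict) at a time
        for i in range(self.length):
            yield {name: values[i] for name, values in self.fields.items()}

    def minibatches(self, fieldnames, minibatch_size):
        # one list of per-field slices per minibatch; no wraparound here
        for start in range(0, self.length, minibatch_size):
            yield [self.fields[name][start:start + minibatch_size]
                   for name in fieldnames]

d = ToyDataSet({'x': [1, 2, 3, 4], 'y': [10, 20, 30, 40]})
xs = [example['x'] for example in d]              # example-wise iteration
mb = list(d.minibatches(['x', 'y'], minibatch_size=2))
```

Here `xs` collects the 'x' value of each example in order, and `mb` holds two minibatches of two examples each.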

To iterate over fields, one can do

- for field in dataset.fields():

    for field_value in field: # iterate over the values associated to that field for all the dataset examples

- for field in dataset(field1,field2,...).fields() to select a subset of fields
- for field in dataset.fields(field1,field2,...) to select a subset of fields

and each of these fields is iterable over the examples:

- for field_examples in dataset.fields():

    for example_value in field_examples:
        ...

but when the dataset is a stream (unbounded length), it is not recommended to do
such things because the underlying dataset may refuse to access the different fields in
an unsynchronized way. Hence the fields() method is illegal for streams, by default.
The result of fields() is a DataSetFields object, which iterates over fields,
and whose elements are iterable over examples. A DataSetFields object can
be turned back into a DataSet with its examples() method::

    dataset2 = dataset1.fields().examples()

and dataset2 should behave exactly like dataset1 (in fact by default dataset2==dataset1).
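For a finite in-memory dataset, the field-wise view described above is essentially a transposition of the example-wise view; the round trip through fields() and examples() can be illustrated with plain Python (toy data, not the DataSetFields API):

```python
# Toy illustration: examples are rows, fields are columns.
examples = [(1, 4), (2, 5), (3, 6)]      # three (x, y) examples
by_field = list(zip(*examples))           # field-wise view: all x's, all y's
rebuilt = list(zip(*by_field))            # transposing twice recovers examples
```

Transposing twice yields the original rows, mirroring `dataset1.fields().examples()` behaving like `dataset1`.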

Note: Fields are not mutually exclusive, i.e. two fields can overlap in their actual content.

Dataset elements can be indexed and sub-datasets (with a subset
of examples) can be extracted. These operations are not supported
by default in the case of streams.

- dataset[:n] returns a dataset with the n first examples.

- dataset[i1:i2:s] returns a dataset with the examples i1,i1+s,...i2-s.

- dataset[i] returns an Example.

- dataset[[i1,i2,...in]] returns a dataset with examples i1,i2,...in.

- dataset[fieldname] returns an iterable over the values of the field fieldname across
  the dataset (the iterable is obtained by default by calling valuesVStack
  over the values for individual examples).

- dataset.<property> returns the value of a property associated with
  the name <property>. The following properties should be supported:

  - 'description': a textual description or name for the dataset
  - 'fieldtypes': a list of types (one per field)

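The indexing cases above suggest that __getitem__ dispatches on the type of its key; a hedged sketch with a toy stand-in class (ToyDS and its dict-of-rows storage are hypothetical, not the actual implementation):

```python
# Hypothetical dispatch on key type for dataset[key] (toy class,
# not the real DataSet code).
class ToyDS:
    def __init__(self, rows):
        self.rows = rows                          # list of {fieldname: value}

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, key):
        if isinstance(key, slice):                # dataset[i1:i2:s] -> sub-dataset
            return ToyDS(self.rows[key])
        if isinstance(key, int):                  # dataset[i] -> one Example
            return self.rows[key]
        if isinstance(key, list):                 # dataset[[i1,i2,...]] -> sub-dataset
            return ToyDS([self.rows[i] for i in key])
        # anything else (a string) names a field: iterate over its values
        return [row[key] for row in self.rows]

d = ToyDS([{'x': i, 'y': 10 * i} for i in range(5)])
```

The string case is checked last, matching the note below that a key which is neither an integer, a slice, nor a list is treated as a field or property name.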
Datasets can be concatenated either vertically (increasing the length) or
horizontally (augmenting the set of fields), if they are compatible, using
the following operations (with the same basic semantics as numpy.hstack
and numpy.vstack):

- dataset1 | dataset2 | dataset3 == dataset.hstack([dataset1,dataset2,dataset3])

  creates a new dataset whose list of fields is the concatenation of the list of
  fields of the argument datasets. This only works if they all have the same length.

- dataset1 & dataset2 & dataset3 == dataset.vstack([dataset1,dataset2,dataset3])

  creates a new dataset that concatenates the examples from the argument datasets
  (and whose length is the sum of the length of the argument datasets). This only
  works if they all have the same fields.

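A minimal sketch of how the | and & operators could carry these hstack/vstack semantics, assuming a toy dict-of-lists container (MiniDataSet is hypothetical, not the real class):

```python
# Toy operator sketch: | merges fields (hstack-like), & appends examples
# (vstack-like). Hypothetical class, not the actual DataSet code.
class MiniDataSet:
    def __init__(self, fields):
        self.fields = fields                      # {fieldname: list of values}

    def length(self):
        return len(next(iter(self.fields.values())))

    def __or__(self, other):
        # horizontal: only legal when both operands have the same length
        assert all(len(v) == self.length() for v in other.fields.values())
        merged = dict(self.fields)
        merged.update(other.fields)
        return MiniDataSet(merged)

    def __and__(self, other):
        # vertical: only legal when both operands have the same field names
        assert set(self.fields) == set(other.fields)
        return MiniDataSet({k: self.fields[k] + other.fields[k]
                            for k in self.fields})

d1 = MiniDataSet({'x': [1, 2]})
d2 = MiniDataSet({'y': [3, 4]})
wide = d1 | d2                         # 2 examples, fields x and y
tall = d1 & MiniDataSet({'x': [5]})    # 3 examples, field x
```

The assertions mirror the compatibility conditions stated above: | requires equal lengths, & requires identical field sets.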
or other properties of the dataset or associated with the dataset or the result
of a computation stored in a dataset. These can be accessed through the [key] syntax
when key is a string (or more specifically, neither an integer, a slice, nor a list).

A DataSet sub-class should always redefine the following methods:

- __len__ if it is not a stream
- fieldNames
- minibatches_nowrap (called by DataSet.minibatches())
- valuesHStack
- valuesVStack

For efficiency of implementation, a sub-class might also want to redefine:

- hasFields
- __getitem__ (may not be feasible with some streams)
- __iter__
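For illustration, a toy list-backed sub-class covering the always-redefined methods might look as follows (the bodies, and the exact minibatches_nowrap signature, are assumptions for the sketch, not taken from the actual code base):

```python
# Illustrative skeleton only: the methods the docstring says a sub-class
# should redefine, with toy dict-of-lists bodies (hypothetical).
class ListDataSet:
    def __init__(self, columns):
        self.columns = columns                    # {fieldname: list of values}

    def __len__(self):                            # not available for streams
        return len(next(iter(self.columns.values())))

    def fieldNames(self):
        return list(self.columns)

    def minibatches_nowrap(self, fieldnames, minibatch_size, n_batches, offset):
        # single pass, no wraparound; DataSet.minibatches() would add wrapping
        # (n_batches handling omitted in this toy version)
        for start in range(offset, len(self), minibatch_size):
            yield [self.columns[f][start:start + minibatch_size]
                   for f in fieldnames]

    def valuesHStack(self, fieldnames, fieldvalues):
        # glue the values of several fields side by side (toy: one flat list)
        return [v for vals in fieldvalues for v in vals]

    def valuesVStack(self, fieldname, values):
        # stack the per-example values of one field (toy: a plain list)
        return list(values)

ds = ListDataSet({'x': [1, 2, 3], 'y': [4, 5, 6]})
```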
    """

    def __init__(self,description=None,fieldtypes=None):
        if description is None:
            # by default return "<DataSetType>(<SuperClass1>,<SuperClass2>,...)"