pylearn: dataset.py comparison

comparison dataset.py @ 46:c5b07e87b0cb

Comment modifications made by Yoshua

author	Frederic Bastien <bastienf@iro.umontreal.ca>
date	Tue, 29 Apr 2008 12:37:11 -0400
parents	a5c70dc42972
children	b6730f9a336d ea7d8bc38b34

-:a5c70dc42972
+:c5b07e87b0cb
 * for example in dataset([field1, field2,field3, ...]):
 * for val1,val2,val3 in dataset([field1, field2,field3]):
 * for minibatch in dataset.minibatches([field1, field2, ...],minibatch_size=N):
 * for mini1,mini2,mini3 in dataset.minibatches([field1, field2, ...],minibatch_size=N):
 * for example in dataset:
+    print example['x']
+* for x,y,z in dataset:
 Each of these is documented below. All of these iterators are expected
 to provide, in addition to the usual 'next()' method, a 'next_index()' method
 which returns a non-negative integer pointing to the position of the next
 example that will be returned by 'next()' (or of the first example in the
 next minibatch returned). This is important because these iterators
 can wrap around the dataset in order to do multiple passes through it,
 in possibly irregular ways if the minibatch size is not a divisor of the
 dataset length.
 To iterate over fields, one can do
-* for fields in dataset.fields()
+* for field in dataset.fields():
+    for field_value in field: # iterate over the values associated to that field for all the dataset examples
 * for fields in dataset(field1,field2,...).fields() to select a subset of fields
 * for fields in dataset.fields(field1,field2,...) to select a subset of fields
 and each of these fields is iterable over the examples:
 * for field_examples in dataset.fields():
     for example_value in field_examples:
 Note: Fields are not mutually exclusive, i.e. two fields can overlap in their actual content.
 Note: The content of a field can be of any type. Field values can also be 'missing'
 (e.g. to handle semi-supervised learning), and in the case of numeric (numpy array)
 fields (i.e. an ArrayFieldsDataSet), NaN plays the role of a missing value.
+What about non-numeric values? None.
 Dataset elements can be indexed and sub-datasets (with a subset
 of examples) can be extracted. These operations are not supported
 by default in the case of streams.
 creates a new dataset that concatenates the examples from the argument datasets
 (and whose length is the sum of the length of the argument datasets). This only
 works if they all have the same fields.
 According to the same logic, and viewing a DataSetFields object associated to
-a DataSet as a kind of transpose of it, fields1 + fields2 concatenates fields of
+a DataSet as a kind of transpose of it, fields1 & fields2 concatenates fields of
 a DataSetFields fields1 and fields2, and fields1 | fields2 concatenates their
 examples.
 A dataset can hold arbitrary key-value pairs that may be used to access meta-data
 or other properties of the dataset or associated with the dataset or the result
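
The docstring in this changeset asks every iterator to expose a 'next_index()' method alongside 'next()', because minibatch iterators may wrap around the dataset when the minibatch size does not divide its length. The following is only a minimal sketch of that contract, not the pylearn implementation: the class name WrapAroundMinibatches is hypothetical, the dataset is assumed to be a plain in-memory sequence, and it is written for Python 3 (so 'next()' becomes '__next__'), whereas the code in the diff is Python 2.

```python
class WrapAroundMinibatches:
    """Hypothetical illustration of the next()/next_index() contract.

    Assumes the dataset is a finite, non-empty, in-memory sequence.
    """

    def __init__(self, examples, minibatch_size):
        self.examples = list(examples)
        if not self.examples:
            raise ValueError("examples must be non-empty")
        if minibatch_size <= 0:
            raise ValueError("minibatch_size must be positive")
        self.minibatch_size = minibatch_size
        self._next = 0  # position of the first example of the next minibatch

    def __iter__(self):
        return self

    def next_index(self):
        """Return the position of the first example the next call will yield."""
        return self._next

    def __next__(self):
        n = len(self.examples)
        # Take minibatch_size consecutive examples, wrapping past the end.
        batch = [self.examples[(self._next + i) % n]
                 for i in range(self.minibatch_size)]
        self._next = (self._next + self.minibatch_size) % n
        return batch


# Five examples with minibatches of two: the third minibatch wraps around,
# so later minibatches start at odd positions.
it = WrapAroundMinibatches(range(5), minibatch_size=2)
for _ in range(4):
    start = it.next_index()
    print(start, next(it))
```

Because this iterator never raises StopIteration, the caller decides how many passes to make, and 'next_index()' is what lets the caller tell where in the dataset the upcoming minibatch begins.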
