comparison dataset.py @ 241:ddb88a8e9fd2

If I understand properly, the length of an unbounded stream is sys.maxint
author delallea@opale.iro.umontreal.ca
date Fri, 30 May 2008 10:14:46 -0400
parents 38beb81f4e8b
children ef70a665aaaf
232:c047238e5b3f 241:ddb88a8e9fd2
45 A DataSet can be seen as a generalization of a matrix, meant to be used in conjunction 45 A DataSet can be seen as a generalization of a matrix, meant to be used in conjunction
46 with learning algorithms (for training and testing them): rows/records are called examples, and 46 with learning algorithms (for training and testing them): rows/records are called examples, and
47 columns/attributes are called fields. The field value for a particular example can be an arbitrary 47 columns/attributes are called fields. The field value for a particular example can be an arbitrary
48 python object, which depends on the particular dataset. 48 python object, which depends on the particular dataset.
49 49
50 We call a DataSet a 'stream' when its length is unbounded (otherwise its __len__ method 50 We call a DataSet a 'stream' when its length is unbounded (in which case its __len__ method
51 should return sys.maxint). 51 should return sys.maxint).
52 52
53 A DataSet is a generator of iterators; these iterators can run through the 53 A DataSet is a generator of iterators; these iterators can run through the
54 examples or the fields in a variety of ways. A DataSet need not necessarily have a finite 54 examples or the fields in a variety of ways. A DataSet need not necessarily have a finite
55 or known length, so this class can be used to interface to a 'stream' which 55 or known length, so this class can be used to interface to a 'stream' which
56 feeds on-line learning (however, as noted below, some operations are not 56 feeds on-line learning (however, as noted below, some operations are not
57 feasible or not recommanded on streams). 57 feasible or not recommended on streams).
58 58
59 To iterate over examples, there are several possibilities: 59 To iterate over examples, there are several possibilities:
60 - for example in dataset: 60 - for example in dataset:
61 - for val1,val2,... in dataset: 61 - for val1,val2,... in dataset:
62 - for example in dataset(field1, field2,field3, ...): 62 - for example in dataset(field1, field2,field3, ...):
79 - for field in dataset.fields(field1,field2,...) to select a subset of fields 79 - for field in dataset.fields(field1,field2,...) to select a subset of fields
80 and each of these fields is iterable over the examples: 80 and each of these fields is iterable over the examples:
81 - for field_examples in dataset.fields(): 81 - for field_examples in dataset.fields():
82 for example_value in field_examples: 82 for example_value in field_examples:
83 ... 83 ...
84 but when the dataset is a stream (unbounded length), it is not recommanded to do 84 but when the dataset is a stream (unbounded length), it is not recommended to do
85 such things because the underlying dataset may refuse to access the different fields in 85 such things because the underlying dataset may refuse to access the different fields in
86 an unsynchronized ways. Hence the fields() method is illegal for streams, by default. 86 an unsynchronized ways. Hence the fields() method is illegal for streams, by default.
87 The result of fields() is a L{DataSetFields} object, which iterates over fields, 87 The result of fields() is a L{DataSetFields} object, which iterates over fields,
88 and whose elements are iterable over examples. A DataSetFields object can 88 and whose elements are iterable over examples. A DataSetFields object can
89 be turned back into a DataSet with its examples() method:: 89 be turned back into a DataSet with its examples() method::
597 * for fields in dataset.fields(field1,field2,...) to select a subset of fields 597 * for fields in dataset.fields(field1,field2,...) to select a subset of fields
598 and each of these fields is iterable over the examples: 598 and each of these fields is iterable over the examples:
599 * for field_examples in dataset.fields(): 599 * for field_examples in dataset.fields():
600 for example_value in field_examples: 600 for example_value in field_examples:
601 ... 601 ...
602 but when the dataset is a stream (unbounded length), it is not recommanded to do 602 but when the dataset is a stream (unbounded length), it is not recommended to do
603 such things because the underlying dataset may refuse to access the different fields in 603 such things because the underlying dataset may refuse to access the different fields in
604 an unsynchronized ways. Hence the fields() method is illegal for streams, by default. 604 an unsynchronized ways. Hence the fields() method is illegal for streams, by default.
605 The result of fields() is a DataSetFields object, which iterates over fields, 605 The result of fields() is a DataSetFields object, which iterates over fields,
606 and whose elements are iterable over examples. A DataSetFields object can 606 and whose elements are iterable over examples. A DataSetFields object can
607 be turned back into a DataSet with its examples() method: 607 be turned back into a DataSet with its examples() method:
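
Note on the stream convention reworded at line 50 of this changeset (this note is not part of the diff). The docstring says a DataSet is a 'stream' when its length is unbounded, in which case __len__ should return sys.maxint. A minimal sketch of that convention follows. CounterStream, is_stream and UNBOUNDED are hypothetical names used only for illustration; they are not taken from dataset.py. sys.maxint exists only in Python 2, so the sketch falls back to sys.maxsize, which is the largest value len() accepts in Python 3.

```python
import itertools
import sys

# Python 2 exposes sys.maxint; Python 3 removed it, so fall back to
# sys.maxsize there. UNBOUNDED is a hypothetical name for the sentinel.
UNBOUNDED = getattr(sys, "maxint", sys.maxsize)


class CounterStream(object):
    """Hypothetical unbounded dataset: yields 0, 1, 2, ... forever."""

    def __len__(self):
        # Signal "unbounded length" the way the docstring describes.
        return UNBOUNDED

    def __iter__(self):
        return itertools.count()


def is_stream(dataset):
    """Return True when the dataset reports the unbounded-length sentinel."""
    return len(dataset) == UNBOUNDED


stream = CounterStream()
# Never materialize a stream; take a bounded slice instead.
first_three = list(itertools.islice(stream, 3))
```

A caller could check is_stream(dataset) before attempting anything that needs a finite length, such as the fields() iteration that the docstring declares illegal for streams by default.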