comparison dataset.py @ 241:ddb88a8e9fd2
If I understand properly, the length of an unbounded stream is sys.maxint
author | delallea@opale.iro.umontreal.ca |
---|---|
date | Fri, 30 May 2008 10:14:46 -0400 |
parents | 38beb81f4e8b |
children | ef70a665aaaf |
232:c047238e5b3f | 241:ddb88a8e9fd2 |
---|---|
45 A DataSet can be seen as a generalization of a matrix, meant to be used in conjunction | 45 A DataSet can be seen as a generalization of a matrix, meant to be used in conjunction |
46 with learning algorithms (for training and testing them): rows/records are called examples, and | 46 with learning algorithms (for training and testing them): rows/records are called examples, and |
47 columns/attributes are called fields. The field value for a particular example can be an arbitrary | 47 columns/attributes are called fields. The field value for a particular example can be an arbitrary |
48 python object, which depends on the particular dataset. | 48 python object, which depends on the particular dataset. |
49 | 49 |
50 We call a DataSet a 'stream' when its length is unbounded (otherwise its __len__ method | 50 We call a DataSet a 'stream' when its length is unbounded (in which case its __len__ method |
51 should return sys.maxint). | 51 should return sys.maxint). |
52 | 52 |
53 A DataSet is a generator of iterators; these iterators can run through the | 53 A DataSet is a generator of iterators; these iterators can run through the |
54 examples or the fields in a variety of ways. A DataSet need not necessarily have a finite | 54 examples or the fields in a variety of ways. A DataSet need not necessarily have a finite |
55 or known length, so this class can be used to interface to a 'stream' which | 55 or known length, so this class can be used to interface to a 'stream' which |
56 feeds on-line learning (however, as noted below, some operations are not | 56 feeds on-line learning (however, as noted below, some operations are not |
57 feasible or not recommanded on streams). | 57 feasible or not recommended on streams). |
58 | 58 |
59 To iterate over examples, there are several possibilities: | 59 To iterate over examples, there are several possibilities: |
60 - for example in dataset: | 60 - for example in dataset: |
61 - for val1,val2,... in dataset: | 61 - for val1,val2,... in dataset: |
62 - for example in dataset(field1, field2,field3, ...): | 62 - for example in dataset(field1, field2,field3, ...): |
79 - for field in dataset.fields(field1,field2,...) to select a subset of fields | 79 - for field in dataset.fields(field1,field2,...) to select a subset of fields |
80 and each of these fields is iterable over the examples: | 80 and each of these fields is iterable over the examples: |
81 - for field_examples in dataset.fields(): | 81 - for field_examples in dataset.fields(): |
82 for example_value in field_examples: | 82 for example_value in field_examples: |
83 ... | 83 ... |
84 but when the dataset is a stream (unbounded length), it is not recommanded to do | 84 but when the dataset is a stream (unbounded length), it is not recommended to do |
85 such things because the underlying dataset may refuse to access the different fields in | 85 such things because the underlying dataset may refuse to access the different fields in |
86 an unsynchronized ways. Hence the fields() method is illegal for streams, by default. | 86 an unsynchronized ways. Hence the fields() method is illegal for streams, by default. |
87 The result of fields() is a L{DataSetFields} object, which iterates over fields, | 87 The result of fields() is a L{DataSetFields} object, which iterates over fields, |
88 and whose elements are iterable over examples. A DataSetFields object can | 88 and whose elements are iterable over examples. A DataSetFields object can |
89 be turned back into a DataSet with its examples() method:: | 89 be turned back into a DataSet with its examples() method:: |
597 * for fields in dataset.fields(field1,field2,...) to select a subset of fields | 597 * for fields in dataset.fields(field1,field2,...) to select a subset of fields |
598 and each of these fields is iterable over the examples: | 598 and each of these fields is iterable over the examples: |
599 * for field_examples in dataset.fields(): | 599 * for field_examples in dataset.fields(): |
600 for example_value in field_examples: | 600 for example_value in field_examples: |
601 ... | 601 ... |
602 but when the dataset is a stream (unbounded length), it is not recommanded to do | 602 but when the dataset is a stream (unbounded length), it is not recommended to do |
603 such things because the underlying dataset may refuse to access the different fields in | 603 such things because the underlying dataset may refuse to access the different fields in |
604 an unsynchronized ways. Hence the fields() method is illegal for streams, by default. | 604 an unsynchronized ways. Hence the fields() method is illegal for streams, by default. |
605 The result of fields() is a DataSetFields object, which iterates over fields, | 605 The result of fields() is a DataSetFields object, which iterates over fields, |
606 and whose elements are iterable over examples. A DataSetFields object can | 606 and whose elements are iterable over examples. A DataSetFields object can |
607 be turned back into a DataSet with its examples() method: | 607 be turned back into a DataSet with its examples() method: |
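
The conventions described in the docstring above (a stream reports `sys.maxint` as its length, and `fields()` gives column-wise access that streams reject by default) can be illustrated with a small stand-alone sketch. It is not the code from dataset.py: `ToyDataSet`, `ToyStream` and `is_stream` are hypothetical names invented for this illustration, examples are modelled as plain dicts, and `sys.maxsize` stands in for `sys.maxint`, which no longer exists in Python 3.

```python
import itertools
import sys


class ToyDataSet:
    """Hypothetical finite dataset: a list of examples, each a dict of fields."""

    def __init__(self, examples, fieldnames):
        self._examples = list(examples)
        self._fieldnames = list(fieldnames)

    def __len__(self):
        return len(self._examples)

    def __iter__(self):
        # "for example in dataset:" yields one example (row) at a time.
        return iter(self._examples)

    def fields(self, *fieldnames):
        # Yield one iterable per selected field (all fields if none are named);
        # each of these iterables runs over the examples.
        for name in fieldnames or self._fieldnames:
            yield [example[name] for example in self._examples]


class ToyStream:
    """Hypothetical unbounded dataset following the length convention."""

    def __len__(self):
        # Assumption: sys.maxsize replaces sys.maxint (removed in Python 3).
        return sys.maxsize

    def __iter__(self):
        # An endless supply of examples.
        return ({"x": i, "y": 2 * i} for i in itertools.count())

    def fields(self, *fieldnames):
        # Column-wise access would require walking the stream once per field,
        # so this sketch rejects it outright.
        raise NotImplementedError("fields() is not supported on streams")


def is_stream(dataset):
    """Treat a dataset as a stream when it reports the sentinel length."""
    return len(dataset) == sys.maxsize


data = ToyDataSet([{"x": 1, "y": 2}, {"x": 3, "y": 4}], ["x", "y"])
columns = list(data.fields("x", "y"))

# A stream can still be consumed example by example, as long as the
# consumer decides when to stop.
first_three = list(itertools.islice(ToyStream(), 3))
```

Using a sentinel length keeps `len()` usable on every dataset, at the cost that callers must compare against the sentinel before relying on the length for anything such as preallocating storage.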