comparison dataset.py @ 46:c5b07e87b0cb
comments modif made by Yoshua
author | Frederic Bastien <bastienf@iro.umontreal.ca> |
---|---|
date | Tue, 29 Apr 2008 12:37:11 -0400 |
parents | a5c70dc42972 |
children | b6730f9a336d ea7d8bc38b34 |
45:a5c70dc42972 | 46:c5b07e87b0cb |
---|---|
31 * for example in dataset([field1, field2,field3, ...]): | 31 * for example in dataset([field1, field2,field3, ...]): |
32 * for val1,val2,val3 in dataset([field1, field2,field3]): | 32 * for val1,val2,val3 in dataset([field1, field2,field3]): |
33 * for minibatch in dataset.minibatches([field1, field2, ...],minibatch_size=N): | 33 * for minibatch in dataset.minibatches([field1, field2, ...],minibatch_size=N): |
34 * for mini1,mini2,mini3 in dataset.minibatches([field1, field2, ...],minibatch_size=N): | 34 * for mini1,mini2,mini3 in dataset.minibatches([field1, field2, ...],minibatch_size=N): |
35 * for example in dataset: | 35 * for example in dataset: |
 | 36 print example['x'] |
 | 37 * for x,y,z in dataset: |
36 Each of these is documented below. All of these iterators are expected | 38 Each of these is documented below. All of these iterators are expected |
37 to provide, in addition to the usual 'next()' method, a 'next_index()' method | 39 to provide, in addition to the usual 'next()' method, a 'next_index()' method |
38 which returns a non-negative integer pointing to the position of the next | 40 which returns a non-negative integer pointing to the position of the next |
39 example that will be returned by 'next()' (or of the first example in the | 41 example that will be returned by 'next()' (or of the first example in the |
40 next minibatch returned). This is important because these iterators | 42 next minibatch returned). This is important because these iterators |
41 can wrap around the dataset in order to do multiple passes through it, | 43 can wrap around the dataset in order to do multiple passes through it, |
42 in possibly irregular ways if the minibatch size is not a divisor of the | 44 in possibly irregular ways if the minibatch size is not a divisor of the |
43 dataset length. | 45 dataset length. |
44 | 46 |
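The wrap-around behaviour described above can be illustrated with a minimal sketch. It is not pylearn's implementation: the class name `WrappingMinibatchIterator` and its `n_batches` parameter are hypothetical, and it uses the Python 3 `__next__` protocol where the docstring speaks of `next()`.

```python
class WrappingMinibatchIterator:
    """Hypothetical minibatch iterator that wraps around its data."""

    def __init__(self, examples, minibatch_size, n_batches):
        self.examples = list(examples)
        self.minibatch_size = minibatch_size
        self.n_batches = n_batches  # assumed stopping rule: total minibatches to yield
        self.position = 0           # index of the next example to be returned
        self.served = 0             # minibatches yielded so far

    def __iter__(self):
        return self

    def next_index(self):
        # Position of the first example of the next minibatch.
        return self.position

    def __next__(self):
        if self.served >= self.n_batches:
            raise StopIteration
        n = len(self.examples)
        # The modulo makes the minibatch wrap past the end of the dataset.
        batch = [self.examples[(self.position + i) % n]
                 for i in range(self.minibatch_size)]
        self.position = (self.position + self.minibatch_size) % n
        self.served += 1
        return batch


# 5 examples with minibatches of 2: the size does not divide the length,
# so the third minibatch straddles the end of the dataset.
it = WrappingMinibatchIterator(range(5), minibatch_size=2, n_batches=4)
starts, batches = [], []
for _ in range(4):
    starts.append(it.next_index())
    batches.append(next(it))
```

Here `starts` records where each minibatch begins, which is the information a caller would otherwise lose once the iterator has wrapped.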
45 To iterate over fields, one can do | 47 To iterate over fields, one can do |
46 * for fields in dataset.fields() | 48 * for field in dataset.fields(): |
 | 49 for field_value in field: # iterate over the values associated to that field for all the dataset examples |
47 * for fields in dataset(field1,field2,...).fields() to select a subset of fields | 50 * for fields in dataset(field1,field2,...).fields() to select a subset of fields |
48 * for fields in dataset.fields(field1,field2,...) to select a subset of fields | 51 * for fields in dataset.fields(field1,field2,...) to select a subset of fields |
49 and each of these fields is iterable over the examples: | 52 and each of these fields is iterable over the examples: |
50 * for field_examples in dataset.fields(): | 53 * for field_examples in dataset.fields(): |
51 for example_value in field_examples: | 54 for example_value in field_examples: |
61 | 64 |
62 Note: Fields are not mutually exclusive, i.e. two fields can overlap in their actual content. | 65 Note: Fields are not mutually exclusive, i.e. two fields can overlap in their actual content. |
63 | 66 |
64 Note: The content of a field can be of any type. Field values can also be 'missing' | 67 Note: The content of a field can be of any type. Field values can also be 'missing' |
65 (e.g. to handle semi-supervised learning), and in the case of numeric (numpy array) | 68 (e.g. to handle semi-supervised learning), and in the case of numeric (numpy array) |
66 fields (i.e. an ArrayFieldsDataSet), NaN plays the role of a missing value. | 69 fields (i.e. an ArrayFieldsDataSet), NaN plays the role of a missing value. |
 | 70 What about non-numeric values? None. |
67 | 71 |
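As a rough illustration of the missing-value convention in the note above (NaN for numeric fields, None for non-numeric ones), the following sketch uses plain Python floats instead of numpy arrays. The helper `is_missing` and the example field names are assumptions made for this illustration, not part of pylearn.

```python
import math


def is_missing(value):
    """Hypothetical helper: True for None or a float NaN."""
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


# Toy semi-supervised data: one example lacks its input, another its label.
examples = [
    {"x": 0.5, "label": "cat"},
    {"x": float("nan"), "label": "dog"},
    {"x": 1.5, "label": None},
]

# Keep only the labelled examples.
labelled = [e for e in examples if not is_missing(e["label"])]

# Positions of the examples whose numeric field is missing.
missing_x = [i for i, e in enumerate(examples) if is_missing(e["x"])]
```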
68 Dataset elements can be indexed and sub-datasets (with a subset | 72 Dataset elements can be indexed and sub-datasets (with a subset |
69 of examples) can be extracted. These operations are not supported | 73 of examples) can be extracted. These operations are not supported |
70 by default in the case of streams. | 74 by default in the case of streams. |
71 | 75 |
99 creates a new dataset that concatenates the examples from the argument datasets | 103 creates a new dataset that concatenates the examples from the argument datasets |
100 (and whose length is the sum of the length of the argument datasets). This only | 104 (and whose length is the sum of the length of the argument datasets). This only |
101 works if they all have the same fields. | 105 works if they all have the same fields. |
102 | 106 |
103 According to the same logic, and viewing a DataSetFields object associated to | 107 According to the same logic, and viewing a DataSetFields object associated to |
104 a DataSet as a kind of transpose of it, fields1 + fields2 concatenates fields of | 108 a DataSet as a kind of transpose of it, fields1 & fields2 concatenates fields of |
105 a DataSetFields fields1 and fields2, and fields1 | fields2 concatenates their | 109 a DataSetFields fields1 and fields2, and fields1 | fields2 concatenates their |
106 examples. | 110 examples. |
107 | 111 |
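The two concatenation directions can be sketched with a toy column store. This is only an illustration under stated assumptions: `ToyFields` is a hypothetical stand-in for DataSetFields, it follows the revision 46 wording (`&` joins fields, `|` joins examples), and letting the right operand win on overlapping field names is an arbitrary choice of this sketch.

```python
class ToyFields:
    """Hypothetical stand-in for DataSetFields: field name -> list of values."""

    def __init__(self, columns):
        self.columns = {name: list(values) for name, values in columns.items()}
        if len({len(v) for v in self.columns.values()}) > 1:
            raise ValueError("all fields must have the same number of examples")

    def __len__(self):
        # Number of examples (0 when there are no fields).
        return len(next(iter(self.columns.values()), []))

    def __or__(self, other):
        # Concatenate examples: both operands must have the same fields.
        if set(self.columns) != set(other.columns):
            raise ValueError("operands must have the same fields")
        return ToyFields({name: self.columns[name] + other.columns[name]
                          for name in self.columns})

    def __and__(self, other):
        # Concatenate fields: both operands must have the same length.
        if len(self) != len(other):
            raise ValueError("operands must have the same number of examples")
        merged = dict(self.columns)
        merged.update(other.columns)  # right operand wins on name clashes
        return ToyFields(merged)


a = ToyFields({"x": [1, 2], "y": [10, 20]})
b = ToyFields({"x": [3], "y": [30]})
c = ToyFields({"z": [7, 8]})

more_examples = a | b  # same fields, 3 examples
more_fields = a & c    # same 2 examples, fields x, y and z
```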
108 A dataset can hold arbitrary key-value pairs that may be used to access meta-data | 112 A dataset can hold arbitrary key-value pairs that may be used to access meta-data |
109 or other properties of the dataset or associated with the dataset or the result | 113 or other properties of the dataset or associated with the dataset or the result |