comparison dataset.py @ 46:c5b07e87b0cb

comments modif made by Yoshua
author Frederic Bastien <bastienf@iro.umontreal.ca>
date Tue, 29 Apr 2008 12:37:11 -0400
parents a5c70dc42972
children b6730f9a336d ea7d8bc38b34
45:a5c70dc42972 46:c5b07e87b0cb
31 * for example in dataset([field1, field2,field3, ...]): 31 * for example in dataset([field1, field2,field3, ...]):
32 * for val1,val2,val3 in dataset([field1, field2,field3]): 32 * for val1,val2,val3 in dataset([field1, field2,field3]):
33 * for minibatch in dataset.minibatches([field1, field2, ...],minibatch_size=N): 33 * for minibatch in dataset.minibatches([field1, field2, ...],minibatch_size=N):
34 * for mini1,mini2,mini3 in dataset.minibatches([field1, field2, ...],minibatch_size=N): 34 * for mini1,mini2,mini3 in dataset.minibatches([field1, field2, ...],minibatch_size=N):
35 * for example in dataset: 35 * for example in dataset:
36 print example['x']
37 * for x,y,z in dataset:
36 Each of these is documented below. All of these iterators are expected 38 Each of these is documented below. All of these iterators are expected
37 to provide, in addition to the usual 'next()' method, a 'next_index()' method 39 to provide, in addition to the usual 'next()' method, a 'next_index()' method
38 which returns a non-negative integer pointing to the position of the next 40 which returns a non-negative integer pointing to the position of the next
39 example that will be returned by 'next()' (or of the first example in the 41 example that will be returned by 'next()' (or of the first example in the
40 next minibatch returned). This is important because these iterators 42 next minibatch returned). This is important because these iterators
41 can wrap around the dataset in order to do multiple passes through it, 43 can wrap around the dataset in order to do multiple passes through it,
42 in possibly unregular ways if the minibatch size is not a divisor of the 44 in possibly unregular ways if the minibatch size is not a divisor of the
43 dataset length. 45 dataset length.
44 46
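Note on the 'next_index()' contract described above: the following is a minimal sketch, not the code in dataset.py. It assumes the data is a plain list and that iteration stops after a fixed number of minibatches. The class name WrappingMinibatchIterator, the n_batches parameter and the helper trace_batches are hypothetical names introduced for illustration only.

```python
class WrappingMinibatchIterator:
    """Hypothetical iterator that yields minibatches and wraps around the data."""

    def __init__(self, examples, minibatch_size, n_batches):
        self.examples = list(examples)
        self.minibatch_size = minibatch_size
        self.n_batches = n_batches  # assumption: stop after this many minibatches
        self.position = 0           # index of the next example to be returned
        self.served = 0

    def __iter__(self):
        return self

    def next_index(self):
        # Position of the first example of the next minibatch.
        return self.position

    def __next__(self):
        if self.served >= self.n_batches:
            raise StopIteration
        n = len(self.examples)
        # Indices are taken modulo the dataset length, so a minibatch may
        # straddle the end of one pass and the start of the next.
        batch = [self.examples[(self.position + k) % n]
                 for k in range(self.minibatch_size)]
        self.position = (self.position + self.minibatch_size) % n
        self.served += 1
        return batch

    next = __next__  # the docstring uses the Python 2 name 'next()'


def trace_batches(n_examples=5, minibatch_size=2, n_batches=4):
    """Record (next_index, minibatch) pairs for a small wrapping run."""
    it = WrappingMinibatchIterator(range(n_examples), minibatch_size, n_batches)
    trace = []
    for _ in range(n_batches):
        index = it.next_index()
        trace.append((index, next(it)))
    return trace
```

With 5 examples and a minibatch size of 2, the third minibatch starts at index 4 and wraps to index 0, which is the irregular case the docstring mentions: without 'next_index()' the caller could not tell where in the dataset a given minibatch begins.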
45 To iterate over fields, one can do 47 To iterate over fields, one can do
46 * for fields in dataset.fields() 48 * for field in dataset.fields():
49 for field_value in field: # iterate over the values associated to that field for all the dataset examples
47 * for fields in dataset(field1,field2,...).fields() to select a subset of fields 50 * for fields in dataset(field1,field2,...).fields() to select a subset of fields
48 * for fields in dataset.fields(field1,field2,...) to select a subset of fields 51 * for fields in dataset.fields(field1,field2,...) to select a subset of fields
49 and each of these fields is iterable over the examples: 52 and each of these fields is iterable over the examples:
50 * for field_examples in dataset.fields(): 53 * for field_examples in dataset.fields():
51 for example_value in field_examples: 54 for example_value in field_examples:
61 64
62 Note: Fields are not mutually exclusive, i.e. two fields can overlap in their actual content. 65 Note: Fields are not mutually exclusive, i.e. two fields can overlap in their actual content.
63 66
64 Note: The content of a field can be of any type. Field values can also be 'missing' 67 Note: The content of a field can be of any type. Field values can also be 'missing'
65 (e.g. to handle semi-supervised learning), and in the case of numeric (numpy array) 68 (e.g. to handle semi-supervised learning), and in the case of numeric (numpy array)
66 fields (i.e. an ArrayFieldsDataSet), NaN plays the role of a missing value. 69 fields (i.e. an ArrayFieldsDataSet), NaN plays the role of a missing value.
70 What about non-numeric values? None.
67 71
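Note on missing values: a minimal sketch of how a caller might skip examples whose target is missing, for instance the unlabelled part of a semi-supervised set. It assumes that NaN marks a missing numeric value, as stated above, and that Python's None marks a missing non-numeric value. The second assumption is one possible reading of the line added in this revision, not something the docstring confirms. The helpers is_missing and labelled_examples are hypothetical and use only the standard library rather than numpy.

```python
import math


def is_missing(value):
    """Hypothetical test: NaN (numeric) or None (assumed, non-numeric) is missing."""
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


def labelled_examples(inputs, targets):
    """Keep only the (input, target) pairs whose target is present."""
    return [(x, y) for x, y in zip(inputs, targets) if not is_missing(y)]
```

An explicit test is needed because NaN compares unequal to everything, including itself, so 'value == float("nan")' is always False.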
68 Dataset elements can be indexed and sub-datasets (with a subset 72 Dataset elements can be indexed and sub-datasets (with a subset
69 of examples) can be extracted. These operations are not supported 73 of examples) can be extracted. These operations are not supported
70 by default in the case of streams. 74 by default in the case of streams.
71 75
99 creates a new dataset that concatenates the examples from the argument datasets 103 creates a new dataset that concatenates the examples from the argument datasets
100 (and whose length is the sum of the length of the argument datasets). This only 104 (and whose length is the sum of the length of the argument datasets). This only
101 works if they all have the same fields. 105 works if they all have the same fields.
102 106
103 According to the same logic, and viewing a DataSetFields object associated to 107 According to the same logic, and viewing a DataSetFields object associated to
104 a DataSet as a kind of transpose of it, fields1 + fields2 concatenates fields of 108 a DataSet as a kind of transpose of it, fields1 & fields2 concatenates fields of
105 a DataSetFields fields1 and fields2, and fields1 | fields2 concatenates their 109 a DataSetFields fields1 and fields2, and fields1 | fields2 concatenates their
106 examples. 110 examples.
107 111
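Note on the operators in the new revision: the sketch below illustrates 'fields1 & fields2' (concatenate fields) and 'fields1 | fields2' (concatenate examples) on a toy container. It is not the DataSetFields class from dataset.py. FieldsSketch and its 'columns' attribute are hypothetical, and rejecting duplicate field names in '&' is a choice made for this sketch only.

```python
class FieldsSketch:
    """Hypothetical stand-in for a fields view: field name -> list of values."""

    def __init__(self, columns):
        self.columns = {name: list(values) for name, values in columns.items()}

    def __and__(self, other):
        # fields1 & fields2: concatenate the fields (more fields, same examples).
        overlap = set(self.columns) & set(other.columns)
        if overlap:
            raise ValueError("duplicate field names: %s" % sorted(overlap))
        merged = dict(self.columns)
        merged.update(other.columns)
        return FieldsSketch(merged)

    def __or__(self, other):
        # fields1 | fields2: concatenate the examples (same fields, more examples).
        if set(self.columns) != set(other.columns):
            raise ValueError("both operands must have the same fields")
        return FieldsSketch({name: values + other.columns[name]
                             for name, values in self.columns.items()})
```

For example, FieldsSketch({'x': [1, 2]}) & FieldsSketch({'y': [3, 4]}) holds the fields 'x' and 'y', while FieldsSketch({'x': [1, 2]}) | FieldsSketch({'x': [5]}) holds a single field 'x' with three values.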
108 A dataset can hold arbitrary key-value pairs that may be used to access meta-data 112 A dataset can hold arbitrary key-value pairs that may be used to access meta-data
109 or other properties of the dataset or associated with the dataset or the result 113 or other properties of the dataset or associated with the dataset or the result