comparison dataset.py @ 9:de616c423dbd

Improving comments in dataset.py
author bengioy@esprit.iro.umontreal.ca
date Mon, 24 Mar 2008 16:52:47 -0400
parents d1c394486037
children be128b9127c8 88168361a5ab
8:d1c394486037 9:de616c423dbd
8 Datasets with fixed and known length are FiniteDataSet, a subclass of DataSet. 8 Datasets with fixed and known length are FiniteDataSet, a subclass of DataSet.
9 Examples and datasets optionally have named fields. 9 Examples and datasets optionally have named fields.
10 One can obtain a sub-dataset by taking dataset.field or dataset(field1,field2,field3,...). 10 One can obtain a sub-dataset by taking dataset.field or dataset(field1,field2,field3,...).
11 Fields are not mutually exclusive, i.e. two fields can overlap in their actual content. 11 Fields are not mutually exclusive, i.e. two fields can overlap in their actual content.
12 The content of a field can be of any type, but often will be a numpy array. 12 The content of a field can be of any type, but often will be a numpy array.
13 The minibatch_size field, if different than 1, means that the iterator (next() method) 13 The minibatch_size attribute, if different than 1, means that the iterator (next() method)
14 returns not a single example but an array of length minibatch_size, i.e., an indexable 14 returns not a single example but an array of length minibatch_size, i.e., an indexable
15 object. 15 object with minibatch_size examples in it.
16 """ 16 """
17 17
18 def __init__(self,minibatch_size=1): 18 def __init__(self,minibatch_size=1):
19 assert minibatch_size>0 19 assert minibatch_size>0
20 self.minibatch_size=minibatch_size 20 self.minibatch_size=minibatch_size
23 """ 23 """
24 Return an iterator, whose next() method returns the next example or the next 24 Return an iterator, whose next() method returns the next example or the next
25 minibatch in the dataset. A minibatch (of length > 1) should be something one 25 minibatch in the dataset. A minibatch (of length > 1) should be something one
26 can iterate on again in order to obtain the individual examples. If the dataset 26 can iterate on again in order to obtain the individual examples. If the dataset
27 has fields, then the example or the minibatch must have the same fields 27 has fields, then the example or the minibatch must have the same fields
28 (typically this is implemented by returning another (small) dataset, when 28 (typically this is implemented by returning another smaller dataset, when
29 there are fields). 29 there are fields).
30 """ 30 """
31 raise NotImplementedError 31 raise NotImplementedError
32 32
33 def __getattr__(self,fieldname): 33 def __getattr__(self,fieldname):
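The iteration contract described in the docstrings above can be made concrete with a minimal sketch of a hypothetical list-backed dataset. `ListDataSet` is not part of dataset.py, and several details are assumptions. It uses Python 3's iterator protocol rather than the `next()` method named in the 2008 code. Each example is a dict keyed by field name. A trailing partial minibatch is simply shorter, which the docstring does not specify.

```python
class ListDataSet:
    """Hypothetical in-memory dataset illustrating the DataSet contract.

    Each example is a dict mapping field names to values (an assumption
    made for this sketch, not the representation used in dataset.py).
    """

    def __init__(self, examples, minibatch_size=1):
        assert minibatch_size > 0
        self.examples = list(examples)
        self.minibatch_size = minibatch_size

    def __len__(self):
        return len(self.examples)

    def __iter__(self):
        if self.minibatch_size == 1:
            # One example per step.
            for example in self.examples:
                yield example
        else:
            # Each step yields a smaller ListDataSet with up to minibatch_size
            # examples. It keeps the same fields and can be iterated again to
            # get the individual examples. The last minibatch may be shorter
            # (an assumption of this sketch).
            for start in range(0, len(self.examples), self.minibatch_size):
                yield ListDataSet(self.examples[start:start + self.minibatch_size])

    def __call__(self, *fieldnames):
        # dataset(field1, field2, ...) -> sub-dataset restricted to those fields.
        # Fields may overlap, so the same key can be requested by several views.
        return ListDataSet(
            [{name: ex[name] for name in fieldnames} for ex in self.examples],
            self.minibatch_size,
        )

    def __getattr__(self, fieldname):
        # Only called when normal attribute lookup fails:
        # dataset.field -> sub-dataset with just that field.
        examples = self.__dict__.get("examples", [])
        if fieldname.startswith("_") or not examples or fieldname not in examples[0]:
            raise AttributeError(fieldname)
        return self(fieldname)
```

For example, `ListDataSet([{"x": i, "y": i % 2} for i in range(5)], minibatch_size=2)` yields three minibatches, and each one iterates over dicts that still carry both fields.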
100 100
101 import numpy 101 import numpy
102 102
103 class ArrayDataSet(FiniteDataSet): 103 class ArrayDataSet(FiniteDataSet):
104 """ 104 """
105 A fixed-length and fixed-width dataset in which each element is a numpy array 105 An ArrayDataSet behaves like a numpy array but adds the notion of fields
106 or a number, hence the whole dataset corresponds to a numpy array. Fields 106 and minibatch_size from DataSet. It is a fixed-length and fixed-width dataset
107 must correspond to a slice of columns. If the dataset has fields, 107 in which each element is a numpy array or a number, hence the whole
108 dataset corresponds to a numpy array. Fields
109 must correspond to a slice of array columns. If the dataset has fields,
108 each 'example' is just a one-row ArrayDataSet, otherwise it is a numpy array. 110 each 'example' is just a one-row ArrayDataSet, otherwise it is a numpy array.
109 Any dataset can also be converted to a numpy array (losing the notion of fields) 111 Any dataset can also be converted to a numpy array (losing the notion of fields
110 by the numpy.array(dataset) call. 112 and of minibatch_size) by the numpy.array(dataset) call.
111 """ 113 """
112 114
113 def __init__(self,dataset=None,data=None,fields={},minibatch_size=1): 115 def __init__(self,dataset=None,data=None,fields={},minibatch_size=1):
114 """ 116 """
115 Construct an ArrayDataSet, either from a DataSet, or from 117 There are two ways to construct an ArrayDataSet: (1) from an
116 a numpy array plus an optional specification of fields (by 118 existing dataset (which may result in a copy of the data in a numpy array),
117 a dictionary of column slices indexed by field names). 119 or (2) from a numpy.array (the data argument), along with an optional description
120 of the fields (dictionary of column slices indexed by field names).
118 """ 121 """
119 FiniteDataSet.__init__(self,minibatch_size) 122 FiniteDataSet.__init__(self,minibatch_size)
120 if dataset!=None: 123 if dataset!=None:
121 assert data==None and fields=={} 124 assert data==None and fields=={}
122 # convert dataset to an ArrayDataSet 125 # convert dataset to an ArrayDataSet
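The ArrayDataSet docstring can be illustrated the same way. The sketch below is hypothetical: `SimpleArrayDataSet` and its `field` method are illustrative names, not the class from this revision. It stores a 2-D numpy array and treats each field as a slice of columns, so fields may overlap. When fields are defined, iteration yields one-row (or minibatch-sized) sub-datasets. Otherwise it yields plain numpy arrays. The class also implements `__array__` so that `numpy.array(dataset)` returns the raw data, dropping fields and minibatch_size.

```python
import numpy as np


class SimpleArrayDataSet:
    """Hypothetical sketch: a 2-D numpy array plus named column slices."""

    def __init__(self, data, fields=None, minibatch_size=1):
        assert minibatch_size > 0
        self.data = np.asarray(data)
        assert self.data.ndim == 2, "one row per example"
        # Map field name -> column slice; a None default avoids sharing
        # one dict between instances.
        self.fields = dict(fields or {})
        self.minibatch_size = minibatch_size

    def __len__(self):
        return self.data.shape[0]

    def field(self, name):
        # A field is a slice of columns; two fields may cover the same columns.
        return self.data[:, self.fields[name]]

    def __iter__(self):
        for start in range(0, len(self), self.minibatch_size):
            rows = self.data[start:start + self.minibatch_size]
            if self.fields:
                # With fields, each example/minibatch is a smaller dataset
                # that keeps the same field definitions.
                yield SimpleArrayDataSet(rows, self.fields)
            elif self.minibatch_size == 1:
                yield rows[0]  # a single example as a 1-D numpy array
            else:
                yield rows  # a minibatch as a 2-D numpy array

    def __array__(self, dtype=None, copy=None):
        # numpy.array(dataset) loses the notion of fields and minibatch_size.
        if copy:
            return np.array(self.data, dtype=dtype)
        return np.asarray(self.data, dtype=dtype)
```

The sketch takes `fields=None` rather than the `fields={}` default shown in the diff, because a default dict in Python is created once and shared across calls. Construction from an existing dataset, way (1) in the new constructor docstring, is left out.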