Mercurial > pylearn
comparison dataset.py @ 9:de616c423dbd
Improving comments in dataset.py
author | bengioy@esprit.iro.umontreal.ca |
---|---|
date | Mon, 24 Mar 2008 16:52:47 -0400 |
parents | d1c394486037 |
children | be128b9127c8 88168361a5ab |
8:d1c394486037 | 9:de616c423dbd |
---|---|
8 Datasets with fixed and known length are FiniteDataSet, a subclass of DataSet. | 8 Datasets with fixed and known length are FiniteDataSet, a subclass of DataSet. |
9 Examples and datasets optionally have named fields. | 9 Examples and datasets optionally have named fields. |
10 One can obtain a sub-dataset by taking dataset.field or dataset(field1,field2,field3,...). | 10 One can obtain a sub-dataset by taking dataset.field or dataset(field1,field2,field3,...). |
11 Fields are not mutually exclusive, i.e. two fields can overlap in their actual content. | 11 Fields are not mutually exclusive, i.e. two fields can overlap in their actual content. |
12 The content of a field can be of any type, but often will be a numpy array. | 12 The content of a field can be of any type, but often will be a numpy array. |
13 The minibatch_size field, if different than 1, means that the iterator (next() method) | 13 The minibatch_size attribute, if different than 1, means that the iterator (next() method) |
14 returns not a single example but an array of length minibatch_size, i.e., an indexable | 14 returns not a single example but an array of length minibatch_size, i.e., an indexable |
15 object. | 15 object with minibatch_size examples in it. |
16 """ | 16 """ |
17 | 17 |
18 def __init__(self,minibatch_size=1): | 18 def __init__(self,minibatch_size=1): |
19 assert minibatch_size>0 | 19 assert minibatch_size>0 |
20 self.minibatch_size=minibatch_size | 20 self.minibatch_size=minibatch_size |
23 """ | 23 """ |
24 Return an iterator, whose next() method returns the next example or the next | 24 Return an iterator, whose next() method returns the next example or the next |
25 minibatch in the dataset. A minibatch (of length > 1) should be something one | 25 minibatch in the dataset. A minibatch (of length > 1) should be something one |
26 can iterate on again in order to obtain the individual examples. If the dataset | 26 can iterate on again in order to obtain the individual examples. If the dataset |
27 has fields, then the example or the minibatch must have the same fields | 27 has fields, then the example or the minibatch must have the same fields |
28 (typically this is implemented by returning another (small) dataset, when | 28 (typically this is implemented by returning another smaller dataset, when |
29 there are fields). | 29 there are fields). |
30 """ | 30 """ |
31 raise NotImplementedError | 31 raise NotImplementedError |
32 | 32 |
33 def __getattr__(self,fieldname): | 33 def __getattr__(self,fieldname): |
100 | 100 |
101 import numpy | 101 import numpy |
102 | 102 |
103 class ArrayDataSet(FiniteDataSet): | 103 class ArrayDataSet(FiniteDataSet): |
104 """ | 104 """ |
105 A fixed-length and fixed-width dataset in which each element is a numpy array | 105 An ArrayDataSet behaves like a numpy array but adds the notion of fields |
106 or a number, hence the whole dataset corresponds to a numpy array. Fields | 106 and minibatch_size from DataSet. It is a fixed-length and fixed-width dataset |
107 must correspond to a slice of columns. If the dataset has fields, | 107 in which each element is a numpy array or a number, hence the whole |
| | 108 dataset corresponds to a numpy array. Fields |
| | 109 must correspond to a slice of array columns. If the dataset has fields, |
108 each 'example' is just a one-row ArrayDataSet, otherwise it is a numpy array. | 110 each 'example' is just a one-row ArrayDataSet, otherwise it is a numpy array. |
109 Any dataset can also be converted to a numpy array (losing the notion of fields) | 111 Any dataset can also be converted to a numpy array (losing the notion of fields |
110 by the numpy.array(dataset) call. | 112 and of minibatch_size) by the numpy.array(dataset) call. |
111 """ | 113 """ |
112 | 114 |
113 def __init__(self,dataset=None,data=None,fields={},minibatch_size=1): | 115 def __init__(self,dataset=None,data=None,fields={},minibatch_size=1): |
114 """ | 116 """ |
115 Construct an ArrayDataSet, either from a DataSet, or from | 117 There are two ways to construct an ArrayDataSet: (1) from an |
116 a numpy array plus an optional specification of fields (by | 118 existing dataset (which may result in a copy of the data in a numpy array), |
117 a dictionary of column slices indexed by field names). | 119 or (2) from a numpy.array (the data argument), along with an optional description |
| | 120 of the fields (dictionary of column slices indexed by field names). |
118 """ | 121 """ |
119 FiniteDataSet.__init__(self,minibatch_size) | 122 FiniteDataSet.__init__(self,minibatch_size) |
120 if dataset!=None: | 123 if dataset!=None: |
121 assert data==None and fields=={} | 124 assert data==None and fields=={} |
122 # convert dataset to an ArrayDataSet | 125 # convert dataset to an ArrayDataSet |
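
The docstrings in this changeset describe three ideas: fields as named column slices that may overlap, a minibatch_size that controls what the iterator returns, and conversion back to a plain numpy array that loses both notions. The following is a minimal sketch of those ideas, not the pylearn implementation. The class name FieldedArray and its internals are hypothetical, it is written for Python 3 (so it uses a generator rather than a next() method), it always yields numpy arrays rather than one-row datasets, and it assumes the last minibatch may be shorter when the length is not a multiple of minibatch_size.

```python
import numpy


class FieldedArray:
    """Hypothetical stand-in for ArrayDataSet (illustration only)."""

    def __init__(self, data, fields=None, minibatch_size=1):
        assert minibatch_size > 0
        self.data = numpy.asarray(data)
        # Each field name maps to a slice of columns; slices may overlap.
        self.fields = dict(fields or {})
        self.minibatch_size = minibatch_size

    def __len__(self):
        return len(self.data)

    def __getattr__(self, fieldname):
        # Only reached when normal attribute lookup fails.
        fields = self.__dict__.get("fields", {})
        if fieldname in fields:
            return self.data[:, fields[fieldname]]
        raise AttributeError(fieldname)

    def __iter__(self):
        step = self.minibatch_size
        for start in range(0, len(self.data), step):
            rows = self.data[start:start + step]
            # minibatch_size == 1: yield one example (a single row).
            # Otherwise: yield an indexable block of up to `step` examples.
            yield rows[0] if step == 1 else rows

    def __array__(self, dtype=None, copy=None):
        # Hook used by numpy.array(obj); fields and minibatch_size are dropped.
        return self.data if dtype is None else self.data.astype(dtype)


data = numpy.arange(12).reshape(4, 3)
ds = FieldedArray(
    data,
    fields={"input": slice(0, 2), "target": slice(2, 3), "all": slice(0, 3)},
    minibatch_size=2,
)
batches = list(ds)        # two minibatches of two examples each
plain = numpy.array(ds)   # plain array, no fields, no minibatch_size
```

Here the "all" field overlaps both "input" and "target", which matches the statement that fields are not mutually exclusive. Each minibatch is itself iterable, so the individual examples can be recovered by looping over it.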