comparison dataset.py @ 1:2cd82666b9a7

Added statscollector and started writing dataset and learner.
author bengioy@esprit.iro.umontreal.ca
date Fri, 14 Mar 2008 11:28:08 -0400
parents 586dcaa4b2df
children 3fddb1c8f955
0:586dcaa4b2df 1:2cd82666b9a7
class DataSet(object):
    """Base class for representing a fixed-size or variable-size (online learning)
    data set. A DataSet is used in a Learner to represent a training set or a
    validation set. It is an indexed collection of examples. An example
    is expected to obey the syntax of dictionaries, i.e., it contains named
    fields that can be accessed via the [fieldname] syntax.
    If one views a DataSet as a matrix, the [i] operator selects a row while the
    .fieldname operator selects a named 'field' or column. However, each of the
    entries in one of these 'columns' can be any Python object, not just a number.
    One can also use the slicing notation to select a subset of the examples and
    the getFields method to select a subset of the fields."""

    def __init__(self):
        pass

    def size(self):
        """Return -1 for variable-size DataSets (for on-line learning), and
        the actual size otherwise."""
        return 0

    def fieldNames(self):
        """Return the list of field names that are supported by getattr and getFields."""
        raise NotImplementedError

    def __getitem__(self, i):
        """dataset[i] returns the i-th example from the DataSet. For fixed-size
        DataSets, i should be between 0 and size()-1. For on-line DataSets, the
        argument is ignored (and should be -1 by convention to make it clear that
        it is not used), and the next available example in the example stream is
        returned."""
        return self.get_slice(i)

    def __getslice__(self, *args):
        """Return a DataSet that is a subset of self, by specifying either
        an interval of indices or a list of indices, in the standard slicing notation."""
        # Only Python 2 calls this hook (for simple dataset[i:j] slices).
        return self.get_slice(slice(*args))

    def get_slice(self, slice_or_index):
        """This method should be redefined to do the actual work of slicing / getting an element."""
        raise NotImplementedError

    def __getattr__(self, attribute):
        """Return a DataSet that only contains the requested attribute from the examples."""
        raise NotImplementedError

    def getFields(self, fields):
        """Return a DataSet that only sees the fields named in the argument."""
        raise NotImplementedError
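
The DataSet docstrings describe row access with [i], slicing, and column selection with .fieldname or getFields, but the base class leaves all of these unimplemented. The sketch below shows one way a fixed-size subclass could fill them in. ListDataSet and its list-of-dicts storage are hypothetical and not part of this changeset. The class is written standalone, rather than deriving from DataSet, only so that it runs on its own.

```python
class ListDataSet(object):
    """Hypothetical fixed-size data set backed by a list of dict examples.

    In dataset.py this would derive from DataSet and inherit __getitem__;
    it is repeated here so the sketch is self-contained.
    """

    def __init__(self, examples):
        self._examples = list(examples)

    def size(self):
        return len(self._examples)

    def fieldNames(self):
        # Assumption: every example has the same fields as the first one.
        return sorted(self._examples[0].keys()) if self._examples else []

    def __getitem__(self, i):
        # Same dispatch as the base class: indices and slices share one hook.
        return self.get_slice(i)

    def get_slice(self, slice_or_index):
        if isinstance(slice_or_index, slice):
            # A slice yields a smaller data set.
            return ListDataSet(self._examples[slice_or_index])
        # A plain index yields a single example (a dict).
        return self._examples[slice_or_index]

    def getFields(self, fields):
        # Keep only the requested fields in every example.
        return ListDataSet([dict((f, ex[f]) for f in fields)
                            for ex in self._examples])

    def __getattr__(self, attribute):
        # Called only when normal attribute lookup fails. Private names and
        # unknown fields raise AttributeError so hasattr() and copy still work.
        if attribute.startswith("_") or attribute not in self.fieldNames():
            raise AttributeError(attribute)
        return self.getFields([attribute])


data = ListDataSet([{"x": 1.0, "y": 0}, {"x": 2.0, "y": 1}, {"x": 3.0, "y": 0}])
first = data[0]   # one example, accessed like a dictionary
tail = data[1:]   # a ListDataSet holding the last two examples
xs = data.x       # a ListDataSet that only sees the 'x' field
```

Raising AttributeError for unknown names is a choice made for this sketch. The base class as committed raises NotImplementedError from __getattr__ for every missing attribute.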