Mercurial > pylearn
comparison dataset.py @ 1:2cd82666b9a7
Added statscollector and started writing dataset and learner.
author | bengioy@esprit.iro.umontreal.ca |
---|---|
date | Fri, 14 Mar 2008 11:28:08 -0400 |
parents | 586dcaa4b2df |
children | 3fddb1c8f955 |
Revision 0:586dcaa4b2df (the 1:2cd82666b9a7 column is empty in this view):

    class DataSet(object):
        """Base class for representing a fixed-size or variable-size (online learning)
        data set. A DataSet is used in a Learner to represent a training set or a
        validation set. It is an indexed collection of examples. An example
        is expected to obey the syntax of dictionaries, i.e., it contains named
        fields that can be accessed via the [fieldname] syntax.
        If one views a DataSet as a matrix, the [i] operator selects a row while the .fieldname
        operator selects a named 'field' or column. However, each of the entries in one of
        these 'columns' can be any python object, not just a number. One can also
        use the slicing notation to select a subset of examples and the getFields
        method to select a subset of the fields."""

        def __init__(self):
            pass

        def size(self):
            """Return -1 for variable-size DataSets (for on-line learning), and
            the actual size otherwise."""
            # Placeholder value in the base class.
            return 0

        def fieldNames(self):
            """Return the list of field names that are supported by getattr and getFields."""
            raise NotImplementedError

        def __getitem__(self, i):
            """dataset[i] returns the i-th example from the DataSet. For fixed-size DataSets i
            should be between 0 and size()-1. For on-line DataSets, the argument is ignored
            (and should be -1 by convention to make it clear that it is not used), and
            the next available example in the example stream is returned."""
            return self.get_slice(i)

        def __getslice__(self, *args):
            """Return a DataSet that is a subset of self, by specifying either
            an interval of indices or a list of indices, in the standard slicing notation."""
            # __getslice__ is a Python 2 hook; Python 3 passes slice objects to __getitem__.
            return self.get_slice(slice(*args))

        def get_slice(self, slice_or_index):
            """This method should be redefined to do the actual work of slicing / getting an element."""
            raise NotImplementedError

        def __getattr__(self, attribute):
            """Return a DataSet that only contains the requested attribute from the examples."""
            raise NotImplementedError

        def getFields(self, fields):
            """Return a DataSet that only sees the fields named in the argument."""
            raise NotImplementedError
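
The interface described in the DataSet docstrings (row selection with `[i]`, slicing, column selection with `.fieldname` or `getFields`) can be illustrated with a minimal in-memory sketch. Everything here is an assumption for illustration: `ListDataSet` is a hypothetical name that does not appear in pylearn, the examples are stored as a plain list of dicts, and the code targets Python 3, where slices reach `__getitem__` instead of `__getslice__`.

```python
class ListDataSet:
    """Hypothetical fixed-size data set backed by a list of dict examples."""

    def __init__(self, examples):
        # Each example is a dict mapping a field name to any Python object.
        self._examples = list(examples)

    def size(self):
        # Fixed-size data set, so report the actual number of examples.
        return len(self._examples)

    def fieldNames(self):
        # Assumption: every example has the same fields as the first one.
        return list(self._examples[0].keys()) if self._examples else []

    def get_slice(self, slice_or_index):
        # A slice yields a new data set; an integer yields a single example.
        if isinstance(slice_or_index, slice):
            return ListDataSet(self._examples[slice_or_index])
        return self._examples[slice_or_index]

    def __getitem__(self, i):
        return self.get_slice(i)

    def getFields(self, fields):
        # Keep only the requested fields in every example.
        return ListDataSet([{name: ex[name] for name in fields}
                            for ex in self._examples])

    def __getattr__(self, attribute):
        # Called only when normal attribute lookup fails. Unknown names
        # raise AttributeError so that hasattr() and copying still work.
        if attribute.startswith("_") or attribute not in self.fieldNames():
            raise AttributeError(attribute)
        return self.getFields([attribute])


data = ListDataSet([
    {"x": [0.0, 1.0], "y": 0},
    {"x": [1.0, 1.0], "y": 1},
    {"x": [2.0, 0.5], "y": 0},
])
first = data[0]      # one example (a dict)
subset = data[1:3]   # a ListDataSet holding the last two examples
labels = data.y      # a ListDataSet that only sees the 'y' field
```

Raising `AttributeError` for unknown names is a design choice of this sketch, not something the original code does; it keeps `__getattr__` from intercepting lookups of private or misspelled attributes.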