comparison doc/v2_planning/dataset.txt @ 1076:20a1af112a75

dataset: Looked into datasets from some other ML libraries
author Olivier Delalleau <delallea@iro>
date Fri, 10 Sep 2010 12:11:10 -0400
parents a474fabd1f37
children 5c14d2ffcbb3
1075:d422f726c156 1076:20a1af112a75
22 22  Some ideas from existing ML libraries:
23 23
24 24  - PyML: notion of dataset containers: VectorDataSet, SparseDataSet, KernelData,
25 25    PairDataSet, Aggregate. Ultimately, the learner decides
26 26  - mlpy: very primitive notions of data
   27  - PyBrain: Datasets are geared towards specific tasks: ClassificationDataSet,
   28    SequentialDataSet, ReinforcementDataSet, ... Each class is quite
   29    constrained and may have a different interface.
   30  - MDP: Seems to have restrictions on the type of data being passed around, as
   31    well as its dimensionality ("Input array data is typically assumed to be
   32    two-dimensional and ordered such that observations of the same variable are
   33    stored on rows and different variables are stored on columns.")
   34  - Orange: Data matrices, with names and types associated to each column.
   35    Basically there seems to be only one base dataset class that contains the
   36    data. Data points are lists (of values corresponding to each column).
27 37  - (still going through the other ones)
28 38
29 39  A few things that our dataset containers should support at a minimum:
30 40
31 41  - streams, possibly infinite