pylearn: doc/v2_planning/dataset.txt comparison

dataset: Looked into datasets from some other ML libraries

author	Olivier Delalleau <delallea@iro>
date	Fri, 10 Sep 2010 12:11:10 -0400
parents	a474fabd1f37
children	5c14d2ffcbb3


-:d422f726c156
+:20a1af112a75
 Some ideas from existing ML libraries:
 - PyML: notion of dataset containers: VectorDataSet, SparseDataSet, KernelData,
 PairDataSet, Aggregate. Ultimately, the learner decides
 - mlpy: very primitive notions of data
+- PyBrain: Datasets are geared towards specific tasks: ClassificationDataSet,
+SequentialDataSet, ReinforcementDataSet, ... Each class is quite
+constrained and may have a different interface.
+- MDP: Seems to have restrictions on the type of data being passed around, as
+well as its dimensionality ("Input array data is typically assumed to be
+two-dimensional and ordered such that observations of the same variable are
+stored on rows and different variables are stored on columns.")
+- Orange: Data matrices, with a name and a type associated with each column.
+Basically there seems to be only one base dataset class that contains the
+data. Data points are lists (of values corresponding to each column).
 - (still going through the other ones)
 A few things that our dataset containers should support at a minimum:
 - streams, possibly infinite
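
To make the "streams, possibly infinite" requirement concrete, here is a minimal sketch of what it could mean for a dataset container. It is not pylearn code: the names `sign_stream` and `minibatches` are hypothetical and only illustrate the idea. It assumes a dataset can be exposed as a plain Python iterator, so that a learner pulls as many examples as it wants and never needs to know the length.

```python
import itertools
import random
from typing import Iterable, Iterator, List, Tuple

Example = Tuple[List[float], int]


def sign_stream(seed: int = 0) -> Iterator[Example]:
    """Hypothetical infinite dataset: yields (input, target) pairs forever."""
    rng = random.Random(seed)
    while True:
        x = rng.uniform(-1.0, 1.0)
        # The target is 1 when the single input feature is positive, else 0.
        yield [x], int(x > 0)


def minibatches(stream: Iterable, batch_size: int) -> Iterator[list]:
    """Group any stream (finite or infinite) into lists of batch_size items.

    The last batch of a finite stream may be shorter than batch_size.
    """
    it = iter(stream)
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        yield batch


# The consumer, not the dataset, decides when to stop reading.
first_batches = list(itertools.islice(minibatches(sign_stream(), 4), 3))
```

Because `minibatches` only relies on iteration, the same code path serves a finite in-memory dataset and an endless generator. The cost is that anything requiring a length or random access would need a separate interface.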
