changeset 1076:20a1af112a75

dataset: Looked into datasets from some other ML libraries
author Olivier Delalleau <delallea@iro>
date Fri, 10 Sep 2010 12:11:10 -0400
parents d422f726c156
children 5c14d2ffcbb3
files doc/v2_planning/dataset.txt
diffstat 1 files changed, 10 insertions(+), 0 deletions(-)
--- a/doc/v2_planning/dataset.txt	Fri Sep 10 11:42:48 2010 -0400
+++ b/doc/v2_planning/dataset.txt	Fri Sep 10 12:11:10 2010 -0400
@@ -24,6 +24,16 @@
 - PyML: notion of dataset containers: VectorDataSet, SparseDataSet, KernelData,
   PairDataSet, Aggregate. Ultimately, the learner decides	
 - mlpy: very primitive notions of data
+- PyBrain: Datasets are geared towards specific tasks: ClassificationDataSet,
+    SequentialDataSet, ReinforcementDataSet, ... Each class is quite
+    constrained and may have a different interface.
+- MDP: Seems to have restrictions on the type of data being passed around, as
+    well as its dimensionality ("Input array data is typically assumed to be
+    two-dimensional and ordered such that observations of the same variable are
+    stored on rows and different variables are stored on columns.")
+- Orange: Data matrices, with a name and a type associated with each column.
+  There seems to be only one base dataset class that contains the data.
+  Each data point is a list of values, one per column.
 - (still going through the other ones)
 
 A few things that our dataset containers should support at a minimum: