changeset 1077:5c14d2ffcbb3

dataset: Looked into a few more existing ML libraries
author Olivier Delalleau <delallea@iro>
date Fri, 10 Sep 2010 12:48:32 -0400
parents 20a1af112a75
children b5754e85c472
files doc/v2_planning/dataset.txt
diffstat 1 files changed, 8 insertions(+), 2 deletions(-)
--- a/doc/v2_planning/dataset.txt	Fri Sep 10 12:11:10 2010 -0400
+++ b/doc/v2_planning/dataset.txt	Fri Sep 10 12:48:32 2010 -0400
@@ -23,7 +23,7 @@
 
 - PyML: notion of dataset containers: VectorDataSet, SparseDataSet, KernelData,
   PairDataSet, Aggregate. Ultimately, the learner decides	
-- mlpy: very primitive notions of data
+- mlpy: very primitive notions of data (simple 2D matrices)
 - PyBrain: Datasets are geared towards specific tasks: ClassificationDataSet,
     SequentialDataSet, ReinforcementDataSet, ... Each class is quite
     constrained and may have a different interface.
@@ -34,7 +34,13 @@
 - Orange: Data matrices, with names and types associated to each column.
   Basically there seems to be only one base dataset class that contains the
   data. Data points are lists (of values corresponding to each column).
-- (still going through the other ones)
+- APGL: Hard to say how they deal with data from the documentation alone.
+- Monte: Data is simply numpy arrays.
+- scikits.learn: Dataset is a simple container with e.g. dataset.data being
+    a 2D numpy array of input features, and dataset.target the target vector.
+- Shogun: Vade Retro C++! (may be worth looking into their feature concept
+    though).
+- Any more worth looking at?
 
 A few things that our dataset containers should support at a minimum: