
diff doc/v2_planning/dataset.txt @ 1077:5c14d2ffcbb3
dataset: Looked into a few more existing ML libraries
author: Olivier Delalleau <delallea@iro>
date: Fri, 10 Sep 2010 12:48:32 -0400
parents: 20a1af112a75
children: f9f72ae84313
--- a/doc/v2_planning/dataset.txt	Fri Sep 10 12:11:10 2010 -0400
+++ b/doc/v2_planning/dataset.txt	Fri Sep 10 12:48:32 2010 -0400
@@ -23,7 +23,7 @@
 
 - PyML: notion of dataset containers: VectorDataSet, SparseDataSet, KernelData,
   PairDataSet, Aggregate. Ultimately, the learner decides	
-- mlpy: very primitive notions of data
+- mlpy: very primitive notions of data (simple 2D matrices)
 - PyBrain: Datasets are geared towards specific tasks: ClassificationDataSet,
     SequentialDataSet, ReinforcementDataSet, ... Each class is quite
     constrained and may have a different interface.
@@ -34,7 +34,13 @@
 - Orange: Data matrices, with names and types associated to each column.
   Basically there seems to be only one base dataset class that contains the
   data. Data points are lists (of values corresponding to each column).
-- (still going through the other ones)
+- APGL: Hard to say how they deal with data from the documentation alone.
+- Monte: Data is simply numpy arrays.
+- scikits.learn: Dataset is a simple container with e.g. dataset.data being
+    a 2D numpy array of input features, and dataset.target the target vector.
+- Shogun: Vade Retro C++! (may be worth looking into their feature concept
+    though).
+- Any more worth looking at?
 
 A few things that our dataset containers should support at a minimum: