diff doc/v2_planning/dataset.txt @ 1077:5c14d2ffcbb3
dataset: Looked into a few more existing ML libraries
author | Olivier Delalleau <delallea@iro> |
---|---|
date | Fri, 10 Sep 2010 12:48:32 -0400 |
parents | 20a1af112a75 |
children | f9f72ae84313 |
--- a/doc/v2_planning/dataset.txt	Fri Sep 10 12:11:10 2010 -0400
+++ b/doc/v2_planning/dataset.txt	Fri Sep 10 12:48:32 2010 -0400
@@ -23,7 +23,7 @@
 
 - PyML: notion of dataset containers: VectorDataSet, SparseDataSet, KernelData,
   PairDataSet, Aggregate. Ultimately, the learner decides
-- mlpy: very primitive notions of data
+- mlpy: very primitive notions of data (simple 2D matrices)
 - PyBrain: Datasets are geared towards specific tasks: ClassificationDataSet,
   SequentialDataSet, ReinforcementDataSet, ...
   Each class is quite constrained and may have a different interface.
@@ -34,7 +34,13 @@
 - Orange: Data matrices, with names and types associated to each column.
   Basically there seems to be only one base dataset class that contains the
   data. Data points are lists (of values corresponding to each column).
-- (still going through the other ones)
+- APGL: Hard to say how they deal with data from the documentation alone.
+- Monte: Data is simply numpy arrays.
+- scikits.learn: Dataset is a simple container with e.g. dataset.data being
+  a 2D numpy array of input features, and dataset.target the target vector.
+- Shogun: Vade Retro C++! (may be worth looking into their feature concept
+  though).
+- Any more worth looking at?
 
 A few things that our dataset containers should support at a minimum:
 
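
The scikits.learn entry added in this changeset describes a dataset as a simple container, with dataset.data a 2D numpy array of input features and dataset.target the target vector. A minimal sketch of that idea might look like the following. The class name SimpleDataset and its shape checks are hypothetical, introduced only for illustration; they are not the scikits.learn or pylearn API.

```python
import numpy as np


class SimpleDataset:
    """Hypothetical container (not a real library class).

    data   : 2D array of shape (n_samples, n_features)
    target : 1D array of shape (n_samples,)
    """

    def __init__(self, data, target):
        self.data = np.asarray(data, dtype=float)
        self.target = np.asarray(target)
        # Assumed invariants: one row per example, one target per row.
        if self.data.ndim != 2:
            raise ValueError("data must be a 2D array")
        if self.target.shape != (self.data.shape[0],):
            raise ValueError("target must have one entry per row of data")

    def __len__(self):
        # Number of examples in the dataset.
        return self.data.shape[0]


# Three examples with two input features each, plus one label per example.
dataset = SimpleDataset(
    data=[[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],
    target=[0, 1, 1],
)
```

With this layout, dataset.data[i] is the feature vector of example i and dataset.target[i] is its target, which seems to be all the notes attribute to the scikits.learn container.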