pylearn: doc/v2_planning/dataset.txt comparison

comparison doc/v2_planning/dataset.txt @ 1077:5c14d2ffcbb3

dataset: Looked into a few more existing ML libraries

author	Olivier Delalleau <delallea@iro>
date	Fri, 10 Sep 2010 12:48:32 -0400
parents	20a1af112a75
children	f9f72ae84313

comparison


-:20a1af112a75
+:5c14d2ffcbb3
 Some ideas from existing ML libraries:
 - PyML: notion of dataset containers: VectorDataSet, SparseDataSet, KernelData,
 PairDataSet, Aggregate. Ultimately, the learner decides
-- mlpy: very primitive notions of data
+- mlpy: very primitive notions of data (simple 2D matrices)
 - PyBrain: Datasets are geared towards specific tasks: ClassificationDataSet,
 SequentialDataSet, ReinforcementDataSet, ... Each class is quite
 constrained and may have a different interface.
 - MDP: Seems to have restrictions on the type of data being passed around, as
 well as its dimensionality ("Input array data is typically assumed to be
 two-dimensional and ordered such that observations of the same variable are
 stored on rows and different variables are stored on columns.")
 - Orange: Data matrices, with names and types associated to each column.
 Basically there seems to be only one base dataset class that contains the
 data. Data points are lists (of values corresponding to each column).
-- (still going through the other ones)
+- APGL: Hard to say how they deal with data from the documentation alone.
+- Monte: Data is simply numpy arrays.
+- scikits.learn: Dataset is a simple container with e.g. dataset.data being
+a 2D numpy array of input features, and dataset.target the target vector.
+- Shogun: Vade Retro C++! (may be worth looking into their feature concept
+though).
+- Any more worth looking at?
 A few things that our dataset containers should support at a minimum:
 - streams, possibly infinite
 - task/views of the data for different problems
