diff doc/v2_planning/dataset.txt @ 1076:20a1af112a75
dataset: Looked into datasets from some other ML libraries
author | Olivier Delalleau <delallea@iro> |
---|---|
date | Fri, 10 Sep 2010 12:11:10 -0400 |
parents | a474fabd1f37 |
children | 5c14d2ffcbb3 |
--- a/doc/v2_planning/dataset.txt Fri Sep 10 11:42:48 2010 -0400
+++ b/doc/v2_planning/dataset.txt Fri Sep 10 12:11:10 2010 -0400
@@ -24,6 +24,16 @@
 - PyML: notion of dataset containers: VectorDataSet, SparseDataSet, KernelData,
   PairDataSet, Aggregate. Ultimately, the learner decides
 - mlpy: very primitive notions of data
+- PyBrain: Datasets are geared towards specific tasks: ClassificationDataSet,
+  SequentialDataSet, ReinforcementDataSet, ... Each class is quite
+  constrained and may have a different interface.
+- MDP: Seems to have restrictions on the type of data being passed around, as
+  well as its dimensionality ("Input array data is typically assumed to be
+  two-dimensional and ordered such that observations of the same variable are
+  stored on rows and different variables are stored on columns.")
+- Orange: Data matrices, with names and types associated to each column.
+  Basically there seems to be only one base dataset class that contains the
+  data. Data points are lists (of values corresponding to each column).
 - (still going through the other ones)
 
 A few things that our dataset containers should support at a minimum:
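The notes above contrast two data layouts: MDP's 2-D array convention (the quoted documentation says observations go on rows and variables on columns), and Orange's single container class with named, typed columns whose data points are lists of per-column values. The following is a minimal Python sketch of both layouts. It assumes plain nested lists in place of NumPy arrays, and the names `column`, `Column` and `Table` are hypothetical illustrations, not the actual API of MDP, Orange, or pylearn.

```python
from dataclasses import dataclass, field
from typing import Any, List, Sequence

# MDP-style layout (per the quoted MDP docs): a 2-D array with one observation
# per row and one variable per column. Nested lists stand in for a NumPy array.
mdp_style_data = [
    [1.0, 2.0, 3.0],  # observation 0: values of variables 0, 1, 2
    [4.0, 5.0, 6.0],  # observation 1
]


def column(data: Sequence[Sequence[Any]], j: int) -> List[Any]:
    """Return every observation of variable j, i.e. column j."""
    return [row[j] for row in data]


@dataclass
class Column:
    """Hypothetical column descriptor: a name plus the expected value type."""
    name: str
    dtype: type


@dataclass
class Table:
    """Hypothetical Orange-like container: a single class holding named, typed
    columns, where each data point is a list with one value per column."""
    columns: List[Column]
    rows: List[List[Any]] = field(default_factory=list)

    def append(self, point: List[Any]) -> None:
        # Reject points whose length or value types do not match the columns.
        if len(point) != len(self.columns):
            raise ValueError(
                "expected %d values, got %d" % (len(self.columns), len(point)))
        for value, col in zip(point, self.columns):
            if not isinstance(value, col.dtype):
                raise TypeError(
                    "column %r expects %s" % (col.name, col.dtype.__name__))
        self.rows.append(point)

    def get(self, i: int, name: str) -> Any:
        # Look up the value of column `name` in data point i.
        names = [c.name for c in self.columns]
        return self.rows[i][names.index(name)]


# Usage: a table with two named, typed columns and two data points.
table = Table([Column("x", float), Column("label", str)])
table.append([0.5, "a"])
table.append([1.5, "b"])
```

The MDP-style layout keeps only positional structure, so column meaning lives outside the data; the Orange-style table carries names and types with the data, at the cost of a per-point check on insertion.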