# HG changeset patch
# User Olivier Delalleau
# Date 1284135070 14400
# Node ID 20a1af112a75ed9661f27f4bef7cfba335a75248
# Parent d422f726c1567618ee0de0330af80f2c1a8a5b55
dataset: Looked into datasets from some other ML libraries

diff -r d422f726c156 -r 20a1af112a75 doc/v2_planning/dataset.txt
--- a/doc/v2_planning/dataset.txt	Fri Sep 10 11:42:48 2010 -0400
+++ b/doc/v2_planning/dataset.txt	Fri Sep 10 12:11:10 2010 -0400
@@ -24,6 +24,16 @@
 - PyML: notion of dataset containers: VectorDataSet, SparseDataSet, KernelData,
   PairDataSet, Aggregate. Ultimately, the learner decides
 - mlpy: very primitive notions of data
+- PyBrain: Datasets are geared towards specific tasks: ClassificationDataSet,
+  SequentialDataSet, ReinforcementDataSet, ... Each class is quite
+  constrained and may have a different interface.
+- MDP: Seems to have restrictions on the type of data being passed around, as
+  well as its dimensionality ("Input array data is typically assumed to be
+  two-dimensional and ordered such that observations of the same variable are
+  stored on rows and different variables are stored on columns.")
+- Orange: Data matrices, with names and types associated to each column.
+  Basically there seems to be only one base dataset class that contains the
+  data. Data points are lists (of values corresponding to each column).
 - (still going through the other ones)
 
 A few things that our dataset containers should support at a minimum: