comparison doc/v2_planning/dataset.txt @ 1076:20a1af112a75
dataset: Looked into datasets from some other ML libraries
author | Olivier Delalleau <delallea@iro> |
---|---|
date | Fri, 10 Sep 2010 12:11:10 -0400 |
parents | a474fabd1f37 |
children | 5c14d2ffcbb3 |
comparison
1075:d422f726c156 | 1076:20a1af112a75 |
---|---|
22 Some ideas from existing ML libraries: | 22 Some ideas from existing ML libraries: |
23 | 23 |
24 - PyML: notion of dataset containers: VectorDataSet, SparseDataSet, KernelData, | 24 - PyML: notion of dataset containers: VectorDataSet, SparseDataSet, KernelData, |
25 PairDataSet, Aggregate. Ultimately, the learner decides | 25 PairDataSet, Aggregate. Ultimately, the learner decides |
26 - mlpy: very primitive notions of data | 26 - mlpy: very primitive notions of data |
27 - PyBrain: Datasets are geared towards specific tasks: ClassificationDataSet, | |
28 SequentialDataSet, ReinforcementDataSet, ... Each class is quite | |
29 constrained and may have a different interface. | |
30 - MDP: Seems to have restrictions on the type of data being passed around, as | |
31 well as its dimensionality ("Input array data is typically assumed to be | |
32 two-dimensional and ordered such that observations of the same variable are | |
33 stored on rows and different variables are stored on columns.") | |
34 - Orange: Data matrices, with names and types associated to each column. | |
35 Basically there seems to be only one base dataset class that contains the | |
36 data. Data points are lists (of values corresponding to each column). | |
27 - (still going through the other ones) | 37 - (still going through the other ones) |
28 | 38 |
29 A few things that our dataset containers should support at a minimum: | 39 A few things that our dataset containers should support at a minimum: |
30 | 40 |
31 - streams, possibly infinite | 41 - streams, possibly infinite |