annotate doc/v2_planning/dataset.txt @ 1028:c6a74b24330b

coding_style: Olivier D confirmed as leader
author Olivier Delalleau <delallea@iro>
date Mon, 06 Sep 2010 20:41:51 -0400
parents fb6cae14fd07
children a154c9b68239
rev   line source
1002
f82093bf4405 adding learner.txt and dataset.txt in v2_planning/
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
diff changeset
1 Discussion of Function Specification for Dataset Types
2 ======================================================
3
1008
a5886b394bda Updating with talking points from Sept. 2 discussion
Dumitru Erhan <dumitru.erhan@gmail.com>
parents: 1002
diff changeset
4 Some talking points from the September 2 meeting:
5
6 * Datasets as views/tasks (Pascal Vincent's idea): our dataset specification
7 needs to be flexible enough to accommodate different (sub)tasks and views of
8 the same underlying data.
9 * Datasets as probability distributions from which one can sample.
1023
fb6cae14fd07 dataset: Comment about viewing a dataset as a distribution
Olivier Delalleau <delallea@iro>
parents: 1008
diff changeset
10 * That's not something I would consider to be a dataset-related problem to
11 tackle now: a probability distribution in Pylearn would probably be a
12 different kind of beast, and it should be easy enough to have a
13 DatasetToDistribution class for instance, that would take care of viewing a
14 dataset as a probability distribution. -- OD
1008
15 * Our specification should allow transparent handling of infinite datasets (or
16 simply datasets that cannot fit in memory).
17 * GPU/buffering issues.
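
One way to picture the "infinite datasets" point is a dataset that is only ever
consumed as a stream of minibatches. The sketch below is an illustration, not a
proposed Pylearn API: the names minibatches and linear_pairs are hypothetical,
and it assumes a dataset can be exposed as a plain Python iterable.

```python
import itertools


def minibatches(examples, batch_size):
    """Yield lists of at most batch_size examples from any iterable.

    Only one minibatch is held in memory at a time, so `examples` may be an
    infinite generator or a reader over data too large to fit in memory.
    (Hypothetical helper, not an existing Pylearn function.)
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(examples)
    while True:
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            return  # the source was finite and is now exhausted
        yield batch


def linear_pairs():
    """Hypothetical infinite source: endless (x, y) pairs with y = 2 * x."""
    for x in itertools.count():
        yield (x, 2 * x)


# Take only the first two minibatches of the infinite stream.
first_two = list(itertools.islice(minibatches(linear_pairs(), 3), 2))
```

The same function handles finite data: the last minibatch is simply shorter
when the number of examples is not a multiple of batch_size.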
18
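
The DatasetToDistribution idea mentioned in the list above could look roughly
like the following. Only the class name comes from the discussion; the
constructor arguments, the sample() method, and the choice of uniform sampling
with replacement (i.e. the empirical distribution of a finite, indexable
dataset) are assumptions made for illustration.

```python
import random


class DatasetToDistribution(object):
    """View a finite, indexable dataset as its empirical distribution.

    Hypothetical sketch: the signatures below are assumptions, not an
    agreed-upon interface.
    """

    def __init__(self, dataset, seed=None):
        if len(dataset) == 0:
            raise ValueError("cannot sample from an empty dataset")
        self.dataset = dataset
        self.rng = random.Random(seed)  # private RNG, reproducible via seed

    def sample(self, n=1):
        """Draw n examples uniformly at random, with replacement."""
        size = len(self.dataset)
        return [self.dataset[self.rng.randrange(size)] for _ in range(n)]


examples = [(0.0, 'a'), (1.0, 'b'), (2.0, 'c')]
dist = DatasetToDistribution(examples, seed=0)
draws = dist.sample(5)
```

Keeping this as a wrapper leaves the dataset interface itself free of any
sampling method, which matches the suggestion that distributions be a separate
kind of object.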
19 Committee: DE, OB, OD, AB, PV