changeset 1082:f9f72ae84313

dataset: Added a couple points we did not have time to discuss during meeting
author Olivier Delalleau <delallea@iro>
date Fri, 10 Sep 2010 15:36:23 -0400
parents 446bd478953f
children 4c00af69c164
files doc/v2_planning/dataset.txt
diffstat 1 files changed, 24 insertions(+), 0 deletions(-)
--- a/doc/v2_planning/dataset.txt	Fri Sep 10 14:14:29 2010 -0400
+++ b/doc/v2_planning/dataset.txt	Fri Sep 10 15:36:23 2010 -0400
@@ -204,4 +204,28 @@
 API?
 
 
+Field names and attributes
+~~~~~~~~~~~~~~~~~~~~~~~~~~
 
+OD: One important question is how to handle field names and characteristics.
+For instance, it can be useful to know that the 3rd input field represents a
+number of fingers, and is a non-negative discrete field whose numeric value is
+meaningful (compared to, say, an integer index that would correspond to an
+animal's category). We mentioned metadata during the meeting, but we did not
+get into its details: that may be the place to put this kind of information.
+
+
+Freeing memory
+~~~~~~~~~~~~~~
+
+OD: It is sometimes useful to be able to free memory used by previous
+computations. A typical example is when you load the original dataset into
+memory, then perform various processing steps, ending with a new dataset that
+you also store in memory before feeding it to the learner. Unless you design
+your code very carefully to avoid it, your original dataset will remain in
+memory (possibly along with the results of some computations performed
+along the way). So there may be a use for a `clear()` method that would be
+called by the topmost dataset (the one doing the final memory caching), and
+would be forwarded iteratively to previous datasets so as to reclaim all this
+wasted memory space.
+
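Note on the `clear()` proposal: the forwarding behaviour described in the "Freeing memory" section could be sketched roughly as follows. This is only an illustration under assumptions. The class names (`Dataset`, `Loaded`, `Mapped`, `MemoryCache`), the `get()` method and the `_cache` attribute are all hypothetical and do not come from the planning document; only the name `clear()` does.

```python
class Dataset:
    """Hypothetical base class: a node in a chain of datasets."""

    def __init__(self, source=None):
        self.source = source  # previous dataset in the chain, or None
        self._cache = None    # in-memory result, filled lazily by get()

    def _compute(self):
        raise NotImplementedError

    def get(self):
        # Compute on first access, then reuse the cached result.
        if self._cache is None:
            self._cache = self._compute()
        return self._cache

    def clear(self):
        # Drop this dataset's cached data, then forward to the previous one.
        self._cache = None
        if self.source is not None:
            self.source.clear()


class Loaded(Dataset):
    """Hypothetical root dataset: loads the original data on demand."""

    def __init__(self, loader):
        super().__init__()
        self._loader = loader

    def _compute(self):
        return self._loader()


class Mapped(Dataset):
    """Hypothetical processing step applied to every example."""

    def __init__(self, source, fn):
        super().__init__(source)
        self._fn = fn

    def _compute(self):
        return [self._fn(row) for row in self.source.get()]


class MemoryCache(Dataset):
    """Hypothetical topmost dataset doing the final memory caching."""

    def _compute(self):
        result = list(self.source.get())  # independent copy of the result
        self.source.clear()               # free everything upstream
        return result


raw = Loaded(lambda: [1, 2, 3])
scaled = Mapped(raw, lambda x: 2 * x)
final = MemoryCache(scaled)
data = final.get()
```

In this sketch only `final` still holds data once `get()` returns; the caches of `raw` and `scaled` are released and would be recomputed if they were accessed again. Whether clearing should be explicit like this, or left to reference counting, is a design choice the section leaves open.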