# HG changeset patch
# User Olivier Delalleau
# Date 1284147383 14400
# Node ID f9f72ae84313026b6db95a409d9cc4d0804e3a61
# Parent 446bd478953ffc6e6eafd119cd9a7b5335c95f87
dataset: Added a couple points we did not have time to discuss during meeting

diff -r 446bd478953f -r f9f72ae84313 doc/v2_planning/dataset.txt
--- a/doc/v2_planning/dataset.txt	Fri Sep 10 14:14:29 2010 -0400
+++ b/doc/v2_planning/dataset.txt	Fri Sep 10 15:36:23 2010 -0400
@@ -204,4 +204,28 @@
 API?
 
 
+Field names and attributes
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+OD: One important question is how to handle fields' names and characteristics.
+For instance, it can be useful to know that the 3rd input field represents a
+number of fingers, and is a non-negative discrete field whose numeric value is
+meaningful (compared to, say, an integer index that would correspond to an
+animal's category). We mentioned metadata during the meeting, but we did not
+get into its details: that may be the place to put this kind of information.
+
+
+Freeing memory
+~~~~~~~~~~~~~~
+
+OD: It is sometimes useful to be able to free memory used by previous
+computations. A typical example is when you load the original dataset into
+memory, then perform various processing steps, ending with a new dataset that
+you also store in memory before feeding it to the learner. Unless you very
+carefully design your code to avoid it, your original dataset will still
+remain in memory (as well as maybe the results of some computations performed
+along the way). So there may be a use for a `clear()` method that would be
+called by the topmost dataset (the one doing the final memory caching), and
+would be forwarded iteratively to previous datasets so as to get back all this
+wasted memory space.
+
 
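
Illustration (not part of the changeset): the two points raised in the patch can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the API that was adopted. Every name here (`FieldInfo`, `Dataset`, `LoadedDataset`, `MappedDataset`, `CachedDataset`, `is_cached`) is hypothetical. Only the idea of per-field metadata and of a `clear()` call issued by the topmost dataset and forwarded to previous datasets comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldInfo:
    """Hypothetical per-field metadata (names are assumptions)."""
    name: str
    discrete: bool
    non_negative: bool
    ordered: bool  # True when the numeric value itself is meaningful


class Dataset:
    """Hypothetical base class: a dataset optionally built on a parent."""

    def __init__(self, parent=None, fields=()):
        self.parent = parent
        self.fields = tuple(fields)
        self._cache = None

    def _compute(self):
        raise NotImplementedError

    def data(self):
        # Compute once, then serve the in-memory copy.
        if self._cache is None:
            self._cache = self._compute()
        return self._cache

    def is_cached(self):
        return self._cache is not None

    def clear(self):
        # Drop this dataset's cache, then forward iteratively to ancestors.
        node = self
        while node is not None:
            node._cache = None
            node = node.parent


class LoadedDataset(Dataset):
    """Original dataset, loaded into memory by a caller-supplied function."""

    def __init__(self, loader, fields=()):
        super().__init__(fields=fields)
        self._loader = loader

    def _compute(self):
        return list(self._loader())


class MappedDataset(Dataset):
    """One processing step applied row by row to the parent dataset."""

    def __init__(self, parent, fn):
        super().__init__(parent=parent, fields=parent.fields)
        self._fn = fn

    def _compute(self):
        return [self._fn(row) for row in self.parent.data()]


class CachedDataset(Dataset):
    """Topmost dataset: caches the final result, then frees its ancestors."""

    def __init__(self, parent):
        super().__init__(parent=parent, fields=parent.fields)

    def _compute(self):
        result = list(self.parent.data())
        self.parent.clear()  # reclaim memory held by previous datasets
        return result


fields = (
    FieldInfo("n_fingers", discrete=True, non_negative=True, ordered=True),
    FieldInfo("animal", discrete=True, non_negative=True, ordered=False),
)
raw = LoadedDataset(lambda: [(5, 0), (7, 2)], fields)
capped = MappedDataset(raw, lambda row: (min(row[0], 5), row[1]))
final = CachedDataset(capped)
rows = final.data()
```

In this sketch only `final` still holds data after `final.data()` returns. `raw` and `capped` have dropped their caches and would recompute on demand. The `ordered` flag is one possible way to tell a count such as a number of fingers apart from a category index.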