# HG changeset patch
# User Razvan Pascanu
# Date 1287686196 14400
# Node ID 158493f8dff97a8762f5cc447052750e213f15d2
# Parent 91637815b7ca201c7e772dd813456de8f77bbe9b
comment on dataset proposal by Olivier

diff -r 91637815b7ca -r 158493f8dff9 doc/v2_planning/dataset.txt
--- a/doc/v2_planning/dataset.txt Thu Oct 21 14:03:12 2010 -0400
+++ b/doc/v2_planning/dataset.txt Thu Oct 21 14:36:36 2010 -0400
@@ -574,3 +574,24 @@
 (saying for instance that "input" is the concatenation of "x1" and "x2", and
 "target" is "y", for a dataset whose fields are x1, x2 and y).
 
+
+RP comments:
+ - I like this approach. I think having overlapping fields might be useful.
+   I would add that I have been thinking about a way to look at one's
+   results. It is something I have been faced with: say you run 500 jobs and
+   then want to understand their results. Looking only at the best-performing
+   job seems a waste, and there is a lot more information you can extract
+   from your results if you are able to generate certain plots or statistics.
+   To do this you need to get the data into IPython (or something quite
+   similar), where you have the functions needed to plot different things and
+   generate different tables. The point I am trying to make is that you could
+   get those results into something that has the very API Olivier described.
+   This way both your input data and your results will be in the same form,
+   and whatever visualization functions you have for your results you can use
+   on your data as well. For this you would need a bit more flexibility, in
+   the sense that if you have some data d, you should be able to put
+   constraints on it: d.some_field == 5 would mean all entries of d that have
+   some_field == 5, and similarly d.some_field > 5. You would also no longer
+   use psql directly, but this console instead, which would collect the
+   results for you from SQL and give them to you as a data object.
+
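+   A minimal sketch of what such a constrained view could look like (the
+   class names and methods below are my own illustration, not part of
+   Olivier's proposal; they only assume the job results can be fetched from
+   SQL as dictionaries, one per row):
+
+     class Field(object):
+         """One field of a result set; comparisons return filtered views."""
+         def __init__(self, data, name):
+             self.data = data
+             self.name = name
+         def __eq__(self, value):
+             return self.data.filter(lambda row: row[self.name] == value)
+         def __gt__(self, value):
+             return self.data.filter(lambda row: row[self.name] > value)
+
+     class Results(object):
+         """Rows (e.g. fetched from a SQL query) behind a dataset-like API."""
+         def __init__(self, rows):
+             self.rows = list(rows)      # each row is a dict: field -> value
+         def __getattr__(self, name):
+             return Field(self, name)    # d.some_field gives a Field object
+         def __iter__(self):
+             return iter(self.rows)
+         def filter(self, predicate):
+             return Results(r for r in self.rows if predicate(r))
+
+     # Usage: d would be built from whatever rows the jobs database returns.
+     d = Results([{'some_field': 5, 'valid_err': 0.12},
+                  {'some_field': 7, 'valid_err': 0.34}])
+     subset = d.some_field == 5          # all entries with some_field == 5
+     larger = d.some_field > 5           # all entries with some_field > 5
+     for row in subset:
+         print(row['valid_err'])
+
+   The same filtered objects could then be handed to whatever plotting or
+   table-generating functions one already uses on datasets.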