diff doc/v2_planning/dataset.txt @ 1339:158493f8dff9

comment on dataset proposal by Olivier
author Razvan Pascanu <r.pascanu@gmail.com>
date Thu, 21 Oct 2010 14:36:36 -0400
parents 91637815b7ca
children 04b988fb00b6
--- a/doc/v2_planning/dataset.txt	Thu Oct 21 14:03:12 2010 -0400
+++ b/doc/v2_planning/dataset.txt	Thu Oct 21 14:36:36 2010 -0400
@@ -574,3 +574,24 @@
       (saying for instance that "input" is the concatenation of "x1" and "x2",
       and "target" is "y", for a dataset whose fields are x1, x2 and y).
 
+
+RP comments:
+ - I like this approach. I think having overlapping fields might be useful.
+ I would add that I was thinking of a way to look at one's results. It is
+ something I have faced: say you run 500 jobs and then you want to
+ understand their results. Looking only at the best-performing job seems a waste, and
+ there is a lot more information you can extract from your results if you are
+ able to generate certain plots or statistics. To do this you would need to
+ get the data into ipython (or something quite similar), where the functions
+ needed to plot different things and generate different tables are available. The
+ point I am trying to make is that you can get those results in
+ something that has the very API that Olivier described. This way both
+ your input data and your results will be in the same form and whatever
+ visualization functions you have for your results you can use on your data as
+ well. For this you would need a bit more flexibility, in the sense that if
+ you have some data d, you should be able to put constraints on it: for example,
+ d.some_field == 5 would mean all entries in d that have some_field == 5, and
+ similarly for d.some_field > 5. You would also no longer use psql but this console,
+ which would collect the results for you from SQL and give them to you as a
+ data object.
+
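Illustration of the constraint syntax in the comment above (a minimal sketch,
not part of the changeset and not an existing pylearn API). The class names
Dataset and Field, the dict-per-row storage and the example field names
job_id, n_hidden and valid_error are all assumptions made for the example;
in the proposal the rows would be collected from SQL rather than built by hand.

```python
class Field:
    """Hypothetical column accessor: comparing it to a value filters rows."""

    def __init__(self, dataset, name):
        self._dataset = dataset
        self._name = name

    def _select(self, test):
        # Keep only the rows whose value for this field passes the test.
        rows = [r for r in self._dataset.rows if test(r[self._name])]
        return Dataset(rows, self._dataset.fields)

    def __eq__(self, value):
        return self._select(lambda v: v == value)

    def __gt__(self, value):
        return self._select(lambda v: v > value)

    def __lt__(self, value):
        return self._select(lambda v: v < value)


class Dataset:
    """Hypothetical data object: a list of rows, each a dict of fields."""

    def __init__(self, rows, fields=None):
        self.rows = list(rows)
        if fields is None:
            # Assumption: all rows share the fields of the first row.
            fields = tuple(self.rows[0]) if self.rows else ()
        self.fields = tuple(fields)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name not in self.__dict__.get("fields", ()):
            raise AttributeError(name)
        return Field(self, name)

    def __len__(self):
        return len(self.rows)


# Made-up job results standing in for rows fetched from the database.
results = Dataset([
    {"job_id": 1, "n_hidden": 5, "valid_error": 0.12},
    {"job_id": 2, "n_hidden": 10, "valid_error": 0.09},
    {"job_id": 3, "n_hidden": 5, "valid_error": 0.15},
])

small = results.n_hidden == 5          # all entries with n_hidden == 5
large = results.n_hidden > 5           # all entries with n_hidden > 5
worst_small = (results.n_hidden == 5).valid_error > 0.13  # constraints chain
```

Because each constraint returns another Dataset, the same plotting or
table-generating functions could in principle be applied to the filtered
results and to the input data alike, which is the point of the comment.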