# HG changeset patch
# User Razvan Pascanu
# Date 1287686196 14400
# Node ID 158493f8dff97a8762f5cc447052750e213f15d2
# Parent 91637815b7ca201c7e772dd813456de8f77bbe9b
comment on dataset proposal by Olivier

diff -r 91637815b7ca -r 158493f8dff9 doc/v2_planning/dataset.txt
--- a/doc/v2_planning/dataset.txt Thu Oct 21 14:03:12 2010 -0400
+++ b/doc/v2_planning/dataset.txt Thu Oct 21 14:36:36 2010 -0400
@@ -574,3 +574,24 @@
 (saying for instance that "input" is the concatenation of "x1" and "x2", and
 "target" is "y", for a dataset whose fields are x1, x2 and y).
 
+
+RP comments:
+ - I like this approach. I think having overlapping fields might be useful.
+   I would add that I have been thinking about a way to look at one's
+   results. It is something I have been faced with: say you run 500 jobs and
+   then want to understand their results. Looking only at the best-performing
+   job seems a waste, and there is a lot more information you can extract
+   from your results if you are able to generate certain plots or statistics.
+   To do this you need to get the data into IPython (or something quite
+   similar), where you have the functions needed to plot different things and
+   generate different tables. The point I am trying to make is that you could
+   get those results into something that has the very API Olivier described.
+   This way both your input data and your results will be in the same form,
+   and whatever visualization functions you have for your results you can use
+   on your data as well. For this you would need a bit more flexibility, in
+   the sense that if you have some data d, you should be able to put
+   constraints on it: d.some_field == 5 would mean all entries of d that have
+   some_field == 5, and similarly d.some_field > 5. You would also no longer
+   use psql directly, but this console instead, which would collect the
+   results for you from SQL and give them to you as a data object.
+
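+   A minimal sketch of what such a constrained view could look like (the
+   class names and methods below are my own illustration, not part of
+   Olivier's proposal; they only assume the job results can be fetched from
+   SQL as dictionaries, one per row):
+
+     class Field(object):
+         """One field of a result set; comparisons return filtered views."""
+         def __init__(self, data, name):
+             self.data = data
+             self.name = name
+         def __eq__(self, value):
+             return self.data.filter(lambda row: row[self.name] == value)
+         def __gt__(self, value):
+             return self.data.filter(lambda row: row[self.name] > value)
+
+     class Results(object):
+         """Rows (e.g. fetched from a SQL query) behind a dataset-like API."""
+         def __init__(self, rows):
+             self.rows = list(rows)      # each row is a dict: field -> value
+         def __getattr__(self, name):
+             return Field(self, name)    # d.some_field gives a Field object
+         def __iter__(self):
+             return iter(self.rows)
+         def filter(self, predicate):
+             return Results(r for r in self.rows if predicate(r))
+
+     # Usage: d would be built from whatever rows the jobs database returns.
+     d = Results([{'some_field': 5, 'valid_err': 0.12},
+                  {'some_field': 7, 'valid_err': 0.34}])
+     subset = d.some_field == 5          # all entries with some_field == 5
+     larger = d.some_field > 5           # all entries with some_field > 5
+     for row in subset:
+         print(row['valid_err'])
+
+   The same filtered objects could then be handed to whatever plotting or
+   table-generating functions one already uses on datasets.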