Mercurial > pylearn
changeset 1339:158493f8dff9
comment on dataset proposal by Olivier
author | Razvan Pascanu <r.pascanu@gmail.com> |
---|---|
date | Thu, 21 Oct 2010 14:36:36 -0400 |
parents | 91637815b7ca |
children | 04b988fb00b6 |
files | doc/v2_planning/dataset.txt |
diffstat | 1 files changed, 21 insertions(+), 0 deletions(-) |
--- a/doc/v2_planning/dataset.txt Thu Oct 21 14:03:12 2010 -0400
+++ b/doc/v2_planning/dataset.txt Thu Oct 21 14:36:36 2010 -0400
@@ -574,3 +574,24 @@
 (saying for instance that "input" is the concatenation of "x1" and "x2",
 and "target" is "y", for a dataset whose fields are x1, x2 and y).
+
+RP comments:
+ - I like this approach. I think having overlapping fields might be useful.
+   I would add that I was thinking of a way to look at one's results. It is
+   something I've been faced with, say you run 500 jobs and then you want to
+   understand those jobs' results. Looking just at the best performing seems a waste, and
+   there is a lot more information you can extract from your results if you are
+   able to generate certain plots or statistics. To do this you would need to
+   get the data in ipython (or something quite similar) where you have available
+   the needed functions to plot different things, generate different tables. The
+   point that I was trying to make is that you can get those results in
+   something that has this very API that Olivier described. This way both
+   your input data and your results will be in the same form and whatever
+   visualization functions you have for your results you can use on your data as
+   well. For this you would need a bit more flexibility, in the sense that if
+   you have some data d, you should be able to put constraints on it, like
+   d.some_field == 5 means all entries in d that have some_field == 5, or
+   d.some_field > 5. You would also not use psql anymore but this console,
+   which would collect the results for you from sql, and give them to you as
+   a data object.
+
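
The constraint syntax suggested in the comment above (d.some_field == 5, d.some_field > 5) could look roughly like the sketch below. This is only an illustration in plain Python, not the pylearn dataset API or Olivier's proposal: the class names `Dataset` and `Field`, the `rows` attribute, the `values()` helper, and the example fields (`job_id`, `n_hidden`, `valid_error`) are all hypothetical. Results are held in memory as dicts here; collecting them from sql is left out.

```python
import operator


class Field:
    """Proxy for one named field of a Dataset (hypothetical helper)."""

    def __init__(self, dataset, name):
        self._dataset = dataset
        self._name = name

    def _filter(self, op, value):
        # Keep only the entries for which op(entry[name], value) is true.
        rows = [r for r in self._dataset.rows if op(r[self._name], value)]
        return Dataset(rows)

    def __eq__(self, value):
        # d.some_field == 5 returns a new Dataset, not a bool.
        return self._filter(operator.eq, value)

    def __gt__(self, value):
        # d.some_field > 5 returns a new Dataset as well.
        return self._filter(operator.gt, value)

    def values(self):
        # Plain list of this field's values, e.g. to pass to a plot function.
        return [r[self._name] for r in self._dataset.rows]


class Dataset:
    """Minimal in-memory container: a list of dicts, one per entry."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __getattr__(self, name):
        # Only called when normal lookup fails, so `rows` is unaffected.
        if name.startswith("_"):
            raise AttributeError(name)
        return Field(self, name)

    def __len__(self):
        return len(self.rows)


# Made-up job results, standing in for what a console would fetch from sql.
results = Dataset([
    {"job_id": 1, "n_hidden": 5, "valid_error": 0.12},
    {"job_id": 2, "n_hidden": 10, "valid_error": 0.09},
    {"job_id": 3, "n_hidden": 5, "valid_error": 0.15},
])

small = results.n_hidden == 5           # entries with n_hidden == 5
large = results.n_hidden > 5            # entries with n_hidden > 5
print(len(small), large.job_id.values())
```

Because a constraint returns another `Dataset`, constraints chain, and the same plotting or table functions could in principle be applied to input data and to results alike, which is the point the comment makes.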