comparison doc/v2_planning/dataset.txt @ 1339:158493f8dff9

comment on dataset proposal by Olivier
author Razvan Pascanu <r.pascanu@gmail.com>
date Thu, 21 Oct 2010 14:36:36 -0400
parents 91637815b7ca
children 04b988fb00b6
1338:91637815b7ca 1339:158493f8dff9
572 and "target" for the regression task), and if the dataset does not
573 already defines these fields, using a dataset wrapper that does it
574 (saying for instance that "input" is the concatenation of "x1" and "x2",
575 and "target" is "y", for a dataset whose fields are x1, x2 and y).
576
577
578 RP comments:
579 - I like this approach. I think having overlapping fields might be useful.
580 I would add that I was thinking of a way to look at one's results. This is
581 something I've been faced with: say you run 500 jobs and then you want to
582 understand those jobs' results. Looking only at the best-performing job seems
583 a waste; there is a lot more information you can extract from your results if
584 you are able to generate certain plots or statistics. To do this you would
585 need to get the data into ipython (or something quite similar), where the
586 functions needed to plot different things and generate different tables are
587 available. The point I am trying to make is that you can get those results
588 in something that has the very API that Olivier described. This way both
589 your input data and your results will be in the same form, and whatever
590 visualization functions you have for your results you can use on your data as
591 well. For this you would need a bit more flexibility, in the sense that if
592 you have some data d, you should be able to put constraints on it: for
593 example, d.some_field == 5 means all entries in d that have some_field == 5,
594 and similarly for d.some_field > 5. You would also no longer use psql but
595 this console, which would collect the results for you from sql and give them
596 to you as a data object.
597
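A minimal sketch of the constraint idea in the RP comment, to make the
d.some_field == 5 and d.some_field > 5 notation concrete. The names Results,
FieldView, rows, job_id, n_hidden and valid_error are hypothetical and are not
part of the proposal; the sketch assumes results are already loaded as a list
of dicts that all share the same keys (collecting them from sql is not shown).

```python
import operator


class FieldView:
    """Hypothetical handle on one field of a Results object."""

    def __init__(self, owner, name):
        self._owner = owner
        self._name = name

    def _select(self, op, value):
        # Keep only the rows whose field satisfies the comparison.
        rows = [r for r in self._owner.rows if op(r[self._name], value)]
        return Results(rows)

    # Comparisons return a filtered Results object instead of a bool.
    def __eq__(self, value):
        return self._select(operator.eq, value)

    def __gt__(self, value):
        return self._select(operator.gt, value)

    def __lt__(self, value):
        return self._select(operator.lt, value)


class Results:
    """Hypothetical container: one dict per job, same keys in every dict."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, so d.some_field
        # becomes a FieldView that can be compared against a value.
        if name.startswith("_"):
            raise AttributeError(name)
        return FieldView(self, name)

    def __len__(self):
        return len(self.rows)


# Made-up example rows, for illustration only.
jobs = Results([
    {"job_id": 1, "n_hidden": 5, "valid_error": 0.12},
    {"job_id": 2, "n_hidden": 10, "valid_error": 0.09},
    {"job_id": 3, "n_hidden": 5, "valid_error": 0.15},
])

small = jobs.n_hidden == 5            # all entries with n_hidden == 5
large = jobs.n_hidden > 5             # all entries with n_hidden > 5
worst_small = small.valid_error > 0.13  # constraints can be chained
```

Because each constraint returns another Results object, the same plotting or
table functions could in principle be applied to the full set of jobs and to
any filtered subset.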