changeset 1106:21d25bed2ce9

use_cases: Comment about using predefined dataset dimensions
author Olivier Delalleau <delallea@iro>
date Mon, 13 Sep 2010 22:44:37 -0400
parents 546bd0ccb0e4
children e5306f5626d4
files doc/v2_planning/use_cases.txt
diffstat 1 files changed, 23 insertions(+), 0 deletions(-) [+]
line wrap: on
line diff
--- a/doc/v2_planning/use_cases.txt	Mon Sep 13 22:06:23 2010 -0400
+++ b/doc/v2_planning/use_cases.txt	Mon Sep 13 22:44:37 2010 -0400
@@ -97,6 +97,29 @@
 - there are no APIs for things which are not passed as arguments (i.e. the logic
   of the whole program is not exposed via some uber-API).
 
+OD comments: I didn't have time to look closely at the details, but overall I
+like the general feel of it. At least I'd expect us to need something like
+this to be able to handle the multiple use cases we want to support. I must
+say, though, that I'm a bit worried it could quickly become intimidating to
+newcomers, with its 'lambda functions' and 'virtual machines'.
+Anyway, one point I would like to comment on is the line that creates the
+linear classifier. I hope that, as much as possible, we can avoid having to
+specify dataset dimensions / number of classes in algorithm constructors. I
+regularly ran into issues in PLearn because we had to give, for instance, the
+number of inputs when creating a neural network. I much prefer it when this
+kind of thing can be figured out at runtime:
+    - Any parameter you can get rid of is a significant gain in
+      user-friendliness.
+    - It's not always easy to know in advance e.g. the dimension of your input
+      dataset. Imagine, for instance, that this dataset is obtained in a first
+      step by running the data through a PCA whose number of output dimensions
+      is chosen so as to keep 90% of the variance.
+    - It seems to me that it fits the idea of a symbolic graph better: my
+      intuition (which may be very different from what you actually have in
+      mind) is to see an experiment as a symbolic graph, which you instantiate
+      when you provide the input data. One advantage of this point of view is
+      that it makes it natural to re-use the same block components on various
+      datasets / splits, something we often want to do.
 
 K-fold cross validation of a classifier
 ---------------------------------------
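
To illustrate the runtime shape inference OD is arguing for, here is a minimal, hypothetical sketch. The class name, the NumPy-based training loop, and the `fit`/`predict` API are illustrative assumptions, not part of the proposed library: the point is only that the constructor takes no dimension arguments, and both the input dimension and the number of classes are inferred from the data passed to `fit`.

```python
import numpy as np

class LinearClassifier:
    """Toy softmax-regression classifier (hypothetical API, for
    illustration only): input dimension and number of classes are
    inferred from the training data, not given to the constructor."""

    def __init__(self, lr=0.1, n_iter=100):
        self.lr = lr
        self.n_iter = n_iter
        self.W = None  # allocated lazily, once the data's shape is known

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=int)
        n_features = X.shape[1]          # inferred at runtime
        n_classes = int(y.max()) + 1     # inferred at runtime
        self.W = np.zeros((n_features, n_classes))
        self.b = np.zeros(n_classes)
        Y = np.eye(n_classes)[y]         # one-hot targets
        for _ in range(self.n_iter):
            logits = X @ self.W + self.b
            p = np.exp(logits - logits.max(axis=1, keepdims=True))
            p /= p.sum(axis=1, keepdims=True)
            # Gradient of the mean cross-entropy loss
            self.W -= self.lr * (X.T @ (p - Y)) / len(X)
            self.b -= self.lr * (p - Y).mean(axis=0)
        return self

    def predict(self, X):
        return np.argmax(np.asarray(X) @ self.W + self.b, axis=1)
```

With this pattern the same untrained object can be applied to data of any dimensionality, e.g. the output of a PCA whose number of kept components is only known at runtime, and the same block can be reused across different datasets / splits.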