doc/v2_planning/use_cases.txt @ 1106:21d25bed2ce9

use_cases: Comment about using predefined dataset dimensions

author:   Olivier Delalleau <delallea@iro>
date:     Mon, 13 Sep 2010 22:44:37 -0400
parents:  b422cbaddc52
children: 0e12ea6ba661
  algorithms, etc.) can be swapped.

- there are no APIs for things which are not passed as arguments (i.e. the logic
  of the whole program is not exposed via some uber-API).

OD comments: I didn't have time to look closely at the details, but overall I
like the general feel of it. At least I'd expect us to need something like
that to be able to handle the multiple use cases we want to support. I must
say I'm a bit worried, though, that it could become scary pretty fast to a
newcomer, with 'lambda functions' and 'virtual machines'.
Anyway, one point I would like to comment on is the line that creates the
linear classifier. I hope that, as much as possible, we can avoid the need to
specify dataset dimensions / number of classes in algorithm constructors. I
regularly had issues in PLearn with the fact that we had, for instance, to
give the number of inputs when creating a neural network. I much prefer when
this kind of thing can be figured out at runtime:
  - Any parameter you can get rid of is a significant gain in
    user-friendliness.
  - It's not always easy to know in advance e.g. the dimension of your input
    dataset. Imagine for instance that this dataset is obtained in a first
    step by going through a PCA whose number of output dimensions is set so
    as to keep 90% of the variance.
  - It seems to me this fits better with the idea of a symbolic graph: my
    intuition (which may be very different from what you actually have in
    mind) is to see an experiment as a symbolic graph, which you instantiate
    when you provide the input data. One advantage of this point of view is
    that it makes it natural to re-use the same block components on various
    datasets / splits, something we often want to do.
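The points above can be sketched in code. This is a minimal illustration (all
names here are hypothetical, not an actual pylearn API): a linear classifier
that allocates its parameters lazily, inferring the input dimension from the
data at fit time, fed by a PCA step whose output dimension is only determined
at runtime from the 90%-of-variance criterion.

```python
import numpy as np

class LinearClassifier:
    """Hypothetical classifier: the input dimension is NOT a constructor
    argument; it is inferred from the data the first time fit() is called."""
    def __init__(self, n_classes):
        self.n_classes = n_classes
        self.W = None  # allocated lazily, once the input dimension is known

    def fit(self, X, y):
        n_inputs = X.shape[1]  # figured out at runtime, not specified upfront
        if self.W is None:
            self.W = np.zeros((n_inputs, self.n_classes))
        # ... actual training would go here ...
        return self

class PCA:
    """Hypothetical PCA step that keeps enough components to retain a given
    fraction of the variance, so its output dimension is unknown in advance."""
    def __init__(self, keep_variance=0.90):
        self.keep_variance = keep_variance

    def fit_transform(self, X):
        Xc = X - X.mean(axis=0)
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
        k = int(np.searchsorted(ratio, self.keep_variance)) + 1
        return Xc @ Vt[:k].T  # output dimension k is only known here

rng = np.random.RandomState(0)
X = rng.randn(100, 20)
y = rng.randint(0, 3, size=100)
Z = PCA(keep_variance=0.90).fit_transform(X)      # dimension decided at runtime
clf = LinearClassifier(n_classes=3).fit(Z, y)     # classifier adapts to it
```

With this style, the same `LinearClassifier(n_classes=3)` block can be re-used
unchanged on different datasets or splits, which is the re-use property the
symbolic-graph view makes natural.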

K-fold cross validation of a classifier
---------------------------------------

splits = kfold_cross_validate(