comparison doc/v2_planning/architecture.txt @ 1201:46527ae6db53

architecture: Clarified what I meant about saving the model
author Olivier Delalleau <delallea@iro>
date Mon, 20 Sep 2010 17:05:15 -0400
parents 9ff2242a817b
children b9d0a326e3e7
comparison
equal deleted inserted replaced
1200:acfd5e747a75 1201:46527ae6db53
159 - Plugins with a global scheduler driving the experiment (Razvan's team)
160 - Objects, with basic hooks at predefined places (Pascal L.'s team)
161 - Existing objects and code (including dbi and Jobman), with some more
162 pieces to tie things together (Fred B.)
163 163
164 OD comments: We were in a hurry to close the meeting and I did not have time
165 to really explain what I meant when I suggested we should add the requirement
166 of saving the final "best" model. What I had in mind is a typical "applied ML"
167 experiment, i.e. the following approach, which hopefully can be understood
168 simply by writing it down as a processing pipeline. The double cross
169 validation step, whose goal is to obtain an estimate of the generalization
170 error of our final model, is:
171 data -> k_fold_outer(preprocessing -> k_fold_inner(dbn -> evaluate) -> select_best -> retrain_on_all_data -> evaluate) -> evaluate
172 Once this is done, the model we want to save is obtained by doing
173 data -> preprocessing -> k_fold(dbn -> evaluate) -> select_best -> retrain_on_all_data
174 and we save
175 preprocessing -> best_model_selected
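The pipelines above can be sketched in code. This is only an illustrative stand-in, not the Pylearn design under discussion: it uses scikit-learn's nested cross-validation idiom, with LogisticRegression as a placeholder for the dbn and StandardScaler as a placeholder for the preprocessing step (here the preprocessing is refit inside each fold rather than once per outer fold, a minor simplification).

```python
# Sketch of the double cross-validation experiment described above.
# Assumed placeholders: StandardScaler ~ "preprocessing",
# LogisticRegression ~ "dbn"; the real framework would differ.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)

# preprocessing -> k_fold_inner(model -> evaluate) -> select_best
#   -> retrain_on_all_data  (GridSearchCV refits the winner by default)
inner = GridSearchCV(
    Pipeline([("preprocessing", StandardScaler()),
              ("model", LogisticRegression())]),
    param_grid={"model__C": [0.1, 1.0, 10.0]},
    cv=5,  # k_fold_inner
)

# data -> k_fold_outer(inner loop -> evaluate) -> evaluate:
# an estimate of the generalization error of the final model
outer_scores = cross_val_score(inner, X, y, cv=3)  # k_fold_outer
print("estimated generalization accuracy:", outer_scores.mean())

# Once this is done, rerun the inner loop on ALL the data; what we
# save is "preprocessing -> best_model_selected" as one object.
inner.fit(X, y)
model_to_save = inner.best_estimator_  # Pipeline: preprocessing + best model
```

Saving `best_estimator_` (the whole fitted Pipeline) is what makes the "save preprocessing -> best_model_selected" step concrete: the stored object carries both the fitted preprocessing and the selected model, so it can be applied directly to new data.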