changeset 1201:46527ae6db53

architecture: Clarified what I meant about saving the model
author Olivier Delalleau <delallea@iro>
date Mon, 20 Sep 2010 17:05:15 -0400
parents acfd5e747a75
children 7fff3d5c7694
files doc/v2_planning/architecture.txt
diffstat 1 files changed, 12 insertions(+), 0 deletions(-)
--- a/doc/v2_planning/architecture.txt	Mon Sep 20 11:28:23 2010 -0400
+++ b/doc/v2_planning/architecture.txt	Mon Sep 20 17:05:15 2010 -0400
@@ -161,3 +161,15 @@
     - Existing objects and code (including dbi and Jobman), with some more
       pieces to tie things together (Fred B.)
 
+OD comments: We were in a hurry to close the meeting and I did not have time
+to really explain what I meant when I suggested we should add the requirement
+of saving the final "best" model. What I had in mind is a typical "applied ML"
+experiment, i.e. the following approach, which hopefully can be understood
+just by writing it down in the form of a processing pipeline. The double
+cross-validation step, whose goal is to obtain an estimate of the
+generalization error of our final model, is:
+    data -> k_fold_outer(preprocessing -> k_fold_inner(dbn -> evaluate) -> select_best -> retrain_on_all_data -> evaluate) -> evaluate
+Once this is done, the model we want to save is obtained by doing
+    data -> preprocessing -> k_fold(dbn -> evaluate) -> select_best -> retrain_on_all_data
+and we save
+    preprocessing -> best_model_selected
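
The two pipelines in the hunk above can be sketched in Python as follows. This is only an illustration under stated assumptions and is not part of the changeset: a 1-D k-nearest-neighbour regressor stands in for the `dbn` step, standardization stands in for `preprocessing`, mean squared error stands in for `evaluate`, and the helper names (`fit_pipeline`, `estimate_generalization_error`, etc.) and the toy data are hypothetical.

```python
import math
import random
import statistics


def k_fold(n, k):
    """Yield (train_idx, test_idx) pairs; each index is tested exactly once."""
    for i in range(k):
        test = list(range(i, n, k))
        held_out = set(test)
        yield [j for j in range(n) if j not in held_out], test


def fit_preprocessing(xs):
    """Assumed preprocessing: standardize using statistics of the given data."""
    mean = sum(xs) / len(xs)
    std = statistics.pstdev(xs) or 1.0  # guard against zero variance
    return lambda x: (x - mean) / std


def train(xs, ys, k):
    """Stand-in for the 'dbn' step: a 1-D k-nearest-neighbour regressor."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def evaluate(model, xs, ys):
    """Mean squared error of model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_best(xs, ys, candidates, n_folds):
    """k_fold(dbn -> evaluate) -> select_best over the hyperparameter k."""
    def cv_error(k):
        errors = []
        for tr, va in k_fold(len(xs), n_folds):
            model = train([xs[i] for i in tr], [ys[i] for i in tr], k)
            errors.append(evaluate(model, [xs[i] for i in va], [ys[i] for i in va]))
        return sum(errors) / len(errors)

    return min(candidates, key=cv_error)


def fit_pipeline(xs, ys, candidates, n_folds):
    """data -> preprocessing -> k_fold(...) -> select_best -> retrain_on_all_data."""
    prep = fit_preprocessing(xs)
    zs = [prep(x) for x in xs]
    best_k = select_best(zs, ys, candidates, n_folds)
    model = train(zs, ys, best_k)  # retrain on all the data given to this call
    # The object worth saving: preprocessing -> best_model_selected.
    return lambda x: model(prep(x))


def estimate_generalization_error(xs, ys, candidates, n_outer, n_inner):
    """Outer loop of the double cross-validation: test data never drives selection."""
    errors = []
    for tr, te in k_fold(len(xs), n_outer):
        pipeline = fit_pipeline([xs[i] for i in tr], [ys[i] for i in tr],
                                candidates, n_inner)
        errors.append(evaluate(pipeline, [xs[i] for i in te], [ys[i] for i in te]))
    return sum(errors) / len(errors)


# Toy data (hypothetical): a noisy sine curve.
rng = random.Random(0)
xs = [rng.uniform(-3.0, 3.0) for _ in range(40)]
ys = [math.sin(x) + rng.gauss(0.0, 0.1) for x in xs]
candidates = [1, 3, 5]

generalization_error = estimate_generalization_error(
    xs, ys, candidates, n_outer=4, n_inner=3)
saved = fit_pipeline(xs, ys, candidates, n_folds=3)  # the model to save
```

In this sketch `fit_pipeline` is used twice: once per outer fold to estimate the generalization error, and once on the full data set to produce the object that is actually saved. The saved object bundles the fitted preprocessing with the retrained model, so new inputs go through the same transformation the model was trained on.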