# HG changeset patch
# User Olivier Delalleau
# Date 1285016715 14400
# Node ID 46527ae6db539bd4033c65e6e426c57180350578
# Parent  acfd5e747a7521215aa5e30c279016c12734a175
architecture: Clarified what I meant about saving the model

diff -r acfd5e747a75 -r 46527ae6db53 doc/v2_planning/architecture.txt
--- a/doc/v2_planning/architecture.txt	Mon Sep 20 11:28:23 2010 -0400
+++ b/doc/v2_planning/architecture.txt	Mon Sep 20 17:05:15 2010 -0400
@@ -161,3 +161,15 @@
 - Existing objects and code (including dbi and Jobman), with some more
   pieces to tie things together (Fred B.)
 
+OD comments: We were in a hurry to close the meeting and I did not have time
+to really explain what I meant when I suggested we should add the requirement
+of saving the final "best" model. What I had in mind is a typical "applied ML"
+experiment, i.e. the following approach that hopefully can be understood just
+by writing it down in the form of a processing pipeline. The double cross
+validation step, whose goal is to obtain an estimate of the generalization
+error of our final model, is:
+    data -> k_fold_outer(preprocessing -> k_fold_inner(dbn -> evaluate) -> select_best -> retrain_on_all_data -> evaluate) -> evaluate
+Once this is done, the model we want to save is obtained by doing
+    data -> preprocessing -> k_fold(dbn -> evaluate) -> select_best -> retrain_on_all_data
+and we save
+    preprocessing -> best_model_selected
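The double cross-validation pipeline in the patch above can be sketched in plain Python. This is only an illustration of the control flow, not part of the patch: the names `preprocess`, `train_model`, `evaluate`, and the hyperparameter list are hypothetical stand-ins for the `preprocessing`, `dbn`, `evaluate`, and `select_best` components mentioned in the comment, and the "model" is a trivial mean predictor so the script runs on its own.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for a k-fold split of n items."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def preprocess(xs):
    # placeholder preprocessing step: identity
    return xs

def train_model(xs, ys, hyper):
    # placeholder "dbn": predicts the (scaled) mean of the training targets
    mean = sum(ys) / len(ys)
    return lambda x: hyper * mean

def evaluate(model, xs, ys):
    # mean squared error of the model on (xs, ys)
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)

def inner_select_best(xs, ys, hypers, k=3):
    # k_fold_inner(dbn -> evaluate) -> select_best
    def cv_error(h):
        errs = []
        for tr, te in k_fold_indices(len(xs), k):
            m = train_model([xs[i] for i in tr], [ys[i] for i in tr], h)
            errs.append(evaluate(m, [xs[i] for i in te], [ys[i] for i in te]))
        return sum(errs) / len(errs)
    return min(hypers, key=cv_error)

def double_cv_error(xs, ys, hypers, k_outer=5, k_inner=3):
    # data -> k_fold_outer(preprocessing -> inner CV -> select_best
    #         -> retrain_on_all_data -> evaluate) -> evaluate
    outer_errs = []
    for tr, te in k_fold_indices(len(xs), k_outer):
        xtr = preprocess([xs[i] for i in tr])
        ytr = [ys[i] for i in tr]
        best_h = inner_select_best(xtr, ytr, hypers, k_inner)
        model = train_model(xtr, ytr, best_h)  # retrain_on_all_data
        outer_errs.append(evaluate(model,
                                   preprocess([xs[i] for i in te]),
                                   [ys[i] for i in te]))
    return sum(outer_errs) / len(outer_errs)  # final outer evaluate

def final_model(xs, ys, hypers, k=3):
    # data -> preprocessing -> k_fold(dbn -> evaluate) -> select_best
    #      -> retrain_on_all_data; what gets saved is the pair
    #      (preprocessing, best_model_selected)
    xs = preprocess(xs)
    best_h = inner_select_best(xs, ys, hypers, k)
    return preprocess, train_model(xs, ys, best_h)
```

The point the comment makes is visible in the last function: the double-CV loop only produces an error estimate, while the artifact worth saving is the preprocessing step plus the single model retrained on all the data with the selected hyperparameters.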