# HG changeset patch
# User Olivier Delalleau
# Date 1285016715 14400
# Node ID 46527ae6db539bd4033c65e6e426c57180350578
# Parent  acfd5e747a7521215aa5e30c279016c12734a175
architecture: Clarified what I meant about saving the model

diff -r acfd5e747a75 -r 46527ae6db53 doc/v2_planning/architecture.txt
--- a/doc/v2_planning/architecture.txt	Mon Sep 20 11:28:23 2010 -0400
+++ b/doc/v2_planning/architecture.txt	Mon Sep 20 17:05:15 2010 -0400
@@ -161,3 +161,15 @@
 - Existing objects and code (including dbi and Jobman), with some more
   pieces to tie things together (Fred B.)
 
+OD comments: We were in a hurry to close the meeting and I did not have time
+to really explain what I meant when I suggested we should add the requirement
+of saving the final "best" model. What I had in mind is a typical "applied ML"
+experiment, i.e. the following approach that hopefully can be understood just
+by writing it down in the form of a processing pipeline. The double cross
+validation step, whose goal is to obtain an estimate of the generalization
+error of our final model, is:
+    data -> k_fold_outer(preprocessing -> k_fold_inner(dbn -> evaluate) -> select_best -> retrain_on_all_data -> evaluate) -> evaluate
+Once this is done, the model we want to save is obtained by doing
+    data -> preprocessing -> k_fold(dbn -> evaluate) -> select_best -> retrain_on_all_data
+and we save
+    preprocessing -> best_model_selected
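The double cross-validation pipeline in the patch above can be sketched in plain Python. This is only an illustration of the control flow, not part of the patch: the names `preprocess`, `train_model`, `evaluate`, and the hyperparameter list are hypothetical stand-ins for the `preprocessing`, `dbn`, `evaluate`, and `select_best` components mentioned in the comment, and the "model" is a trivial mean predictor so the script runs on its own.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for a k-fold split of n items."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def preprocess(xs):
    # placeholder preprocessing step: identity
    return xs

def train_model(xs, ys, hyper):
    # placeholder "dbn": predicts the (scaled) mean of the training targets
    mean = sum(ys) / len(ys)
    return lambda x: hyper * mean

def evaluate(model, xs, ys):
    # mean squared error of the model on (xs, ys)
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)

def inner_select_best(xs, ys, hypers, k=3):
    # k_fold_inner(dbn -> evaluate) -> select_best
    def cv_error(h):
        errs = []
        for tr, te in k_fold_indices(len(xs), k):
            m = train_model([xs[i] for i in tr], [ys[i] for i in tr], h)
            errs.append(evaluate(m, [xs[i] for i in te], [ys[i] for i in te]))
        return sum(errs) / len(errs)
    return min(hypers, key=cv_error)

def double_cv_error(xs, ys, hypers, k_outer=5, k_inner=3):
    # data -> k_fold_outer(preprocessing -> inner CV -> select_best
    #         -> retrain_on_all_data -> evaluate) -> evaluate
    outer_errs = []
    for tr, te in k_fold_indices(len(xs), k_outer):
        xtr = preprocess([xs[i] for i in tr])
        ytr = [ys[i] for i in tr]
        best_h = inner_select_best(xtr, ytr, hypers, k_inner)
        model = train_model(xtr, ytr, best_h)  # retrain_on_all_data
        outer_errs.append(evaluate(model,
                                   preprocess([xs[i] for i in te]),
                                   [ys[i] for i in te]))
    return sum(outer_errs) / len(outer_errs)  # final outer evaluate

def final_model(xs, ys, hypers, k=3):
    # data -> preprocessing -> k_fold(dbn -> evaluate) -> select_best
    #      -> retrain_on_all_data; what gets saved is the pair
    #      (preprocessing, best_model_selected)
    xs = preprocess(xs)
    best_h = inner_select_best(xs, ys, hypers, k)
    return preprocess, train_model(xs, ys, best_h)
```

The point the comment makes is visible in the last function: the double-CV loop only produces an error estimate, while the artifact worth saving is the preprocessing step plus the single model retrained on all the data with the selected hyperparameters.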