diff doc/v2_planning/architecture.txt @ 1107:e5306f5626d4

architecture: Yet another rant, this time about encapsulation vs. linearization
author Olivier Delalleau <delallea@iro>
date Mon, 13 Sep 2010 22:45:10 -0400
parents 4eda3f52ebef
children 967975f9c574
--- a/doc/v2_planning/architecture.txt	Mon Sep 13 22:44:37 2010 -0400
+++ b/doc/v2_planning/architecture.txt	Mon Sep 13 22:45:10 2010 -0400
@@ -57,4 +57,55 @@
 just not essential to choose an API that will guarantee a match, or indeed to
 choose any explicit API at all.
 
+Encapsulation vs. linearity
+---------------------------
 
+A while ago, the Apstat crew argued against "encapsulation" and proposed
+instead a more "linearized" approach to experiment design. I must admit I didn't
+really understand the deep motivations behind this, and after practicing both
+styles (encapsulation for PLearn / Theano, linearity @ ARL / Ubisoft), I still
+don't. I do find, however, some not-so-deep-but-still-significant advantages
+to the linear version, which hopefully can be made clear (along with a
+clarification of what the h*** I am talking about) in the following example:
+
+   * Linear version:
+    my_experiment = pipeline([
+        data,
+        filter_samples,
+        PCA,
+        k_fold_split,
+        neural_net,
+        evaluation,
+    ])
+
+   * Encapsulated version:
+    my_experiment = evaluation(
+        data=PCA(filter_samples(data)),
+        split=k_fold_split,
+        model=neural_net)
+
+What I like about the linear version is that it is much easier to read
+(once you know what it means): you just follow the flow of the experiment by
+reading through a single list.
+On the other hand, the encapsulated version requires some deeper analysis to
+understand what is going on and in which order.
+Also, commenting out parts of the processing is simpler in the first case (it
+takes a single # in front of an element).
+However, linearity tends to break when the experiment is actually not linear,
+i.e. the graph of object dependencies is more complex (*).
+
+I'm just bringing this up because it may be nice to be able to provide the
+user with the most intuitive way to design experiments. I actually don't think
+those approaches are mutually exclusive, and it could be possible for the
+underlying system to use the more flexible / powerful encapsulated
+representation, while having the option to write simple scripts in a form that
+is easier to understand and manipulate.
+
+It could also be worth discussing this issue with Xavier / Christian /
+Nicolas.
+
+(*) Note that I cheated a bit in my example above: the graph from the
+encapsulated version is not a simple chain, so it is not obvious how to
+convert it into the pipeline given in the linear version. It is still
+possible, but this is probably not the place to get into the details.
+
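One possible reading of the footnote, as a minimal sketch and not something the planning document specifies: the linear list can be treated as a fold over a shared state dictionary, where each stage either transforms the data or registers a named argument (split, model) for the final encapsulated call. Every name below (pipeline, source, transform, provide, finish, and the toy stages) is hypothetical, and center is only a stand-in for PCA.

```python
from functools import reduce


def pipeline(stages):
    """Fold a list of stages into one callable, applied left to right."""
    def run():
        return reduce(lambda state, stage: stage(state), stages, {})
    return run


# Encapsulated building blocks (hypothetical toy stand-ins).
def load():
    return [3.0, -1.0, 4.0, -1.5, 5.0, 9.0]


def keep_non_negative(samples):
    return [x for x in samples if x >= 0]


def center(samples):
    # Stand-in for PCA: a data-dependent transform of the samples.
    mean = sum(samples) / len(samples)
    return [x - mean for x in samples]


def two_fold_split(samples):
    # Returns (train, test) pairs, each fold serving once as the test set.
    folds = [samples[0::2], samples[1::2]]
    return [(folds[1], folds[0]), (folds[0], folds[1])]


def mean_predictor(train):
    # "Fits" by memorizing the training mean, then predicts it for any input.
    mean = sum(train) / len(train)
    return lambda _x: mean


def evaluate(data, split, model):
    # Mean squared error of the model over all test folds.
    errors = []
    for train, test in split(data):
        predict = model(train)
        errors.extend((predict(x) - x) ** 2 for x in test)
    return sum(errors) / len(errors)


# Adapters (assumed convention) that lift the blocks into linear stages.
def source(fn):
    return lambda state: {**state, "data": fn()}


def transform(fn):
    return lambda state: {**state, "data": fn(state["data"])}


def provide(name, obj):
    return lambda state: {**state, name: obj}


def finish(fn):
    # Hands the accumulated state to the encapsulated call as keywords.
    return lambda state: fn(**state)


linear = pipeline([
    source(load),
    transform(keep_non_negative),
    transform(center),
    provide("split", two_fold_split),
    provide("model", mean_predictor),
    finish(evaluate),
])


def encapsulated():
    return evaluate(
        data=center(keep_non_negative(load())),
        split=two_fold_split,
        model=mean_predictor)
```

Under these assumptions linear() and encapsulated() compute the same value, and commenting out a transform(...) line still leaves a valid experiment. The trick is that split and model are not chained as data transforms but carried along as named entries, which is how the non-chain dependency graph fits into a flat list.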