Mercurial > pylearn
changeset 1107:e5306f5626d4
architecture: Yet another rant, this time about encapsulation vs. linearization
author | Olivier Delalleau <delallea@iro> |
---|---|
date | Mon, 13 Sep 2010 22:45:10 -0400 |
parents | 21d25bed2ce9 |
children | c5c7ba805a2f |
files | doc/v2_planning/architecture.txt |
diffstat | 1 files changed, 51 insertions(+), 0 deletions(-) |
--- a/doc/v2_planning/architecture.txt Mon Sep 13 22:44:37 2010 -0400
+++ b/doc/v2_planning/architecture.txt Mon Sep 13 22:45:10 2010 -0400
@@ -57,4 +57,55 @@
 just not essential to choose an API that will guarantee a match, or indeed to
 choose any explicit API at all.
 
 
+Encapsulation vs. linearity
+---------------------------
+A while ago, the Apstat crew went to fight "encapsulation" to propose instead
+a more "linearized" approach to experiment design. I must admit I didn't
+really understand the deep motivations behind this, and after practicing both
+styles (encapsulation for PLearn / Theano, linearity @ ARL / Ubisoft), I still
+don't. I do find, however, some not-so-deep-but-still-significant advantages
+to the linear version, which hopefully can be made clear (along with a
+clarification of what the h*** am I talking about) in the following example:
+
+ * Linear version:
+    my_experiment = pipeline([
+        data,
+        filter_samples,
+        PCA,
+        k_fold_split,
+        neural_net,
+        evaluation,
+    ])
+
+ * Encapsulated version:
+    my_experiment = evaluation(
+        data=PCA(filter_samples(data)),
+        split=k_fold_split,
+        model=neural_net)
+
+What I like in the linear version is it is much more easily human-readable
+(once you know what it means): you just follow the flow of the experiment by
+reading through a single list.
+On the other hand, the encapsulated version requires some deeper analysis to
+understand what is going on and in which order.
+Also, commenting out parts of the processing is simpler in the first case (it
+takes a single # in front of an element).
+However, linearity tends to break when the experiment is actually not linear,
+i.e. the graph of object dependencies is more complex (*).
+
+I'm just bringing this up because it may be nice to be able to provide the
+user with the most intuitive way to design experiments. I actually don't think
+those approaches are mutually exclusive, and it could be possible for the
+underlying system to use the more flexible / powerful encapsulated
+representation, while having the option to write simple scripts in a form that
+is easier to understand and manipulate.
+
+It could also be worth discussing this issue with Xavier / Christian /
+Nicolas.
+
+(*) Note that I cheated a bit in my example above: the graph from the
+encapsulated version is not a simple chain, so it is not obvious how to
+convert it into the pipeline given in the linear version. It's still possible
+though, but this is probably not the place to get into the details.
+
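
As a rough illustration of the two closing points of the added text (an encapsulated core with a linear front end, and linearizing a graph that is not a simple chain), here is a minimal Python sketch. It assumes that pipeline stages communicate through a shared context dictionary. That is only one possible reading of the footnote, not necessarily what the author had in mind, and every name in it is hypothetical rather than part of pylearn: `load_data`, `provide`, `transform`, `evaluate_stage`, `center` (a toy stand-in for PCA) and `mean_model` (a toy stand-in for `neural_net`).

```python
# Hypothetical sketch: every name below is illustrative, not a pylearn API.

def load_data():
    return [1.0, -2.0, 3.0, -4.0, 5.0, 7.0]

def filter_samples(samples):
    """Toy filter: keep positive samples only."""
    return [x for x in samples if x > 0]

def center(samples):
    """Toy stand-in for PCA: subtract the mean."""
    mean = sum(samples) / len(samples)
    return [x - mean for x in samples]

def k_fold_split(samples, k=2):
    """Yield (train, test) pairs built from k interleaved folds."""
    folds = [samples[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

def mean_model(train):
    """Toy stand-in for neural_net: a constant predictor (the training mean)."""
    return sum(train) / len(train)

def evaluation(data, split, model):
    """Encapsulated form: mean squared error averaged over the folds."""
    errors = []
    for train, test in split(data):
        prediction = model(train)
        errors.append(sum((x - prediction) ** 2 for x in test) / len(test))
    return sum(errors) / len(errors)

# Linear front end (assumption: stages share one context dict).

def transform(func):
    """Stage that rewrites the 'data' slot of the context."""
    return lambda ctx: {**ctx, "data": func(ctx["data"])}

def provide(key, value):
    """Stage that stores a component under `key` and leaves the rest alone."""
    return lambda ctx: {**ctx, key: value}

def evaluate_stage(ctx):
    """Adapter: pull the three inputs out of the context and call evaluation."""
    score = evaluation(ctx["data"], ctx["split"], ctx["model"])
    return {**ctx, "score": score}

def pipeline(stages):
    """Return a callable that runs the stages in list order."""
    def run():
        ctx = {}
        for stage in stages:
            ctx = stage(ctx)
        return ctx
    return run

linear_experiment = pipeline([
    provide("data", load_data()),
    transform(filter_samples),
    transform(center),
    provide("split", k_fold_split),
    provide("model", mean_model),
    evaluate_stage,
])

encapsulated_score = evaluation(
    data=center(filter_samples(load_data())),
    split=k_fold_split,
    model=mean_model)

linear_score = linear_experiment()["score"]
```

In this sketch both forms compute the same score, since the linear list is just a thin layer over the encapsulated `evaluation` call. Commenting out the `transform(center)` element drops that step without touching anything else, which is the convenience the text attributes to the linear style.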