pylearn: doc/v2_planning/architecture.txt comparison

comparison doc/v2_planning/architecture.txt @ 1107:e5306f5626d4

architecture: Yet another rant, this time about encapsulation vs. linearization

author	Olivier Delalleau <delallea@iro>
date	Mon, 13 Sep 2010 22:45:10 -0400
parents	4eda3f52ebef
children	967975f9c574


-:21d25bed2ce9
+:e5306f5626d4
 semantics, the easier it is to substitute one for the other.  As library
 designers, we should still aim for compatibility of similar algorithms.  It's
 just not essential to choose an API that will guarantee a match, or indeed to
 choose any explicit API at all.
+Encapsulation vs. linearity
+---------------------------
+A while ago, the Apstat crew went to war against "encapsulation", proposing
+instead a more "linearized" approach to experiment design. I must admit I
+didn't really understand the deep motivations behind this, and after
+practicing both styles (encapsulation for PLearn / Theano, linearity @ ARL /
+Ubisoft), I still don't. I do find, however, some
+not-so-deep-but-still-significant advantages to the linear version, which
+hopefully can be made clear (along with a clarification of what the h*** I am
+talking about) in the following example:
+* Linear version:
+    my_experiment = pipeline([
+        data,
+        filter_samples,
+        PCA,
+        k_fold_split,
+        neural_net,
+        evaluation,
+    ])
+* Encapsulated version:
+    my_experiment = evaluation(
+        data=PCA(filter_samples(data)),
+        split=k_fold_split,
+        model=neural_net)
+What I like about the linear version is that it is much more human-readable
+(once you know what it means): you just follow the flow of the experiment by
+reading through a single list.
+On the other hand, the encapsulated version requires closer reading to work
+out what is going on and in which order.
+Also, commenting out parts of the processing is simpler in the first case (it
+takes a single # in front of an element).
+However, linearity tends to break when the experiment is actually not linear,
+i.e. when the graph of object dependencies is more complex than a simple
+chain (*).
+I'm just bringing this up because it may be nice to be able to provide the
+user with the most intuitive way to design experiments. I actually don't think
+those approaches are mutually exclusive, and it could be possible for the
+underlying system to use the more flexible / powerful encapsulated
+representation, while having the option to write simple scripts in a form that
+is easier to understand and manipulate.
+It could also be worth discussing this issue with Xavier / Christian /
+Nicolas.
+(*) Note that I cheated a bit in my example above: the graph from the
+encapsulated version is not a simple chain, so it is not obvious how to
+convert it into the pipeline given in the linear version. It is still
+possible, but this is probably not the place to get into the details.
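
One way the conversion mentioned in the footnote might work is to let every
stage read from, and add to, a shared state dictionary, so that stages such as
k_fold_split and neural_net register something for evaluation to use instead
of transforming the data themselves. The sketch below is only an illustration
under that assumption: the pipeline helper and all stage bodies are
hypothetical stand-ins, not pylearn code. In particular, PCA here merely
centers the data and neural_net just predicts the training mean.

```python
from functools import reduce


def pipeline(stages):
    """Return a callable that runs `stages` in order on a shared state dict."""
    def run(state=None):
        return reduce(lambda acc, stage: stage(acc), stages, dict(state or {}))
    return run


# Hypothetical stages: each takes the state dict and returns an updated copy.
def data(state):
    return {**state, "data": [3.0, -1.0, 4.0, -1.5, 5.0, 9.0]}


def filter_samples(state):
    # Toy filter: drop negative samples.
    return {**state, "data": [x for x in state["data"] if x >= 0]}


def PCA(state):
    # Placeholder for a real PCA: only centers the samples.
    mean = sum(state["data"]) / len(state["data"])
    return {**state, "data": [x - mean for x in state["data"]]}


def k_fold_split(state):
    # Registers a splitting strategy; it does not touch the data itself.
    def split(samples, k=2):
        return [
            ([x for i, x in enumerate(samples) if i % k != fold],
             [x for i, x in enumerate(samples) if i % k == fold])
            for fold in range(k)
        ]
    return {**state, "split": split}


def neural_net(state):
    # Registers a "model" factory; this toy model predicts the training mean.
    def fit(train):
        mean = sum(train) / len(train)
        return lambda x: mean
    return {**state, "model": fit}


def evaluation(state):
    # Consumes data, split and model: the non-chain part of the graph.
    errors = []
    for train, test in state["split"](state["data"]):
        predict = state["model"](train)
        errors.append(sum((predict(x) - x) ** 2 for x in test) / len(test))
    return {**state, "score": sum(errors) / len(errors)}


my_experiment = pipeline([
    data,
    filter_samples,
    PCA,
    k_fold_split,
    neural_net,
    evaluation,
])
result = my_experiment()

# The same experiment written as explicitly nested calls.
nested = evaluation(neural_net(k_fold_split(PCA(filter_samples(data({}))))))
```

With this convention, commenting out a stage such as PCA still leaves a
runnable experiment, since the later stages only look up the keys they need.
The price is that dependencies between stages become implicit in the key
names, which is exactly what the encapsulated version spells out.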
