comparison doc/v2_planning/architecture.txt @ 1107:e5306f5626d4

architecture: Yet another rant, this time about encapsulation vs. linearization
author Olivier Delalleau <delallea@iro>
date Mon, 13 Sep 2010 22:45:10 -0400
parents 4eda3f52ebef
children 967975f9c574
1106:21d25bed2ce9 1107:e5306f5626d4
55 semantics, the easier it is to substitute one for the other. As library
56 designers, we should still aim for compatibility of similar algorithms. It's
57 just not essential to choose an API that will guarantee a match, or indeed to
58 choose any explicit API at all.
59
60 Encapsulation vs. linearity
61 ---------------------------
62
63 A while ago, the Apstat crew argued against "encapsulation" and proposed
64 instead a more "linearized" approach to experiment design. I must admit I
65 didn't really understand the deep motivations behind this, and after practicing
66 both styles (encapsulation for PLearn / Theano, linearity @ ARL / Ubisoft), I
67 still don't. I do find, however, some not-so-deep-but-still-significant
68 advantages to the linear version, which hopefully can be made clear (along with
69 a clarification of what the h*** I am talking about) in the following example:
70
71 * Linear version:
72     my_experiment = pipeline([
73         data,
74         filter_samples,
75         PCA,
76         k_fold_split,
77         neural_net,
78         evaluation,
79     ])
80
81 * Encapsulated version:
82     my_experiment = evaluation(
83         data=PCA(filter_samples(data)),
84         split=k_fold_split,
85         model=neural_net)
86
87 What I like about the linear version is that it is much easier for a human to
88 read (once you know what it means): you just follow the flow of the experiment
89 by reading through a single list.
90 On the other hand, the encapsulated version requires some deeper analysis to
91 understand what is going on and in which order.
92 Also, commenting out parts of the processing is simpler in the first case (it
93 takes a single # in front of an element).
94 However, linearity tends to break when the experiment is actually not linear,
95 i.e. when the graph of object dependencies is more complex (*).
96
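To make the readability and "single #" points concrete, here is a minimal
sketch under loud assumptions: `pipeline` is a hypothetical helper that applies
each stage to the output of the previous one, and the stages are toy stand-ins
(`rescale` replaces PCA, and nothing here is actual library code).

```python
from functools import reduce

def pipeline(stages):
    """Hypothetical helper: the first element is the input value, every
    later element is a callable applied to the previous result."""
    source, *steps = stages
    return reduce(lambda value, step: step(value), steps, source)

# Toy stand-ins (assumptions) for the stages named in the example.
data = [3.0, -1.0, 4.0, -1.5, 5.0]

def filter_samples(samples):
    # Drop negative samples.
    return [x for x in samples if x >= 0]

def rescale(samples):
    # Divide by the largest value (a stand-in for a real transform like PCA).
    top = max(samples)
    return [x / top for x in samples]

# Linear version: read top to bottom.
linear = pipeline([
    data,
    filter_samples,
    rescale,
])

# Encapsulated version: read inside out.
encapsulated = rescale(filter_samples(data))
assert linear == encapsulated

# Disabling one stage takes a single '#'.
without_filter = pipeline([
    data,
    # filter_samples,
    rescale,
])
```

In the encapsulated form, dropping `filter_samples` means editing the middle
of a nested expression and rebalancing parentheses.
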
97 I'm just bringing this up because it may be nice to be able to provide the
98 user with the most intuitive way to design experiments. I actually don't think
99 those approaches are mutually exclusive: the underlying system could use the
100 more flexible / powerful encapsulated representation, while still giving
101 users the option to write simple scripts in a form that is easier to
102 understand and manipulate.
103
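One way the two styles could coexist, sketched under assumptions: the system
keeps a nested (encapsulated) tree internally, and a thin front-end folds a
flat list into that tree. `Node` and `from_linear` are hypothetical names
invented for this illustration.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Node:
    """Hypothetical internal representation: an operation and its inputs."""
    name: str
    args: Tuple["Node", ...] = ()

    def render(self):
        # Print the node the way the encapsulated version would be written.
        if not self.args:
            return self.name
        return "%s(%s)" % (self.name, ", ".join(a.render() for a in self.args))

def from_linear(names):
    """Fold a flat list of stage names into a nested chain of nodes."""
    first, *rest = names
    tree = Node(first)
    for name in rest:
        tree = Node(name, (tree,))  # each stage wraps everything before it
    return tree

tree = from_linear(["data", "filter_samples", "PCA"])
print(tree.render())
```

This only covers the pure chain part of the example. Stages with several
inputs would need extra conventions on top of it.
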
104 It could also be worth discussing this issue with Xavier / Christian /
105 Nicolas.
106
107 (*) Note that I cheated a bit in my example above: the graph from the
108 encapsulated version is not a simple chain, so it is not obvious how to
109 convert it into the pipeline given in the linear version. It is still
110 possible, but this is probably not the place to get into the details.
111
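For what it's worth, one possible conversion (an assumption, not necessarily
the one the footnote has in mind) is to have every stage take and return a
shared state dictionary. Stages like a splitter or a model then register
themselves instead of transforming the data, and the last stage consumes them.
All names below are hypothetical toy stand-ins.

```python
def pipeline(stages):
    """Hypothetical helper: thread a state dict through every stage."""
    def run(state=None):
        state = dict(state or {})
        for stage in stages:
            state = stage(state)
        return state
    return run

def load_data(state):
    return {**state, "data": [1.0, 2.0, 3.0, 4.0]}

def two_fold_split(state):
    # Registers (train, test) index pairs; the data itself is left untouched.
    n = len(state["data"])
    even, odd = list(range(0, n, 2)), list(range(1, n, 2))
    return {**state, "split": [(even, odd), (odd, even)]}

def mean_model(state):
    # Registers a fit function that returns a constant predictor.
    def fit(train):
        mean = sum(train) / len(train)
        return lambda x: mean
    return {**state, "model": fit}

def evaluation(state):
    # Consumes data, split and model: mean absolute error over all folds.
    data, errors = state["data"], []
    for train_idx, test_idx in state["split"]:
        predict = state["model"]([data[i] for i in train_idx])
        errors.extend(abs(predict(data[i]) - data[i]) for i in test_idx)
    return {**state, "score": sum(errors) / len(errors)}

my_experiment = pipeline([
    load_data,
    two_fold_split,
    mean_model,
    evaluation,
])
result = my_experiment()
```

The price is that dependencies between stages become implicit dictionary keys,
which is part of what the encapsulated version makes explicit.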