doc/v2_planning/architecture.txt @ 1107:e5306f5626d4
architecture: Yet another rant, this time about encapsulation vs. linearization

author:   Olivier Delalleau <delallea@iro>
date:     Mon, 13 Sep 2010 22:45:10 -0400
parents:  4eda3f52ebef
children: 967975f9c574
semantics, the easier it is to substitute one for the other. As library
designers, we should still aim for compatibility of similar algorithms. It's
just not essential to choose an API that will guarantee a match, or indeed to
choose any explicit API at all.

Encapsulation vs. linearity
---------------------------

A while ago, the Apstat crew set out to fight "encapsulation", proposing
instead a more "linearized" approach to experiment design. I must admit I
didn't really understand the deep motivations behind this, and after
practicing both styles (encapsulation for PLearn / Theano, linearity @ ARL /
Ubisoft), I still don't. I do find, however, some
not-so-deep-but-still-significant advantages to the linear version, which
hopefully can be made clear (along with a clarification of what the h*** I am
talking about) in the following example:

* Linear version:
    my_experiment = pipeline([
        data,
        filter_samples,
        PCA,
        k_fold_split,
        neural_net,
        evaluation,
    ])

* Encapsulated version:
    my_experiment = evaluation(
        data=PCA(filter_samples(data)),
        split=k_fold_split,
        model=neural_net)

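To make the linear style concrete, here is a minimal sketch of what such a
pipeline() helper could look like, assuming every stage is a callable that
takes the previous stage's output (a simplification; see the footnote below).
This is purely hypothetical, not an existing pylearn API:

    def pipeline(stages):
        """Chain the stages of an experiment from left to right.

        The first element is the initial input (e.g. a dataset); every
        following element is a callable applied to the previous result.
        """
        result = stages[0]
        for stage in stages[1:]:
            result = stage(result)
        return result

Under this reading, commenting out a stage really is a one-character edit:
the corresponding callable simply drops out of the chain.
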
What I like about the linear version is that it is much more easily
human-readable (once you know what it means): you just follow the flow of
the experiment by reading through a single list.
The encapsulated version, on the other hand, requires some deeper analysis
to understand what is going on and in which order.
Also, commenting out parts of the processing is simpler in the first case
(it takes a single # in front of an element).
However, linearity tends to break down when the experiment is actually not
linear, i.e. when the graph of object dependencies is more complex (*).

I'm just bringing this up because it may be nice to provide the user with
the most intuitive way to design experiments. I actually don't think these
approaches are mutually exclusive: the underlying system could use the more
flexible / powerful encapsulated representation, while offering the option
to write simple scripts in a form that is easier to understand and
manipulate.

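As a purely hypothetical illustration of that idea, both front-ends could
build the same underlying dependency graph; the Node class and the way
pipeline() desugars to it below are invented for this sketch, not an actual
design:

    class Node:
        """One operation in the experiment's dependency graph."""
        def __init__(self, op, *inputs, **kwinputs):
            self.op = op              # operation to apply
            self.inputs = inputs      # positional upstream nodes
            self.kwinputs = kwinputs  # named upstream nodes / parameters

    def pipeline(stages):
        """Linear front-end: pure sugar over the graph representation."""
        node = stages[0]
        for stage in stages[1:]:
            node = Node(stage, node)
        return node

    # The encapsulated front-end would build the same kind of graph
    # directly, e.g.:
    #   my_experiment = Node(evaluation,
    #                        data=Node(PCA, Node(filter_samples, data)),
    #                        split=k_fold_split,
    #                        model=neural_net)
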
It could also be worth discussing this issue with Xavier / Christian /
Nicolas.

(*) Note that I cheated a bit in my example above: the graph from the
encapsulated version is not a simple chain, so it is not obvious how to
convert it into the pipeline given in the linear version. It is still
possible, but this is probably not the place to get into the details.
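
For what it's worth, one hypothetical way to flatten such a graph is to pass
a shared context between stages, so that a late stage like evaluation can
see both the transformed data and the split strategy. All stage names and
context keys below are made up for illustration:

    def run_pipeline(stages, context):
        """Run stages in order; each reads from and writes to the context."""
        for stage in stages:
            context.update(stage(context))
        return context

    # Toy stages, purely illustrative:
    def filter_samples(ctx):
        return {"data": [x for x in ctx["data"] if x is not None]}

    def k_fold_split(ctx):
        return {"split": 5}  # e.g. the number of folds

    def evaluation(ctx):
        return {"score": len(ctx["data"]) / ctx["split"]}

    result = run_pipeline([filter_samples, k_fold_split, evaluation],
                          {"data": [1, None, 2, 3, None, 4]})
    print(result["score"])  # 4 surviving samples / 5 folds = 0.8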