Mercurial > pylearn
annotate doc/v2_planning/datalearn.txt @ 1357:ffa2932a8cba
Added datalearn committee discussion file
author | Olivier Delalleau <delallea@iro> |
---|---|
date | Thu, 11 Nov 2010 16:34:38 -0500 |
parents | |
children | 5db730bb0e8e |
DataLearn: How to plug Datasets & Learner together?
===================================================

Participants
------------
- Yoshua
- Razvan
- Olivier D [leader?]

High-Level Objectives
---------------------

* Simple ML experiments should be simple to write
* More complex / advanced scenarios should be possible without being forced
  to work "outside" of this framework
* Computations should be optimized whenever possible
* Existing code (in any language) should be "wrappable" within this
  framework
* It should be possible to replace [parts of] this framework with C++ code

Theano-Like Data Flow
---------------------

We want to rely on Theano so that we can take advantage of its efficient
computations. The general idea is that if we chain multiple processing
elements (think e.g. of a feature selection step followed by a PCA projection,
then a rescaling within a fixed bounded interval), the overall transformation
from input to output data can be represented by a Theano symbolic graph. When
one wants to access the actual numeric data, a function is compiled so as to
do these computations efficiently.

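The chaining idea can be illustrated without Theano. The following is a minimal, standard-library-only analogy, not Theano code: ``Node``, ``compile_graph`` and the three toy processing steps are hypothetical names introduced here purely for illustration. Each step is only recorded when the pipeline is built, and the whole chain is turned into a single callable when numeric values are requested.

```python
# Hypothetical stand-in for a symbolic graph: each Node records an operation
# and its parent; nothing is computed until the graph is compiled and called.

class Node:
    def __init__(self, op=None, parent=None):
        self.op = op          # callable applied to the parent's value
        self.parent = parent  # upstream Node, or None for the graph input

    def apply(self, op):
        """Return a new node representing op(self), without computing it."""
        return Node(op, self)


def compile_graph(output):
    """Walk the chain once and return a plain function of the input rows."""
    ops = []
    node = output
    while node.parent is not None:
        ops.append(node.op)
        node = node.parent
    ops.reverse()

    def run(rows):
        for op in ops:
            rows = op(rows)
        return rows
    return run


# Toy processing elements (assumptions standing in for feature selection,
# a PCA projection, and a rescaling into a bounded interval).
def select_first_two(rows):
    return [row[:2] for row in rows]

def project(rows):
    return [[a + b, a - b] for a, b in rows]

def rescale_to_unit(rows):
    flat = [v for row in rows for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in row] for row in rows]


data = Node()  # symbolic input
pipeline = data.apply(select_first_two).apply(project).apply(rescale_to_unit)
transform = compile_graph(pipeline)  # "compilation" happens once, here
result = transform([[1.0, 2.0, 9.0], [3.0, 1.0, 9.0]])
```

In the real design, Theano would play the role of ``compile_graph`` and could additionally optimize the combined graph, which this sketch does not attempt.
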
We discussed some specific API options for datasets and learners, which will
be added to this file in the future, but a core question that we feel should
be addressed first is how exactly this Theano-based implementation could be
achieved. For this purpose, in the following, let us assume that a dataset is
simply a matrix whose rows represent individual samples, and whose columns
represent individual features. How to handle field names, non-tensor-like
data, etc. is a very important topic that is not yet discussed in this file.

A question we did not really discuss is whether datasets should be Theano
Variables. The advantage would be that they would fit directly within the
Theano framework, which may allow high level optimizations on data
transformations. However, we would lose the ability to combine Theano
expressions coded in individual datasets into a single graph. Currently, we
instead consider that a dataset has a member that is a Theano variable, and
this variable represents the data stored in the dataset. The same is done for
individual data samples.

One issue with this approach is illustrated by the following example. Imagine
we want to iterate over samples in a dataset and do something with their
numeric value. We would want the code to be as close as possible to:

.. code-block:: python

    for sample in dataset:
        do_something_with(sample.numeric_value())

A naive implementation of the sample API could be (assuming each sample
contains a ``variable`` member which is the variable representing this
sample's data):

.. code-block:: python

    def numeric_value(self):
        if self.function is None:
            # Compile function to output the numeric value stored in this
            # sample's variable.
            self.function = theano.function([], self.variable)
        return self.function()

However, this is not a good idea, because it would trigger a new function
compilation for each sample. Instead, we would want something like this:

.. code-block:: python

    def numeric_value(self):
        if self.function_storage[0] is None:
            # Compile function to output the numeric value stored in this
            # sample's variable. This function takes as input the index of
            # the sample in the dataset, and is shared among all samples.
            self.function_storage[0] = theano.function(
                [self.symbolic_index], self.variable)
        # Call the shared compiled function with this sample's own index.
        return self.function_storage[0](self.numeric_index)

In the code above, we assume that all samples created by iterating over the
dataset share the same ``function_storage``, ``symbolic_index`` and
``variable``: the first time we try to access the numeric value of some
sample, a function is compiled that takes the index as input and outputs the
variable. The only difference between samples is thus that they are given a
different numeric value for the index (``numeric_index``).

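The sharing of ``function_storage`` between samples can be checked with a small stand-alone sketch. It makes several assumptions: ``fake_compile`` is a hypothetical stand-in for ``theano.function`` that merely counts how often it is called, and the ``Sample`` and ``Dataset`` classes are illustrative, not the proposed API.

```python
# Counts how many times the stand-in "compilation" runs.
stats = {"compilations": 0}


def fake_compile(rows):
    """Hypothetical stand-in for theano.function: index -> row value."""
    stats["compilations"] += 1
    return lambda index: rows[index]


class Sample:
    def __init__(self, rows, numeric_index, function_storage):
        self.rows = rows
        self.numeric_index = numeric_index
        # The same one-element list is handed to every sample of a dataset.
        self.function_storage = function_storage

    def numeric_value(self):
        if self.function_storage[0] is None:
            # Only the first sample to be evaluated pays for compilation.
            self.function_storage[0] = fake_compile(self.rows)
        return self.function_storage[0](self.numeric_index)


class Dataset:
    def __init__(self, rows):
        self.rows = rows
        self.function_storage = [None]  # shared, mutable slot

    def __len__(self):
        return len(self.rows)

    def __iter__(self):
        for index in range(len(self.rows)):
            yield Sample(self.rows, index, self.function_storage)


dataset = Dataset([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
values = [sample.numeric_value() for sample in dataset]
```

A one-element list is used rather than a plain attribute so that assigning the compiled function from one sample makes it visible to all the others.
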
Another way to obtain the same result is to let the user take care of
compiling the function. This would give the user full control over what is
being compiled, at the cost of having to write more code:

.. code-block:: python

    symbolic_index = dataset.get_index() # Or just theano.tensor.iscalar()
    get_sample = theano.function([symbolic_index],
                                 dataset[symbolic_index].variable)
    for numeric_index in range(len(dataset)):
        do_something_with(get_sample(numeric_index))

Note that although the above example focused on how to iterate over a dataset,
it can be cast into a more generic problem, where some data (either dataset or
sample) is the result of some transformation applied to other data, which is
parameterized by parameters p1, p2, ..., pN (in the above example, we were
considering a sample that was obtained by taking the p1-th element in a
dataset). If we use different values for a subset Q of the parameters but keep
other parameters fixed, we would probably want to compile a single function
that takes as input all parameters in Q, while other parameters are fixed.
Ideally it would be nice to let the user take control of what is being
compiled, while leaving the option of using a sensible default behavior for
those who do not want to worry about it. How to achieve this is still to be
determined.

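As a rough illustration of the "free parameters Q, fixed parameters elsewhere" idea, here is a plain-Python sketch. It is only an assumption about what such an interface might look like: ``compile_function``, its ``fixed`` and ``free`` arguments, and the toy ``transform`` are hypothetical and do not correspond to any existing Theano or pylearn API.

```python
def compile_function(transform, fixed, free):
    """Return a function of the parameters named in `free` only.

    `fixed` maps parameter names to values bound once, at "compile" time;
    `free` lists the names that remain inputs of the returned function.
    """
    bound = dict(fixed)  # copy, so later changes to `fixed` have no effect

    def compiled(*args):
        if len(args) != len(free):
            raise TypeError("expected %d arguments" % len(free))
        params = dict(bound)
        params.update(zip(free, args))
        return transform(**params)
    return compiled


# Toy transformation with parameters p1, p2, p3: take the p1-th row of
# `data`, scale it by p2, then shift it by p3.
data = [[1.0, 2.0], [3.0, 4.0]]

def transform(p1, p2, p3):
    return [p2 * v + p3 for v in data[p1]]


# Q = {p1}: only the sample index varies; scale and shift stay fixed.
get_sample = compile_function(transform,
                              fixed={"p2": 2.0, "p3": 1.0},
                              free=["p1"])
samples = [get_sample(i) for i in range(len(data))]
```

A default behavior could then amount to choosing ``free`` automatically (for instance, the index when iterating), while advanced users pass it explicitly.
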
What About Learners?
--------------------

The discussion above mentioned only datasets, not learners. The learning
part of a learner is not a main concern (currently). What matters most w.r.t.
what was discussed above is how a learner takes as input a dataset and outputs
another dataset that can be used with the dataset API.

A Learner may be able to compute various things. For instance, a Neural
Network may output a ``prediction`` vector (whose elements correspond to
estimated probabilities of each class in a classification task), as well as a
``cost`` vector (whose elements correspond to the penalized NLL, the NLL alone
and the classification error). We would want to be able to build a dataset
that contains some of these quantities computed on each sample in the input
dataset.

The Neural Network code would then look something like this:

.. code-block:: python

    class NeuralNetwork(Learner):

        @datalearn(..)
        def compute_prediction(self, sample):
            return softmax(theano.tensor.dot(self.weights, sample.input))

        @datalearn(..)
        def compute_nll(self, sample):
            return - log(self.compute_prediction(sample)[sample.target])

        @datalearn(..)
        def compute_penalized_nll(self, sample):
            return (self.compute_nll(sample) +
                    theano.tensor.sum(self.weights**2))

        @datalearn(..)
        def compute_class_error(self, sample):
            probabilities = self.compute_prediction(sample)
            predicted_class = theano.tensor.argmax(probabilities)
            return predicted_class != sample.target

        @datalearn(..)
        def compute_cost(self, sample):
            return theano.tensor.concatenate([
                    self.compute_penalized_nll(sample),
                    self.compute_nll(sample),
                    self.compute_class_error(sample),
                    ])

The ``@datalearn`` decorator would be responsible for allowing such a Learner
to be used e.g. like this:

.. code-block:: python

    nnet = NeuralNetwork()
    predict_dataset = nnet.compute_prediction(dataset)
    for sample in dataset:
        predict_sample = nnet.compute_prediction(sample)
    predict_numeric = nnet.compute_prediction({'input': numpy.zeros(10)})
    multiple_fields_dataset = ConcatDataSet([
        nnet.compute_prediction(dataset),
        nnet.compute_cost(dataset),
        ])

In the code above, if one wants to obtain the numeric value of an element of
``multiple_fields_dataset``, the Theano function being compiled would be able
to optimize computations so that the simultaneous computation of
``prediction`` and ``cost`` is done efficiently.

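How the ``@datalearn`` decorator would work is left open above. One possible reading, given here only as a hedged sketch, is that it dispatches on the type of its argument: a dataset yields a new lazily evaluated dataset, while a single sample (or a plain dict) is processed directly. Everything below is hypothetical: ``datalearn_sketch`` takes no arguments (unlike the elided ``@datalearn(..)``), and ``Dataset`` and ``Doubler`` are toy classes that use plain Python lists instead of Theano variables.

```python
import functools


class Dataset:
    """Toy dataset: an iterable of samples plus an optional lazy transform."""

    def __init__(self, source, transform=None):
        self.source = source        # a list of samples or another Dataset
        self.transform = transform  # applied per sample, on iteration only

    def __len__(self):
        return len(self.source)

    def __iter__(self):
        for sample in self.source:
            yield sample if self.transform is None else self.transform(sample)


def datalearn_sketch(method):
    """Let a per-sample method also accept a whole dataset."""
    @functools.wraps(method)
    def wrapper(self, data):
        if isinstance(data, Dataset):
            # Return a new dataset; the method runs lazily on iteration.
            return Dataset(data, lambda sample: method(self, sample))
        return method(self, data)
    return wrapper


class Doubler:
    """Toy learner whose "prediction" doubles each input feature."""

    @datalearn_sketch
    def compute_prediction(self, sample):
        return [2 * v for v in sample["input"]]


learner = Doubler()
dataset = Dataset([{"input": [1, 2]}, {"input": [3, 4]}])
predict_dataset = learner.compute_prediction(dataset)          # a Dataset
predict_numeric = learner.compute_prediction({"input": [5, 6]})
predictions = list(predict_dataset)
```

This sketch evaluates each method independently; the sharing of intermediate computations between ``prediction`` and ``cost`` described above would have to come from compiling a single Theano graph, which is not modelled here.
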