annotate doc/v2_planning/datalearn.txt @ 1357:ffa2932a8cba

Added datalearn committee discussion file
author Olivier Delalleau <delallea@iro>
date Thu, 11 Nov 2010 16:34:38 -0500
parents
children 5db730bb0e8e
DataLearn: How to plug Datasets & Learner together?
===================================================

Participants
------------
- Yoshua
- Razvan
- Olivier D [leader?]

High-Level Objectives
---------------------

* Simple ML experiments should be simple to write
* More complex / advanced scenarios should be possible without being forced
  to work "outside" of this framework
* Computations should be optimized whenever possible
* Existing code (in any language) should be "wrappable" within this
  framework
* It should be possible to replace [parts of] this framework with C++ code

Theano-Like Data Flow
---------------------

We want to rely on Theano to be able to take advantage of its efficient
computations. The general idea is that if we chain multiple processing
elements (think e.g. of a feature selection step followed by a PCA projection,
then a rescaling within a fixed bounded interval), the overall transformation
from input to output data can be represented by a Theano symbolic graph. When
one wants to access the actual numeric data, a function is compiled so as to
do these computations efficiently.

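As a rough illustration of this chaining idea, the sketch below uses plain
Python in place of Theano: each processing element only records a step, and a
single callable is built (the stand-in for compilation) the first time numeric
data is requested. The names ``LazyData``, ``select``, ``project`` and
``rescale`` are hypothetical and not part of any proposed API, and the
projection matrix merely stands in for a PCA basis.

```python
class LazyData(object):
    """Records a chain of row-wise transformations without running them."""

    def __init__(self, rows, steps=()):
        self.rows = rows            # raw data: list of feature lists
        self.steps = tuple(steps)   # pending per-row transformations
        self._compiled = None       # built on first numeric access

    def then(self, step):
        # Return a new lazy object; nothing is computed yet.
        return LazyData(self.rows, self.steps + (step,))

    def numeric_value(self):
        if self._compiled is None:
            steps = self.steps

            def run(row):
                for step in steps:
                    row = step(row)
                return row

            self._compiled = run
        return [self._compiled(row) for row in self.rows]


def select(indices):
    # Feature selection: keep only the given columns.
    return lambda row: [row[i] for i in indices]


def project(matrix):
    # Linear projection; `matrix` stands in for a PCA basis (assumption).
    return lambda row: [sum(w * x for w, x in zip(col, row)) for col in matrix]


def rescale(low, high):
    # Map values from [low, high] to [0, 1], clipping anything outside.
    span = float(high - low)
    return lambda row: [min(1.0, max(0.0, (x - low) / span)) for x in row]


data = LazyData([[1.0, 5.0, 3.0], [2.0, 7.0, 4.0]])
pipeline = (data.then(select([0, 2]))
                .then(project([[1.0, 1.0], [1.0, -1.0]]))
                .then(rescale(-10.0, 10.0)))
result = pipeline.numeric_value()
```

With Theano, the recorded steps would instead be nodes of one symbolic graph,
which is what would give the optimizer a chance to simplify the whole chain.
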
We discussed some specific API options for datasets and learners, which will
be added to this file in the future, but a core question that we feel should
be addressed first is how this Theano-based implementation could be achieved
exactly. For this purpose, in the following, let us assume that a dataset is
simply a matrix whose rows represent individual samples, and columns
individual features. How to handle field names, non-tensor-like data, etc. is
a very important topic that is not yet discussed in this file.

A question we did not really discuss is whether datasets should be Theano
Variables. The advantage would be that they would fit directly within the
Theano framework, which may allow high-level optimizations on data
transformations. However, we would lose the ability to combine Theano
expressions coded in individual datasets into a single graph. Currently, we
instead consider that a dataset has a member that is a Theano variable, and
this variable represents the data stored in the dataset. The same is done for
individual data samples.

One issue with this approach is illustrated by the following example. Imagine
we want to iterate over the samples in a dataset and do something with their
numeric value. We would want the code to be as close as possible to:

.. code-block:: python

    for sample in dataset:
        do_something_with(sample.numeric_value())

A naive implementation of the sample API could be (assuming each sample
contains a ``variable`` member which is the variable representing this
sample's data):

.. code-block:: python

    def numeric_value(self):
        if self.function is None:
            # Compile function to output the numeric value stored in this
            # sample's variable.
            self.function = theano.function([], self.variable)
        return self.function()

However, this is not a good idea, because it would trigger a new function
compilation for each sample. Instead, we would want something like this:

.. code-block:: python

    def numeric_value(self):
        if self.function_storage[0] is None:
            # Compile function to output the numeric value stored in this
            # sample's variable. This function takes as input the index of
            # the sample in the dataset, and is shared among all samples.
            self.function_storage[0] = theano.function(
                [self.symbolic_index], self.variable)
        # Call the shared compiled function with this sample's own index.
        return self.function_storage[0](self.numeric_index)

In the code above, we assume that all samples created by iterating over the
dataset share the same ``function_storage``, ``symbolic_index`` and
``variable``: the first time we try to access the numeric value of some
sample, a function is compiled that takes the index as input and outputs the
variable. The only difference between samples is thus that they are given a
different numeric value for the index (``numeric_index``).

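The effect of sharing can be sketched without Theano by counting compilations.
In the sketch below, ``fake_compile`` is a hypothetical stand-in for
``theano.function``, and ``Sample`` and ``Dataset`` are simplified
placeholders rather than a proposed API. The point is only that a one-element
list handed to every sample lets them all reuse a single compiled function.

```python
compile_count = [0]


def fake_compile(rows):
    # Stand-in for theano.function: pretend that building this is expensive.
    compile_count[0] += 1
    return lambda index: rows[index]


class Sample(object):
    def __init__(self, rows, numeric_index, function_storage):
        self.rows = rows
        self.numeric_index = numeric_index
        # The same list object is handed to every sample of a dataset.
        self.function_storage = function_storage

    def numeric_value(self):
        if self.function_storage[0] is None:
            self.function_storage[0] = fake_compile(self.rows)
        return self.function_storage[0](self.numeric_index)


class Dataset(object):
    def __init__(self, rows):
        self.rows = rows
        self.function_storage = [None]  # shared, mutable slot

    def __iter__(self):
        for i in range(len(self.rows)):
            yield Sample(self.rows, i, self.function_storage)


dataset = Dataset([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
values = [sample.numeric_value() for sample in dataset]
```

After the loop, ``compile_count[0]`` is 1 even though three samples were
read. Storing the function directly on each sample, as in the naive version,
would have triggered one compilation per sample.
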
Another way to obtain the same result is to actually let the user take care of
compiling the function. It would allow the user to really control what is
being compiled, at the cost of having to write more code:

.. code-block:: python

    symbolic_index = dataset.get_index() # Or just theano.tensor.iscalar()
    get_sample = theano.function([symbolic_index],
                                 dataset[symbolic_index].variable)
    for numeric_index in range(len(dataset)):
        do_something_with(get_sample(numeric_index))

Note that although the above example focused on how to iterate over a dataset,
it can be cast into a more generic problem, where some data (either dataset or
sample) is the result of some transformation applied to other data, which is
parameterized by parameters p1, p2, ..., pN (in the above example, we were
considering a sample that was obtained by taking the p1-th element in a
dataset). If we use different values for a subset Q of the parameters but keep
other parameters fixed, we would probably want to compile a single function
that takes as input all parameters in Q, while other parameters are fixed.
Ideally it would be nice to let the user take control of what is being
compiled, while leaving the option of using a sensible default behavior for
those who do not want to worry about it. How to achieve this is still to be
determined.

What About Learners?
--------------------

The discussion above mentioned only datasets, not learners. The learning
part of a learner is not a main concern (currently). What matters most w.r.t.
what was discussed above is how a learner takes as input a dataset and outputs
another dataset that can be used with the dataset API.

A Learner may be able to compute various things. For instance, a Neural
Network may output a ``prediction`` vector (whose elements correspond to
estimated probabilities of each class in a classification task), as well as a
``cost`` vector (whose elements correspond to the penalized NLL, the NLL alone
and the classification error). We would want to be able to build a dataset
that contains some of these quantities computed on each sample in the input
dataset.

The Neural Network code would then look something like this:

.. code-block:: python

    class NeuralNetwork(Learner):

        @datalearn(..)
        def compute_prediction(self, sample):
            return softmax(theano.tensor.dot(self.weights, sample.input))

        @datalearn(..)
        def compute_nll(self, sample):
            return - log(self.compute_prediction(sample)[sample.target])

        @datalearn(..)
        def compute_penalized_nll(self, sample):
            return (self.compute_nll(sample) +
                    theano.tensor.sum(self.weights**2))

        @datalearn(..)
        def compute_class_error(self, sample):
            probabilities = self.compute_prediction(sample)
            predicted_class = theano.tensor.argmax(probabilities)
            return predicted_class != sample.target

        @datalearn(..)
        def compute_cost(self, sample):
            return theano.tensor.concatenate([
                    self.compute_penalized_nll(sample),
                    self.compute_nll(sample),
                    self.compute_class_error(sample),
                    ])

The ``@datalearn`` decorator would be responsible for allowing such a Learner
to be used e.g. like this:

.. code-block:: python

    nnet = NeuralNetwork()
    predict_dataset = nnet.compute_prediction(dataset)
    for sample in dataset:
        predict_sample = nnet.compute_prediction(sample)
    predict_numeric = nnet.compute_prediction({'input': numpy.zeros(10)})
    multiple_fields_dataset = ConcatDataSet([
            nnet.compute_prediction(dataset),
            nnet.compute_cost(dataset),
            ])

In the code above, if one wants to obtain the numeric value of an element of
``multiple_fields_dataset``, the Theano function being compiled would be able
to optimize computations so that the simultaneous computation of
``prediction`` and ``cost`` is done efficiently.

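How the decorator would achieve this is not settled here. As one hypothetical
reading, the sketch below dispatches on the type of its argument: a dataset
yields a lazily evaluated dataset, while a single sample or a plain dictionary
is evaluated directly. It uses plain Python numbers instead of Theano
variables, takes no decorator arguments, and ``datalearn_sketch``, ``Sample``,
``Dataset``, ``MappedDataset`` and ``MeanPredictor`` are illustrative names
only.

```python
import functools


class Sample(object):
    def __init__(self, **fields):
        self.__dict__.update(fields)


class Dataset(object):
    def __init__(self, samples):
        self.samples = list(samples)

    def __iter__(self):
        return iter(self.samples)


class MappedDataset(object):
    """Dataset whose samples are computed on demand from a source dataset."""

    def __init__(self, source, func):
        self.source, self.func = source, func

    def __iter__(self):
        return (self.func(sample) for sample in self.source)


def datalearn_sketch(method):
    @functools.wraps(method)
    def wrapper(self, data):
        if isinstance(data, (Dataset, MappedDataset)):
            # Dataset in, lazy dataset out.
            return MappedDataset(data, lambda s: wrapper(self, s))
        if isinstance(data, dict):
            # Plain numeric fields: wrap them so the method sees attributes.
            data = Sample(**data)
        return method(self, data)
    return wrapper


class MeanPredictor(object):
    # Toy learner (assumption): the "prediction" is the mean of the input.
    @datalearn_sketch
    def compute_prediction(self, sample):
        return sum(sample.input) / float(len(sample.input))


learner = MeanPredictor()
dataset = Dataset([Sample(input=[1.0, 3.0]), Sample(input=[2.0, 6.0])])
predict_dataset = learner.compute_prediction(dataset)
predict_numeric = learner.compute_prediction({'input': [4.0, 4.0]})
```

In this sketch nothing is computed for ``predict_dataset`` until it is
iterated over. A Theano-based version would presumably build a symbolic graph
at this point instead, so that several outputs could share one compiled
function.
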