# HG changeset patch
# User Olivier Delalleau
# Date 1290107688 18000
# Node ID 0665274b14af748387022b98151ffe6171dbd712
# Parent  e8fc563dad740afb04b730c824d9afb25ff75dac
Minor fixes for better Sphinx doc output

diff -r e8fc563dad74 -r 0665274b14af doc/v2_planning/datalearn.txt
--- a/doc/v2_planning/datalearn.txt  Thu Nov 18 14:00:49 2010 -0500
+++ b/doc/v2_planning/datalearn.txt  Thu Nov 18 14:14:48 2010 -0500
@@ -160,12 +160,12 @@
 function.
 
 In summary:
-- Data (samples and datasets) are basically Theano Variables, and a data
-  transformation an Op.
-- When writing code that requires some data numeric value, one has to compile
-  a Theano function to obtain it. This is done either manually or through some
-  helper Pylearn functions for common tasks. In both cases, the user should
-  have enough control to be able to obtain an efficient implementation.
+ - Data (samples and datasets) are basically Theano Variables, and a data
+   transformation an Op.
+ - When writing code that requires some data numeric value, one has to compile
+   a Theano function to obtain it. This is done either manually or through some
+   helper Pylearn functions for common tasks. In both cases, the user should
+   have enough control to be able to obtain an efficient implementation.
 
 
 What About Learners?
@@ -196,26 +196,25 @@
         # or symbolic datasets.
         # Other approaches than a decorator are possible (e.g. using
         # different function names).
-        @datalearn(..)
         def compute_prediction(self, sample):
             return softmax(theano.tensor.dot(self.weights, sample.input))
 
-        @datalearn(..)
+        @datalearn
         def compute_nll(self, sample):
             return - log(self.compute_prediction(sample)[sample.target])
 
-        @datalearn(..)
+        @datalearn
        def compute_penalized_nll(self, sample):
             return (self.compute_nll(self, sample) +
                     theano.tensor.sum(self.weights**2))
 
-        @datalearn(..)
+        @datalearn
         def compute_class_error(self, sample):
             probabilities = self.compute_prediction(sample)
             predicted_class = theano.tensor.argmax(probabilities)
             return predicted_class != sample.target
 
-        @datalearn(..)
+        @datalearn
         def compute_cost(self, sample):
             return theano.tensor.concatenate([
                 self.compute_penalized_nll(sample),
@@ -254,16 +253,17 @@
 
 The above is not yet a practical proposal. Investigation of the following
 topics is still missing:
-- Datasets whose variables are not matrices (e.g. large datasets that do not
-  fit in memory, non fixed-length vector samples, ...)
-- Field names.
-- Typical input / target / weight split.
-- Learners whose output on a dataset cannot be obtained by computing outputs
-  on individual samples (e.g. a Learner that ranks samples based on pair-wise
-  comparisons).
-- Code parallelization, stop & restart.
-- Modular C++ implementation without Theano.
-- ...
+
+ - Datasets whose variables are not matrices (e.g. large datasets that do not
+   fit in memory, non fixed-length vector samples, ...)
+ - Field names.
+ - Typical input / target / weight split.
+ - Learners whose output on a dataset cannot be obtained by computing outputs
+   on individual samples (e.g. a Learner that ranks samples based on pair-wise
+   comparisons).
+ - Code parallelization, stop & restart.
+ - Modular C++ implementation without Theano.
+ - ...
 
 
 Previous Introduction (deprecated)
@@ -417,6 +417,7 @@
 numeric function, and dataset in this case is the result of some
 computations on a initial dataset. I would differentiate the two approaches
 (1) and (2) as :
+
 - first of all whatever you can do with (1) you can do with (2)
 - approach (1) hides the fact that you are working with symbolic graphs.
   You apply functions to datasets, and when you want to see values a
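
The ``@datalearn`` decorator that appears in the hunks above is only sketched in the
planning document; its mechanics are left open. One plausible reading, given here as a
rough sketch rather than an actual Pylearn API, is a decorator that dispatches on the
type of ``sample``: a symbolic Theano variable is passed through, so the method returns
a symbolic expression, while numeric data triggers on-the-fly compilation of a Theano
function. The ``datalearn`` and ``LinearClassifier`` names below are hypothetical, and
the sample is simplified to a plain vector instead of the ``sample.input`` /
``sample.target`` structure used in the document's example::

    import numpy
    import theano
    import theano.tensor as T


    def datalearn(method):
        # Hypothetical decorator (illustration only): if `sample` is a symbolic
        # Theano variable, return the symbolic expression built by `method`;
        # if it is numeric, compile a Theano function on the fly and evaluate it.
        def wrapper(self, sample):
            if isinstance(sample, T.TensorVariable):
                return method(self, sample)
            symbolic_sample = T.vector('sample')
            f = theano.function([symbolic_sample], method(self, symbolic_sample))
            return f(numpy.asarray(sample, dtype=symbolic_sample.dtype))
        return wrapper


    class LinearClassifier(object):
        # Toy model, used only to exercise the decorator.
        def __init__(self, n_inputs, n_outputs):
            rng = numpy.random.RandomState(0)
            self.weights = theano.shared(rng.uniform(size=(n_outputs, n_inputs)))

        @datalearn
        def compute_prediction(self, sample):
            return T.nnet.sigmoid(T.dot(self.weights, sample))


    model = LinearClassifier(n_inputs=4, n_outputs=3)
    print(model.compute_prediction(numpy.ones(4)))    # numeric path: an ndarray
    print(model.compute_prediction(T.vector('x')))    # symbolic path: an expression

Caching the compiled function (e.g. keyed on the method) would avoid recompiling on
every numeric call; the sketch omits this for brevity.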