diff doc/v2_planning/API_formulas.txt @ 1174:fe6c25eb1e37

merge
author pascanur
date Fri, 17 Sep 2010 16:13:58 -0400
parents 42ddbefd1e03
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/v2_planning/API_formulas.txt	Fri Sep 17 16:13:58 2010 -0400
@@ -0,0 +1,96 @@
+.. _v2planning_formulas:
+
+Math formulas API
+=================
+
+Why we need a formulas API
+--------------------------
+
+There are a few reasons why having a library of mathematical formulas for Theano is a good idea:
+
+* Some formulas need special handling on the GPU.
+   * Sometimes we need to cast to floatX...
+* Some formulas have numerical stability problems.
+* The gradients of some formulas have numerical stability problems. (This happens more frequently than the previous case.)
+   * If Theano does not always apply a given stability optimization, we can apply it manually in the formula.
+* Some formulas are complex to implement and take many tries to get right.
+* We can mimic the hierarchy of other libraries to ease migration to Theano.
+
+Having a library helps in that we solve those problems only once.
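
The numerical stability point can be illustrated with binary cross-entropy. The following is only a sketch: it uses plain Python floats rather than Theano variables, and the names `naive_bce`, `softplus` and `stable_bce` are hypothetical, not part of any proposed API. It compares the textbook expression with an algebraically equivalent one written on the pre-sigmoid activation.

```python
import math


def naive_bce(x, t):
    """Cross-entropy computed from the sigmoid output o = 1 / (1 + exp(-x))."""
    o = 1.0 / (1.0 + math.exp(-x))
    return -(t * math.log(o) + (1.0 - t) * math.log(1.0 - o))


def softplus(z):
    """log(1 + exp(z)), rearranged so that exp() never overflows."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def stable_bce(x, t):
    """Same quantity, written directly on the pre-sigmoid activation x."""
    # -log(sigmoid(x)) == softplus(-x) and -log(1 - sigmoid(x)) == softplus(x)
    return t * softplus(-x) + (1.0 - t) * softplus(x)
```

For moderate activations the two agree. For `x = 40.0` and `t = 0.0`, the sigmoid rounds to exactly 1.0 in double precision, so `naive_bce` ends up taking `log(0.0)` and raises `ValueError`, while `stable_bce` returns the expected cost of about 40. A formulas library would let this kind of rewrite be done once instead of in every model.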
+
+What is a formula
+-----------------
+
+We define formulas as something that doesn't have state. They are implemented as
+Python functions that take Theano variables as input and output Theano
+variables. If you want state, look at what the other committees will do.
+
+Formulas documentation
+----------------------
+
+We must respect what the coding committee has set for the docstrings of the file and of the functions. The documentation of each formula should provide:
+
+* A LaTeX mathematical description of the formula (for rendering as a picture in the generated documentation)
+* Tags (for searching):
+   * a list of the lower-level functions used
+   * category (name of the submodule itself)
+* Whether we did some work to make it more numerically stable. Does Theano do the optimization needed?
+* Whether the gradient is numerically stable. Does Theano do the optimization needed?
+* Whether it works, doesn't work, or is untested on the GPU.
+* Alternate names
+* The domain and range of the inputs and outputs (ranges should use the English notation for including or excluding endpoints, e.g. [0, 1))
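
The tag-based search described above could be supported by a small decorator. The sketch below is an assumption about how such a `tags` helper might work: `TAG_INDEX` and `find_by_tag` are hypothetical names, and this planning document does not show the actual `tags` module.

```python
from collections import defaultdict

# Hypothetical registry: tag name -> names of the formulas carrying that tag.
TAG_INDEX = defaultdict(list)


def tags(*names):
    """Attach search tags to a formula and record it in TAG_INDEX."""
    def decorator(fn):
        fn.tags = tuple(names)
        for name in names:
            TAG_INDEX[name].append(fn.__name__)
        return fn  # the formula itself is returned unchanged
    return decorator


@tags('cost', 'binary', 'cross-entropy')
def binary_crossentropy(output, target):
    """Placeholder body; only the tagging mechanism matters here."""
    raise NotImplementedError


def find_by_tag(name):
    """Return the names of all formulas registered under a tag."""
    return list(TAG_INDEX.get(name, []))
```

With this design, `find_by_tag('cost')` lists every formula tagged as a cost, which is the kind of index the generated HTML page could be built from.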
+
+Proposed hierarchy
+------------------
+
+Here is the proposed hierarchy for formulas:
+
+* pylearn.formulas.costs: generic / common cost functions, e.g. various cross-entropies, squared error,
+  absolute error, various sparsity penalties (L1, Student)
+* pylearn.formulas.regularization: formulas for regularization
+* pylearn.formulas.linear: formulas for linear classifier, linear regression, factor analysis, PCA
+* pylearn.formulas.nnet: formulas for building layers of various kinds, various activation functions,
+  layers which could be plugged with various costs & penalties, and stacked
+* pylearn.formulas.ae: formulas for auto-encoders and denoising auto-encoder variants
+* pylearn.formulas.noise: formulas for corruption processes
+* pylearn.formulas.rbm: energies, free energies, conditional distributions, Gibbs sampling
+* pylearn.formulas.trees: formulas for decision trees
+* pylearn.formulas.boosting: formulas for boosting variants
+* pylearn.formulas.maths: other math formulas
+* pylearn.formulas.scipy.stats: example of implementing the same interface as an existing library
+
+etc.
+
+Example
+-------
+.. code-block:: python
+
+        """
+        This script defines a few often used cost functions.
+        """
+        import theano
+        import theano.tensor as T
+        from tags import tags
+
+        @tags('cost','binary','cross-entropy')
+        def binary_crossentropy(output, target):
+            r""" Compute the cross-entropy of binary output wrt binary target.
+
+            .. math::
+                L_{CE} \equiv -(t\log(o) + (1-t)\log(1-o))
+
+            :type output: Theano variable
+            :param output: Binary output or prediction :math:`\in[0,1]`
+            :type target: Theano variable
+            :param target: Binary target usually :math:`\in\{0,1\}`
+            """
+            return -(target * T.log(output) + (1.0 - target) * T.log(1.0 - output))
+
+
+TODO 
+----
+* Define a list of search tags to start with.
+* Add to the HTML page a list of the tags and, for each tag, a list of the functions associated with it.
+* Move existing formulas to pylearn as examples and add other basic ones.
+* theano.tensor.nnet will probably be copied to pylearn.formulas.nnet and deprecated.
+