# HG changeset patch
# User Frederic Bastien
# Date 1284746227 14400
# Node ID 42ddbefd1e032dc6a4f3b6b85feee7acd9b6d5b3
# Parent 53d11eafdaa9c8019bd29adb439b76cf480c966a
made the API_formulas.txt and removed duplicate stuff from the formulas.txt file

diff -r 53d11eafdaa9 -r 42ddbefd1e03 doc/v2_planning/API_formulas.txt
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/v2_planning/API_formulas.txt	Fri Sep 17 13:57:07 2010 -0400
@@ -0,0 +1,96 @@
+.. _v2planning_formulas:
+
+Math formulas API
+=================
+
+Why we need a formulas API
+--------------------------
+
+There are a few reasons why having a library of mathematical formulas for Theano is a good idea:
+
+* Some formulas need special handling on the GPU.
+  * Sometimes we need to cast to floatX...
+* Some formulas have numerical stability problems.
+* Some formulas' gradients have numerical stability problems. (This happens more frequently than the previous case.)
+  * If Theano does not always apply a stability optimization, we can do it manually in the formulas.
+* Some formulas are complex to implement and take many tries to get right.
+* We can mimic the hierarchy of other libraries to ease migration to Theano.
+
+Having a library helps because we solve those problems only once.
+
+What is a formula
+-----------------
+
+We define formulas as things that do not have state. They are implemented as
+Python functions that take Theano variables as input and output Theano
+variables. If you want state, look at what the other committees will do.
+
+Formulas documentation
+----------------------
+
+We must respect what the coding committee has set for the docstrings of files and functions.
+
+* A LaTeX mathematical description of the formula (rendered as an image in the generated documentation)
+* Tags (for searching):
+  * a list of the lower-level functions used
+  * category (the name of the submodule itself)
+* Tell whether we did some work to make it more numerically stable. Does Theano do the optimization needed?
+* Tell whether the gradient is numerically stable. Does Theano do the optimization needed?
+* Tell whether it works on the GPU (works / does not work / unknown).
+* Give alternate names.
+* Give the domain and range of the inputs and outputs (ranges should use the English notation for inclusive or exclusive bounds).
+
+Proposed hierarchy
+------------------
+
+Here is the proposed hierarchy for formulas:
+
+* pylearn.formulas.costs: generic / common cost functions, e.g. various cross-entropies, squared error,
+  abs. error, various sparsity penalties (L1, Student)
+* pylearn.formulas.regularization: formulas for regularization
+* pylearn.formulas.linear: formulas for linear classifier, linear regression, factor analysis, PCA
+* pylearn.formulas.nnet: formulas for building layers of various kinds, various activation functions,
+  layers which could be plugged with various costs & penalties, and stacked
+* pylearn.formulas.ae: formulas for auto-encoders and denoising auto-encoder variants
+* pylearn.formulas.noise: formulas for corruption processes
+* pylearn.formulas.rbm: energies, free energies, conditional distributions, Gibbs sampling
+* pylearn.formulas.trees: formulas for decision trees
+* pylearn.formulas.boosting: formulas for boosting variants
+* pylearn.formulas.maths: other math formulas
+* pylearn.formulas.scipy.stats: example of implementing the same interface as an existing library
+
+etc.
+
+Example
+-------
+.. code-block:: python
+
+    """
+    This script defines a few often used cost functions.
+    """
+    import theano
+    import theano.tensor as T
+    from tags import tags
+
+    @tags('cost','binary','cross-entropy')
+    def binary_crossentropy(output, target):
+        """ Compute the cross-entropy of binary output wrt binary target.
+
+        .. math::
+            L_{CE} \equiv -(t\log(o) + (1-t)\log(1-o))
+
+        :type output: Theano variable
+        :param output: Binary output or prediction :math:`\in[0,1]`
+        :type target: Theano variable
+        :param target: Binary target usually :math:`\in\{0,1\}`
+        """
+        return -(target * T.log(output) + (1.0 - target) * T.log(1.0 - output))
+
+
+TODO
+----
+* Define a list of search tags to start with.
+* Add to the HTML page a list of the tags and, for each tag, the functions associated with it.
+* Move existing formulas to pylearn as examples and add other basic ones.
+* theano.tensor.nnet will probably be copied to pylearn.formulas.nnet and deprecated.
+
diff -r 53d11eafdaa9 -r 42ddbefd1e03 doc/v2_planning/formulas.txt
--- a/doc/v2_planning/formulas.txt	Fri Sep 17 13:56:22 2010 -0400
+++ b/doc/v2_planning/formulas.txt	Fri Sep 17 13:57:07 2010 -0400
@@ -9,47 +9,6 @@
 - Olivier B.
 - Nicolas
 
-TODO
-----
-* define a list of search tag to start with
-* propose an interface(many inputs, outputs, doc style, hierrache, to search, html output?)
-* find existing repositories with files for formulas.
-* move existing formulas to pylearn as examples and add other basics ones.
-** theano.tensor.nnet will probably be copied to pylearn.formulas.nnet and depricated.
-
-Why we need formulas
---------------------
-
-Their is a few reasons why having a library of mathematical formula for theano is a good reason:
-
-* Some formula have some special thing needed for the gpu.
-  * Sometimes we need to cast to floatX...
-* Some formula have numerical stability problem.
-* Some formula gradiant have numerical stability problem. (Happen more frequently then the previous ones)
-  * If theano don't always do some stability optimization, we could do it manually in the formulas
-* Some formula as complex to implement and take many try to do correctly.
-
-Having a library help in that we solve those problem only once.
-
-Formulas definition
--------------------
-
-We define formulas as something that don't have a state. They are implemented as python function
-that take theano variable as input and output theano variable. If you want state, look at what the
-learner commity will do.
-
-Formulas doc must have
-----------------------
-
-* A latex mathematical description of the formulas(for picture representation in generated documentation)
-* Tags(for searching):
-  * a list of lower lovel fct used
-  * category(name of the submodule itself)
-* Tell if we did some work to make it more numerical stable. Do theano do the optimization needed?
-* Tell if the grad is numericaly stable? Do theano do the optimization needed?
-* Tell if work on gpu/not/unknow
-* Tell alternate name
-* Tell the domaine, range of the input/output(range should use the english notation of including or excluding)
 
 List of existing repos
 ----------------------
@@ -57,33 +16,3 @@
 Olivier B. ?
 Xavier G.: git@github.com:glorotxa/DeepANN.git, see file deepANN/{Activations.py(to nnet),Noise.py,Reconstruction_cost.py(to costs),Regularization.py(to regularization}
 
-Proposed hierarchy
-------------------
-
-Here is the proposed hierarchy for formulas
-
-pylearn.formulas.costs: generic / common cost functions, e.g. various cross-entropies, squared error,
-abs. error, various sparsity penalties (L1, Student)
-
-pylearn.formulas.regularization: formulas for regularization
-
-pylearn.formulas.linear: formulas for linear classifier, linear regression, factor analysis, PCA
-
-pylearn.formulas.nnet: formulas for building layers of various kinds, various activation functions,
-layers which could be plugged with various costs & penalties, and stacked
-
-pylearn.formulas.ae: formulas for auto-encoders and denoising auto-encoder variants
-
-pylearn.formulas.noise: formulas for corruption processes
-
-pylearn.formulas.rbm: energies, free energies, conditional distributions, Gibbs sampling
-
-pylearn.formulas.trees: formulas for decision trees
-
-pylearn.formulas.boosting: formulas for boosting variants
-
-pylearn.formulas.maths for other math formulas
-
-pylearn.formulas.scipy.stats: example to implement the same interface as existing lib
-
-etc.
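
Note on the example in API_formulas.txt: the patch imports a `tags` decorator from a `tags` module that is not part of this changeset, so how it works is not specified here. The following is only a minimal sketch of one way such a decorator could behave, assuming it stores the tags on the function and records them in a searchable index. The names `TAG_INDEX` and `eps` are hypothetical. The formula is written with plain Python floats instead of Theano variables so that it runs without Theano, and the clipping step is an assumed illustration of the "numerical stability" point in the planning text, not something the patch does.

```python
import math

# Hypothetical registry: maps each tag to the names of the formulas carrying it.
TAG_INDEX = {}


def tags(*names):
    """Attach search tags to a formula and record them in TAG_INDEX."""
    def decorator(fn):
        fn.tags = tuple(names)
        for name in names:
            TAG_INDEX.setdefault(name, []).append(fn.__name__)
        return fn
    return decorator


@tags('cost', 'binary', 'cross-entropy')
def binary_crossentropy(output, target, eps=1e-12):
    """Stateless formula: floats in, float out.

    eps is an assumed clipping constant that keeps log() away from 0.
    """
    # Clip the prediction into [eps, 1 - eps] so neither log() sees 0.
    o = min(max(output, eps), 1.0 - eps)
    return -(target * math.log(o) + (1.0 - target) * math.log(1.0 - o))
```

With this sketch, `TAG_INDEX['cost']` lists every formula tagged as a cost, which is the kind of lookup the TODO item about listing functions per tag would need. A real implementation on Theano variables would replace `math.log` and the `min`/`max` clipping with their `theano.tensor` equivalents.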