view doc/v2_planning/formulas.txt @ 1104:5e6d7d9e803a

a comment on the GPU issue for datasets
author Razvan Pascanu <r.pascanu@gmail.com>
date Mon, 13 Sep 2010 20:21:23 -0400
parents bc246542d6ff
children 42ddbefd1e03

Math formulas
=============

Participants
------------
- Fred*
- Razvan
- Aaron
- Olivier B.
- Nicolas

TODO 
----
* define a list of search tags to start with
* propose an interface (many inputs, outputs, doc style, hierarchy, how to search, HTML output?)
* find existing repositories with files for formulas.
* move existing formulas to pylearn as examples and add other basic ones.
** theano.tensor.nnet will probably be copied to pylearn.formulas.nnet and deprecated.

Why we need formulas
--------------------

There are a few reasons why having a library of mathematical formulas for Theano is a good idea:

* Some formulas need special handling for the GPU.
   * Sometimes we need to cast to floatX...
* Some formulas have numerical stability problems.
* Some formulas' gradients have numerical stability problems. (This happens more frequently than the previous case.)
   * If Theano doesn't always do some stability optimization, we could do it manually in the formulas.
* Some formulas are complex to implement and take many tries to get right.

Having a library helps in that we solve those problems only once.
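
To make the numerical stability point concrete, here is a minimal sketch in plain Python (using only the standard `math` module, not Theano). The softplus function and both helper names are illustrative assumptions and do not come from these notes; they only show the kind of rewrite a formula library could do once for everyone.

```python
import math


def softplus_naive(x):
    # Direct transcription of log(1 + exp(x)).
    # math.exp raises OverflowError for large x, and the result
    # rounds to exactly 0.0 for very negative x.
    return math.log(1.0 + math.exp(x))


def softplus_stable(x):
    # Algebraically equivalent rewrite: max(x, 0) + log1p(exp(-|x|)).
    # The argument of exp is never positive, so it cannot overflow,
    # and log1p keeps precision when exp(-|x|) is tiny.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

For moderate inputs the two functions agree closely. For an input such as 1000.0 the naive version raises OverflowError while the stable one returns a finite value, which is the sort of problem the library is meant to solve once.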

Formulas definition
-------------------

We define formulas as something that doesn't have state. They are implemented as Python functions
that take Theano variables as input and output Theano variables. If you want state, look at what the
learner committee will do.
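
As a rough sketch of what "stateless" means here, the function below keeps no parameters or internal storage; its output depends only on its arguments. The name `squared_error` is a hypothetical example, not an existing pylearn function. It uses only arithmetic operators, so it runs on plain Python numbers as written.

```python
def squared_error(output, target):
    # No state: nothing is stored between calls and no parameters
    # are owned by the function. Only arithmetic operators are used,
    # so any operand type that overloads "-" and "**" is accepted.
    return (output - target) ** 2


# Plain-number usage, for illustration only.
example = squared_error(3.0, 1.0)
```

Passing Theano symbolic variables instead of floats should yield a symbolic expression, since Theano variables overload these operators, but that usage is not exercised here.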

Formulas doc must have
----------------------

* A LaTeX mathematical description of the formula (for picture representation in generated documentation)
* Tags (for searching):
   * a list of the lower-level functions used
   * category (name of the submodule itself)
* Tell if we did some work to make it more numerically stable. Does Theano do the optimization needed?
* Tell if the gradient is numerically stable. Does Theano do the optimization needed?
* Tell if it works on the GPU: yes/no/unknown
* Tell alternate names
* Tell the domain and range of the input/output (ranges should use the English notation for including or excluding endpoints)
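
One possible way to carry these fields is in the docstring itself. The sketch below is only an assumption about layout: the field labels (`Tags`, `GPU`, and so on) and the function name are hypothetical, and the body uses the standard `math` module as a plain-Python stand-in for Theano operations.

```python
import math


def binary_crossentropy(output, target):
    r"""Binary cross-entropy between a prediction and a target.

    .. math:: L(o, t) = -\left(t \log o + (1 - t) \log(1 - o)\right)

    Tags: costs, log
    Alternate names: log loss, Bernoulli negative log-likelihood
    Numerical stability: no special work done in this sketch
    Gradient stability: unknown
    GPU: unknown
    Domain: output in (0, 1), target in [0, 1]
    Range: (0, +inf)
    """
    # Direct transcription of the formula in the docstring.
    return -(target * math.log(output)
             + (1.0 - target) * math.log(1.0 - output))
```

The interval notation in the Domain and Range fields uses parentheses for excluded endpoints and square brackets for included ones.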

List of existing repos
----------------------

Olivier B. ?
Xavier G.: git@github.com:glorotxa/DeepANN.git, see files deepANN/{Activations.py(to nnet),Noise.py,Reconstruction_cost.py(to costs),Regularization.py(to regularization)}

Proposed hierarchy
------------------

Here is the proposed hierarchy for formulas:

pylearn.formulas.costs: generic / common cost functions, e.g. various cross-entropies, squared error, 
abs. error, various sparsity penalties (L1, Student)

pylearn.formulas.regularization: formulas for regularization

pylearn.formulas.linear: formulas for linear classifier, linear regression, factor analysis, PCA

pylearn.formulas.nnet: formulas for building layers of various kinds, various activation functions,
layers which could be plugged with various costs & penalties, and stacked

pylearn.formulas.ae: formulas for auto-encoders and denoising auto-encoder variants

pylearn.formulas.noise: formulas for corruption processes

pylearn.formulas.rbm: energies, free energies, conditional distributions, Gibbs sampling

pylearn.formulas.trees: formulas for decision trees

pylearn.formulas.boosting: formulas for boosting variants

pylearn.formulas.maths: other math formulas

pylearn.formulas.scipy.stats: example of implementing the same interface as an existing library

etc.