annotate doc/v2_planning/API_formulas.txt @ 1428:3823dbfff6cf

add parameter to randomize the valid and test data.
author Frederic Bastien <nouiz@nouiz.org>
date Tue, 08 Feb 2011 12:57:15 -0500
parents 42ddbefd1e03
children
rev   line source
1165
42ddbefd1e03 made the API_formulas.txt and removed duplicate stuff from the formulas.txt file
Frederic Bastien <nouiz@nouiz.org>
parents:
diff changeset
.. _v2planning_formulas:

Math formulas API
=================

Why we need a formulas API
--------------------------

There are a few reasons why having a library of mathematical formulas for Theano is a good idea:

* Some formulas need special handling on the GPU.
* Sometimes we need to cast to floatX.
* Some formulas have numerical stability problems.
* The gradients of some formulas have numerical stability problems. (This happens more frequently than the previous case.)
* If Theano does not always apply a given stability optimization, we can apply it manually in the formulas.
* Some formulas are complex to implement and take many tries to get right.
* The library can mimic the hierarchy of other libraries to ease migration to Theano.

Having a library helps because we solve those problems only once.
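
To make the numerical stability point concrete, here is a minimal sketch in plain Python (only the standard ``math`` module, not Theano). It contrasts a direct transcription of log(1 + exp(x)) with an algebraically equivalent rewrite. The function names are hypothetical and are not part of any proposed pylearn API.

```python
import math


def softplus_naive(x):
    # Direct transcription of log(1 + exp(x)).
    # math.exp overflows for large positive x (e.g. x = 1000).
    return math.log(1.0 + math.exp(x))


def softplus_stable(x):
    # Equivalent rewrite: max(x, 0) + log(1 + exp(-|x|)).
    # exp only ever receives a non-positive argument, so it cannot overflow,
    # and log1p keeps precision when exp(-|x|) is very small.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

Both functions agree for moderate inputs. Only the second one returns a finite value for large positive ``x``, which is the kind of fix a shared formula library would let us write once.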

What is a formula
-----------------

We define formulas as things that have no state. They are implemented as
Python functions that take Theano variables as input and output Theano
variables. If you want state, look at what the other committees will do.
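
As a rough illustration of what "no state" means here, the sketch below is a plain function whose result depends only on its arguments. The name ``squared_error`` is only an example. Because it uses nothing but arithmetic operators, it can be checked with ordinary Python floats, and the same expression should build a symbolic graph when given Theano variables, which overload those operators.

```python
def squared_error(output, target):
    # No attributes, globals, or parameters are read or modified:
    # the result is fully determined by the two inputs.
    return (output - target) ** 2
```

Calling ``squared_error(3.0, 1.0)`` with floats simply evaluates the expression. Anything that needs to own parameters or update them is outside the scope of a formula.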

Formulas documentation
----------------------

We must respect what the coding committee has set for the docstrings of the file and of the functions. The documentation of a formula should provide:

* A LaTeX mathematical description of the formula (rendered as an image in the generated documentation).
* Tags (for searching):

  * a list of the lower-level functions used
  * the category (the name of the submodule itself)

* Whether we did some work to make it more numerically stable, and whether Theano does the needed optimization.
* Whether the gradient is numerically stable, and whether Theano does the needed optimization.
* Whether it works on the GPU, does not work, or this is unknown.
* Alternate names.
* The domain and range of the inputs and outputs (ranges should use the English interval notation for including or excluding endpoints).
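
This document does not specify how search tags are attached to formulas. One hypothetical way to do it is a small decorator that records the tags on the function and in a registry, sketched below. ``TAG_INDEX``, ``tags``, ``search`` and ``dummy_cost`` are all illustrative names, not an existing pylearn or Theano API.

```python
# Hypothetical registry: maps each tag to the functions carrying it.
TAG_INDEX = {}


def tags(*names):
    """Return a decorator that records `names` on the decorated function."""
    def decorator(func):
        func.tags = tuple(names)
        for name in names:
            TAG_INDEX.setdefault(name, []).append(func)
        return func  # the function itself is returned unchanged
    return decorator


def search(tag):
    """Return the names of all functions registered under `tag`."""
    return [func.__name__ for func in TAG_INDEX.get(tag, [])]


@tags('cost', 'binary')
def dummy_cost(output, target):
    # Placeholder formula used only to exercise the decorator.
    return abs(output - target)
```

With such a registry, a tag listing for the generated documentation could be produced by iterating over ``TAG_INDEX``.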

Proposed hierarchy
------------------

Here is the proposed hierarchy for formulas:

* pylearn.formulas.costs: generic / common cost functions, e.g. various cross-entropies, squared error,
  absolute error, various sparsity penalties (L1, Student)
* pylearn.formulas.regularization: formulas for regularization
* pylearn.formulas.linear: formulas for linear classifier, linear regression, factor analysis, PCA
* pylearn.formulas.nnet: formulas for building layers of various kinds, various activation functions,
  layers which could be plugged with various costs & penalties, and stacked
* pylearn.formulas.ae: formulas for auto-encoders and denoising auto-encoder variants
* pylearn.formulas.noise: formulas for corruption processes
* pylearn.formulas.rbm: energies, free energies, conditional distributions, Gibbs sampling
* pylearn.formulas.trees: formulas for decision trees
* pylearn.formulas.boosting: formulas for boosting variants
* pylearn.formulas.maths: other math formulas
* pylearn.formulas.scipy.stats: example of implementing the same interface as an existing library

etc.

Example
-------

.. code-block:: python

    """
    This script defines a few often used cost functions.
    """
    import theano
    import theano.tensor as T
    from tags import tags

    @tags('cost','binary','cross-entropy')
    def binary_crossentropy(output, target):
        r""" Compute the crossentropy of binary output wrt binary target.

        .. math::
            L_{CE} \equiv -\left(t\log(o) + (1-t)\log(1-o)\right)

        :type output: Theano variable
        :param output: Binary output or prediction :math:`\in[0,1]`
        :type target: Theano variable
        :param target: Binary target usually :math:`\in\{0,1\}`
        """
        # Negative log-likelihood of the target under the predicted probability.
        return -(target * T.log(output) + (1.0 - target) * T.log(1.0 - output))


TODO
----
* Define a list of search tags to start with.
* Add to the HTML page a list of the tags and, for each tag, a list of the functions associated with it.
* Move existing formulas to pylearn as examples and add other basic ones.
* theano.tensor.nnet will probably be copied to pylearn.formulas.nnet and deprecated.