Mercurial > pylearn
annotate doc/v2_planning/API_formulas.txt @ 1207:53937045f6c7
Pasted content of email sent by Ian about existing python ML libraries
author | Olivier Delalleau <delallea@iro> |
---|---|
date | Tue, 21 Sep 2010 10:58:14 -0400 |
parents | 42ddbefd1e03 |
children |
rev | line source |
---|---|
1165
42ddbefd1e03
made the API_formulas.txt and removed duplicate stuff from the formulas.txt file
Frederic Bastien <nouiz@nouiz.org>
.. _v2planning_formulas:

Math formulas API
=================

Why we need a formulas API
--------------------------

There are several reasons why a library of mathematical formulas for Theano is a good idea:

* Some formulas need special handling to run on the GPU.
* Sometimes we need to cast to floatX.
* Some formulas have numerical stability problems.
* The gradients of some formulas have numerical stability problems. (This happens more frequently than the previous case.)
* If Theano does not always apply a given stability optimization, we can apply it manually in the formula.
* Some formulas are complex to implement and take many tries to get right.
* The library can mimic the hierarchy of other libraries to ease migration to Theano.

Having a library means that we solve those problems only once.
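
To make the numerical stability point concrete, here is a minimal sketch that uses plain Python ``math`` rather than Theano. The function names are illustrative assumptions and not part of pylearn. It compares a binary cross-entropy computed from the sigmoid output with an algebraically equivalent form computed from the pre-activation ``x``.

```python
import math


def sigmoid(x):
    """Logistic function, written the straightforward way."""
    return 1.0 / (1.0 + math.exp(-x))


def naive_cross_entropy(x, t):
    """Binary cross-entropy computed from the sigmoid output.

    Breaks down when sigmoid(x) rounds to exactly 0.0 or 1.0.
    """
    o = sigmoid(x)
    return -(t * math.log(o) + (1.0 - t) * math.log(1.0 - o))


def stable_cross_entropy(x, t):
    """Same quantity, rearranged to work on the pre-activation x directly.

    Uses the identity  loss = max(x, 0) - x * t + log(1 + exp(-|x|)).
    """
    return max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))
```

For moderate inputs the two functions agree. At ``x = 40`` with ``t = 0`` the sigmoid rounds to exactly ``1.0`` in double precision, so the naive version ends up calling ``math.log(0.0)`` and raises ``ValueError``, while the rearranged version returns a value very close to ``40``. This is the kind of rewrite a formulas library would need to get right only once.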

What is a formula
-----------------

We define a formula as something that has no state. Formulas are implemented as
Python functions that take Theano variables as input and return Theano
variables. If you need state, look at what the other committees will do.
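
A small sketch of what "no state" means in practice. ``Sym`` is a toy stand-in for a symbolic variable, introduced here only so that the example runs without Theano; it is not a Theano or pylearn class. ``squared_error`` is likewise an illustrative name.

```python
class Sym:
    """Toy stand-in for a symbolic variable; records the expression as text."""

    def __init__(self, expr):
        self.expr = expr

    def __sub__(self, other):
        return Sym("(%s - %s)" % (self.expr, _text(other)))

    def __pow__(self, power):
        return Sym("(%s ** %s)" % (self.expr, _text(power)))


def _text(value):
    """Render either a Sym or a plain Python number as expression text."""
    return value.expr if isinstance(value, Sym) else repr(value)


def squared_error(output, target):
    """A formula: no attributes, no globals, just inputs mapped to an output."""
    return (output - target) ** 2


# The same function builds an expression from symbols ...
cost = squared_error(Sym("output"), Sym("target"))
print(cost.expr)  # ((output - target) ** 2)

# ... or computes a value from plain numbers.
print(squared_error(3.0, 1.0))  # 4.0
```

Because the function keeps nothing between calls, calling it twice with the same inputs always builds the same expression. Anything that needs parameters or a training history belongs in a different layer of the library.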

Formulas documentation
----------------------

We must respect what the coding committee has set for the docstrings of files and functions. The documentation of each formula should include:

* A LaTeX mathematical description of the formula (rendered as an image in the generated documentation).
* Tags (for searching):

  * a list of the lower-level functions used
  * the category (the name of the submodule itself)

* Whether we did some work to make it more numerically stable. Does Theano do the needed optimization?
* Whether the gradient is numerically stable. Does Theano do the needed optimization?
* Whether it works, does not work, or is unknown on the GPU.
* Alternate names.
* The domain and range of the inputs and outputs (ranges should use the English notation for inclusive and exclusive bounds).

Proposed hierarchy
------------------

Here is the proposed hierarchy for formulas:

* pylearn.formulas.costs: generic / common cost functions, e.g. various cross-entropies, squared error,
  abs. error, various sparsity penalties (L1, Student)
* pylearn.formulas.regularization: formulas for regularization
* pylearn.formulas.linear: formulas for linear classifier, linear regression, factor analysis, PCA
* pylearn.formulas.nnet: formulas for building layers of various kinds, various activation functions,
  layers which could be plugged with various costs & penalties, and stacked
* pylearn.formulas.ae: formulas for auto-encoders and denoising auto-encoder variants
* pylearn.formulas.noise: formulas for corruption processes
* pylearn.formulas.rbm: energies, free energies, conditional distributions, Gibbs sampling
* pylearn.formulas.trees: formulas for decision trees
* pylearn.formulas.boosting: formulas for boosting variants
* pylearn.formulas.maths: other math formulas
* pylearn.formulas.scipy.stats: an example of implementing the same interface as an existing library

etc.
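
As an illustration of the last entry, the sketch below mirrors the call signature of ``scipy.stats.norm.pdf(x, loc=0, scale=1)``. The ``norm`` class here is a hypothetical stand-in written with plain Python floats; an actual ``pylearn.formulas.scipy.stats`` module would presumably build Theano expressions instead, and nothing in this document fixes its contents.

```python
import math


class norm:
    """Hypothetical namespace mirroring the interface of scipy.stats.norm."""

    @staticmethod
    def pdf(x, loc=0.0, scale=1.0):
        """Density of the normal distribution with mean `loc` and std `scale`."""
        z = (x - loc) / scale
        return math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))
```

Keeping the argument names and defaults identical to the existing library means that code written against ``scipy.stats`` could switch over by changing only its import line.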

Example
-------

.. code-block:: python

    """
    This script defines a few often used cost functions.
    """
    import theano
    import theano.tensor as T
    from tags import tags

    @tags('cost','binary','cross-entropy')
    def binary_crossentropy(output, target):
        r""" Compute the crossentropy of binary output wrt binary target.

        .. math::
            L_{CE} \equiv -(t\log(o) + (1-t)\log(1-o))

        :type output: Theano variable
        :param output: Binary output or prediction :math:`\in[0,1]`
        :type target: Theano variable
        :param target: Binary target usually :math:`\in\{0,1\}`
        """
        # Negated log-likelihood, so that lower values mean better predictions.
        return -(target * T.log(output) + (1.0 - target) * T.log(1.0 - output))

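
The ``tags`` decorator imported in the example is not defined in this document. One possible shape for it, as a sketch under assumptions: the ``TAG_INDEX`` registry and the ``.tags`` attribute are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Hypothetical registry mapping each tag to the names of the functions carrying it.
TAG_INDEX = defaultdict(list)


def tags(*names):
    """Return a decorator that attaches `names` to a function as search tags."""
    def decorator(func):
        func.tags = tuple(names)
        for name in names:
            TAG_INDEX[name].append(func.__name__)
        return func  # the function itself is returned unchanged
    return decorator


@tags('cost', 'binary', 'cross-entropy')
def binary_crossentropy(output, target):
    """Placeholder body; only the tagging mechanism matters here."""
    raise NotImplementedError


print(binary_crossentropy.tags)   # ('cost', 'binary', 'cross-entropy')
print(TAG_INDEX['cost'])          # ['binary_crossentropy']
```

A registry of this kind could also feed a per-tag function listing in the generated documentation, since every tag maps directly to the functions that carry it.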

TODO
----

* Define a list of search tags to start with.
* Add to the HTML page a list of the tags and, for each tag, a list of the functions associated with it.
* Move existing formulas to pylearn as examples and add other basic ones.
* theano.tensor.nnet will probably be copied to pylearn.formulas.nnet and deprecated.