changeset 1171:fab72f424ee0

Merged
author Olivier Delalleau <delallea@iro>
date Fri, 17 Sep 2010 14:37:08 -0400
parents 53340a8df1fa (current diff) 3a1225034751 (diff)
children fe6c25eb1e37 10bc5ebb5823
files doc/v2_planning/learn_meeting.py
diffstat 7 files changed, 201 insertions(+), 148 deletions(-)
--- a/.hgignore	Fri Sep 17 14:37:00 2010 -0400
+++ b/.hgignore	Fri Sep 17 14:37:08 2010 -0400
@@ -2,5 +2,6 @@
 *~
 *.swp
 *.pyc
+*.orig
 core.*
 html
\ No newline at end of file
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/v2_planning/API_formulas.txt	Fri Sep 17 14:37:08 2010 -0400
@@ -0,0 +1,96 @@
+.. _v2planning_formulas:
+
+Math formulas API
+=================
+
+Why we need a formulas API
+--------------------------
+
+There are a few reasons why a library of mathematical formulas for Theano is a good idea:
+
+* Some formulas need special handling for the GPU.
+   * Sometimes we need to cast to floatX...
+* Some formulas have numerical stability problems.
+* Some formulas' gradients have numerical stability problems. (This happens more frequently than the previous case.)
+   * If Theano does not always apply a stability optimization, we can apply it manually in the formulas.
+* Some formulas are complex to implement and take many tries to get right.
+* The library can mimic the hierarchy of other libraries to ease migration to Theano.
+
+Having a library helps because we solve each of those problems only once.
+
+What is a formula
+-----------------
+
+We define formulas as stateless: they are implemented as Python functions
+that take Theano variables as input and return Theano variables as output.
+If you want state, look at what the other committees will do.
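+
+A minimal sketch of such a stateless formula (the function below is
+illustrative, not part of the proposal):
+
+.. code-block:: python
+
+        import theano.tensor as T
+
+        def squared_error(output, target):
+            """Return the elementwise squared error between output and target."""
+            return (output - target) ** 2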
+
+Formulas documentation
+----------------------
+
+We must respect the conventions the coding committee has set for the docstrings of files and functions.
+
+* A LaTeX mathematical description of the formula (rendered as an image in the generated documentation).
+* Tags (for searching; see the sketch after this list):
+   * a list of the lower-level functions used
+   * a category (the name of the submodule itself)
+* Whether we did work to make it more numerically stable, and whether Theano does the needed optimization.
+* Whether the gradient is numerically stable, and whether Theano does the needed optimization.
+* Whether it works on the GPU (works / doesn't work / unknown).
+* Alternate names.
+* The domain and range of the inputs/outputs (ranges should use the English convention for inclusive/exclusive bounds).
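+
+The ``tags`` decorator used for searching is not written yet; here is a
+hypothetical sketch of one way it could attach searchable tags to a formula:
+
+.. code-block:: python
+
+        def tags(*search_tags):
+            """Attach a set of searchable tags to a formula (hypothetical)."""
+            def decorator(formula):
+                formula.tags = set(search_tags)
+                return formula
+            return decorator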
+
+Proposed hierarchy
+------------------
+
+Here is the proposed hierarchy for formulas:
+
+* pylearn.formulas.costs: generic / common cost functions, e.g. various cross-entropies, squared error, 
+  abs. error, various sparsity penalties (L1, Student)
+* pylearn.formulas.regularization: formulas for regularization
+* pylearn.formulas.linear: formulas for linear classifier, linear regression, factor analysis, PCA
+* pylearn.formulas.nnet: formulas for building layers of various kinds, various activation functions,
+  layers which could be plugged with various costs & penalties, and stacked
+* pylearn.formulas.ae: formulas for auto-encoders and denoising auto-encoder variants
+* pylearn.formulas.noise: formulas for corruption processes
+* pylearn.formulas.rbm: energies, free energies, conditional distributions, Gibbs sampling
+* pylearn.formulas.trees: formulas for decision trees
+* pylearn.formulas.boosting: formulas for boosting variants
+* pylearn.formulas.maths: other mathematical formulas
+* pylearn.formulas.scipy.stats: an example of implementing the same interface as an existing library
+
+etc.
+
+Example
+-------
+
+.. code-block:: python
+
+        """
+        This script defines a few often used cost functions.
+        """
+        import theano
+        import theano.tensor as T
+        from tags import tags
+
+        @tags('cost','binary','cross-entropy')
+        def binary_crossentropy(output, target):
+            """ Compute the crossentropy of binary output wrt binary target.
+
+            .. math::
+                L_{CE} \equiv t\log(o) + (1-t)\log(1-o) 
+
+            :type output: Theano variable
+            :param output: Binary output or prediction :math:`\in[0,1]`
+            :type target: Theano variable
+            :param target: Binary target usually :math:`\in\{0,1\}`
+            """
+            return -(target * tensor.log(output) + (1.0 - target) * tensor.log(1.0 - output))
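+
+A formula defined this way can be compiled and evaluated with
+``theano.function`` (a minimal usage sketch; the variable names are
+illustrative):
+
+.. code-block:: python
+
+        import theano
+        import theano.tensor as T
+
+        output = T.vector('output')
+        target = T.vector('target')
+        cost = binary_crossentropy(output, target)
+        f = theano.function([output, target], cost)
+        costs = f([0.9, 0.1], [1.0, 0.0])  # low cost for good predictions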
+
+
+TODO 
+----
+* Define a list of search tags to start with.
+* Add to the HTML page a list of the tags and, for each tag, the functions associated with it.
+* Move existing formulas to pylearn as examples and add other basic ones.
+* theano.tensor.nnet will probably be copied to pylearn.formulas.nnet and deprecated.
+
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/v2_planning/API_learner.txt	Fri Sep 17 14:37:08 2010 -0400
@@ -0,0 +1,95 @@
+# A list of "task types"
+
+'''
+ List of task types:
+  Attributes
+
+   sequential
+   spatial
+   structured
+   semi-supervised
+   missing-values
+
+
+  Supervised (x,y)
+
+   classification
+   regression
+   probabilistic classification
+   ranking
+   conditional density estimation
+   collaborative filtering
+   ordinal regression ?= ranking 
+
+  Unsupervised (x)
+
+   de-noising
+   feature learning (transformation), e.g. PCA, DAA
+   density estimation
+   inference
+
+  Other
+
+   generation (sampling)
+   structure learning ???
+
+
+Notes on metrics & statistics:
+   - some are applied to an example, others to a batch
+   - most statistics are over the whole dataset
+'''
+
+
+class Learner(object):
+    '''
+    Takes data as input, and learns a prediction function (or several).
+
+    A learner is parametrized by hyper-parameters, which can be set from
+    the outside (by a "client" of the Learner, which can be a HyperLearner,
+    a Tester, ...).
+
+    The data can be given all at once as a dataset, or incrementally.
+    Some learners need to be fully trained in one step, whereas others can
+    be trained incrementally.
+
+    The question of statistics collection during training remains open.
+    '''
+    #def use_dataset(dataset)
+
+    # Return a dictionary of hyper-parameter names (keys) and
+    # values (values).
+    def get_hyper_parameters(self): pass
+    def set_hyper_parameters(self, dictionary): pass
+
+    # Ver B
+    def eval(self, dataset): pass
+    def predict(self, dataset): pass
+
+    # Trainable
+    def train(self, dataset): pass   # train until completion
+
+    # Incremental
+    def use_dataset(self, dataset): pass
+    def adapt(self, n_steps=1): pass
+    def has_converged(self): pass
+
+
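+# A hedged usage sketch of the incremental interface above
+# (MyLearner and my_dataset are illustrative, not defined here):
+#
+#     learner = MyLearner()
+#     learner.set_hyper_parameters({'learning_rate': 0.01})
+#     learner.use_dataset(my_dataset)
+#     while not learner.has_converged():
+#         learner.adapt(n_steps=10)
+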
+# Some example cases
+
+class HyperLearner(Learner):
+
+    ### def get_hyper_parameter_distribution(name)
+    def set_hyper_parameters_distribution(self, dictionary): pass
+
+
+def bagging(learner_factory, N):
+    for i in range(N):
+        learner_i = learner_factory.new()
+        # todo: get dataset_i ??  (one option: the bootstrap sketch below)
+        learner_i.use_dataset(dataset_i)
+        learner_i.train()
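+
+# One possible way to obtain dataset_i above is bootstrap resampling
+# (an assumption, not a committee decision): sample len(dataset) examples
+# with replacement from the full dataset.
+import random
+
+def bootstrap_sample(dataset):
+    # Draw len(dataset) examples with replacement.
+    return [random.choice(dataset) for _ in range(len(dataset))]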
--- a/doc/v2_planning/formulas.txt	Fri Sep 17 14:37:00 2010 -0400
+++ b/doc/v2_planning/formulas.txt	Fri Sep 17 14:37:08 2010 -0400
@@ -9,47 +9,6 @@
 - Olivier B.
 - Nicolas
 
-TODO 
-----
-* define a list of search tag to start with
-* propose an interface(many inputs, outputs, doc style, hierrache, to search, html output?)
-* find existing repositories with files for formulas.
-* move existing formulas to pylearn as examples and add other basics ones.
-** theano.tensor.nnet will probably be copied to pylearn.formulas.nnet and depricated.
-
-Why we need formulas
---------------------
-
-Their is a few reasons why having a library of mathematical formula for theano is a good reason:
-
-* Some formula have some special thing needed for the gpu. 
-   * Sometimes we need to cast to floatX...
-* Some formula have numerical stability problem.
-* Some formula gradiant have numerical stability problem. (Happen more frequently then the previous ones)
-   * If theano don't always do some stability optimization, we could do it manually in the formulas
-* Some formula as complex to implement and take many try to do correctly. 
-
-Having a library help in that we solve those problem only once.
-
-Formulas definition
--------------------
-
-We define formulas as something that don't have a state. They are implemented as python function 
-that take theano variable as input and output theano variable. If you want state, look at what the 
-learner commity will do.
-
-Formulas doc must have
-----------------------
-
-* A latex mathematical description of the formulas(for picture representation in generated documentation)
-* Tags(for searching):
-   * a list of lower lovel fct used
-   * category(name of the submodule itself)
-* Tell if we did some work to make it more numerical stable. Do theano do the optimization needed?
-* Tell if the grad is numericaly stable? Do theano do the optimization needed?
-* Tell if work on gpu/not/unknow
-* Tell alternate name
-* Tell the domaine, range of the input/output(range should use the english notation of including or excluding)
 
 List of existing repos
 ----------------------
@@ -57,33 +16,3 @@
 Olivier B. ?
 Xavier G.: git@github.com:glorotxa/DeepANN.git, see file deepANN/{Activations.py(to nnet),Noise.py,Reconstruction_cost.py(to costs),Regularization.py(to regularization}
 
-Proposed hierarchy
-------------------
-
-Here is the proposed hierarchy for formulas
-
-pylearn.formulas.costs: generic / common cost functions, e.g. various cross-entropies, squared error, 
-abs. error, various sparsity penalties (L1, Student)
-
-pylearn.formulas.regularization: formulas for regularization
-
-pylearn.formulas.linear: formulas for linear classifier, linear regression, factor analysis, PCA
-
-pylearn.formulas.nnet: formulas for building layers of various kinds, various activation functions,
-layers which could be plugged with various costs & penalties, and stacked
-
-pylearn.formulas.ae: formulas for auto-encoders and denoising auto-encoder variants
-
-pylearn.formulas.noise: formulas for corruption processes
-
-pylearn.formulas.rbm: energies, free energies, conditional distributions, Gibbs sampling
-
-pylearn.formulas.trees: formulas for decision trees
-
-pylearn.formulas.boosting: formulas for boosting variants
-
-pylearn.formulas.maths for other math formulas
-
-pylearn.formulas.scipy.stats: example to implement the same interface as existing lib
-
-etc.
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/v2_planning/index.txt	Fri Sep 17 14:37:08 2010 -0400
@@ -0,0 +1,8 @@
+.. _libdoc:
+
+.. toctree::
+   :maxdepth: 1
+
+   API_formulas
+   API_coding_style
+   api_optimization
--- a/doc/v2_planning/learn_meeting.py	Fri Sep 17 14:37:00 2010 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,76 +0,0 @@
-
-
-def bagging(learner_factory):
-    for i in range(N):
-        learner_i = learner_factory.new()
-        # todo: get dataset_i ??
-        learner_i.use_dataset(dataset_i)
-        learner_i.train()
-'''
- List of tasks types:
-  Attributes
-
-   sequential
-   spatial
-   structured
-   semi-supervised
-   missing-values
-
-
-  Supervised (x,y)
-
-   classification
-   regression
-   probabilistic classification
-   ranking
-   conditional density estimation
-   collaborative filtering
-   ordinal regression ?= ranking 
-
-  Unsupervised (x)
-
-   de-noising
-   feature learning ( transformation ) PCA, DAA
-   density estimation
-   inference
-
-  Other
-
-   generation (sampling)
-   structure learning ???
-
-
-Notes on metrics & statistics:
-   - some are applied to an example, others on a batch
-   - most statistics are on the dataset
-'''
-class Learner(Object):
-    
-    #def use_dataset(dataset)
-
-    # return a dictionary of hyperparameters names(keys)
-    # and value(values) 
-    def get_hyper_parameters()
-    def set_hyper_parameters(dictionary)
-
-
-    
-    
-    # Ver B
-    def eval(dataset)
-    def predict(dataset)
-
-    # Trainable
-    def train(dataset)   # train until complition
-
-    # Incremental
-    def use_dataset(dataset)
-    def adapt(n_steps =1)
-    def has_converged()
-
-    # 
-
-class HyperLearner(Learner):
-
-    ### def get_hyper_parameter_distribution(name)
-    def set_hyper_parameters_distribution(dictionary)
--- a/doc/v2_planning/learner.txt	Fri Sep 17 14:37:00 2010 -0400
+++ b/doc/v2_planning/learner.txt	Fri Sep 17 14:37:08 2010 -0400
@@ -1,6 +1,6 @@
 
 Comittee: AB, PL, GM, IG, RP, NB, PV
-Leader: ?
+Leader: PL
 
 Discussion of Function Specification for Learner Types
 ======================================================