# HG changeset patch
# User Olivier Delalleau
# Date 1284670878 14400
# Node ID ae9c94b4ee0b0fe62b80b88ec2925827506fb522
# Parent  d7192e52653e83a6c1eb5b08892e36431698c647
# Parent  7c5dc11c850a2ab5f9cde3190534bf4c40c502cb
Merged

diff -r d7192e52653e -r ae9c94b4ee0b doc/v2_planning/api_optimization.txt
--- a/doc/v2_planning/api_optimization.txt Thu Sep 16 17:00:58 2010 -0400
+++ b/doc/v2_planning/api_optimization.txt Thu Sep 16 17:01:18 2010 -0400
@@ -31,48 +31,48 @@
 optimization algorithms should be global functions with a signature like
 'iterative_optimizer'.
 
-def iterative_optimizer(parameters,
-        cost=None,
-        gradients=None,
-        stop=None,
-        updates=None,
-        **kwargs):
-    """
-    :param parameters: list or tuple of Theano variables
-        that we want to optimize iteratively.  If we're minimizing f(x), then
-        together, these variables represent 'x'.  Typically these are shared
-        variables and their values are the initial values for the minimization
-        algorithm.
+    def iterative_optimizer(parameters,
+            cost=None,
+            gradients=None,
+            stop=None,
+            updates=None,
+            **kwargs):
+        """
+        :param parameters: list or tuple of Theano variables
+            that we want to optimize iteratively.  If we're minimizing f(x), then
+            together, these variables represent 'x'.  Typically these are shared
+            variables and their values are the initial values for the minimization
+            algorithm.
 
-    :param cost: scalar-valued Theano variable that computes an exact or noisy estimate of
-        cost (what are the conditions on the noise?).  Some algorithms might
-        need an exact cost, some algorithms might ignore the cost if the
-        gradients are given.
+        :param cost: scalar-valued Theano variable that computes an exact or noisy estimate of
+            cost (what are the conditions on the noise?).  Some algorithms might
+            need an exact cost, some algorithms might ignore the cost if the
+            gradients are given.
 
-    :param gradients: list or tuple of Theano variables representing the gradients on
-        the corresponding parameters.  These default to tensor.grad(cost,
-        parameters).
+        :param gradients: list or tuple of Theano variables representing the gradients on
+            the corresponding parameters.  These default to tensor.grad(cost,
+            parameters).
 
-    :param stop: a shared variable (scalar integer) that (if provided) will be
-        updated to say when the iterative minimization algorithm has finished
-        (1) or requires more iterations (0).
+        :param stop: a shared variable (scalar integer) that (if provided) will be
+            updated to say when the iterative minimization algorithm has finished
+            (1) or requires more iterations (0).
 
-    :param updates: a dictionary to update with the (var, new_value) items
-        associated with the iterative algorithm.  The default is a new empty
-        dictionary.  A KeyError is raised in case of key collisions.
+        :param updates: a dictionary to update with the (var, new_value) items
+            associated with the iterative algorithm.  The default is a new empty
+            dictionary.  A KeyError is raised in case of key collisions.
 
-    :param kwargs: algorithm-dependent arguments
+        :param kwargs: algorithm-dependent arguments
 
-    :returns: a dictionary mapping each parameter to an expression that it
-        should take in order to carry out the optimization procedure.
+        :returns: a dictionary mapping each parameter to an expression that it
+            should take in order to carry out the optimization procedure.
 
-        If all the parameters are shared variables, then this dictionary may be
-        passed as the ``updates`` argument to theano.function.
+            If all the parameters are shared variables, then this dictionary may be
+            passed as the ``updates`` argument to theano.function.
 
-        There may be more key,value pairs in the dictionary corresponding to
-        internal variables that are part of the optimization algorithm.
+            There may be more key,value pairs in the dictionary corresponding to
+            internal variables that are part of the optimization algorithm.
 
-    """
+        """
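
For concreteness, a minimal sketch of how plain stochastic gradient descent could
satisfy this contract (the helper name `sgd_optimizer` and the `step_size` keyword
are illustrative placeholders, not part of the proposal):

    import theano.tensor as TT

    def sgd_optimizer(parameters, cost=None, gradients=None, stop=None,
                      updates=None, step_size=0.1):
        # gradients default to tensor.grad(cost, parameters), as documented above
        if gradients is None:
            gradients = TT.grad(cost, parameters)
        # the updates dictionary defaults to a new empty one
        if updates is None:
            updates = {}
        for param, grad in zip(parameters, gradients):
            # a KeyError is raised in case of key collisions
            if param in updates:
                raise KeyError('duplicate update for parameter', param)
            updates[param] = param - step_size * grad
        # plain SGD never signals completion, so `stop` is left untouched
        return updates

If the parameters are shared variables, the returned dictionary can be passed
directly as the ``updates`` argument of theano.function, as noted above.
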
 
 
 Numpy Interface
@@ -82,67 +82,82 @@
 scipy's.  Its arguments are numpy arrays, and functions that manipulate numpy
 arrays.
 
-TODO: There is also room for an iterative object (that doesn't hog program
-control) but which nonetheless works on numpy objects. Actually minimize() should
-use this iterative interface under the hood.
+    def minimize(x0, f, df, opt_algo, **kwargs):
+        """
+        Return a point x_new with the same type as x0 that minimizes function `f`
+        with derivative `df`.
 
-def minimize(x0, f, df, opt_algo, **kwargs):
-    """
-    Return a point x_new that minimizes function `f` with derivative `df`.
+        This is supposed to provide an interface similar to scipy's minimize
+        routines, or MATLAB's.
 
-    This is supposed to provide an interface similar to scipy's minimize
-    routines, or MATLAB's.
+        :type x0: numpy ndarray or list of numpy ndarrays.
+        :param x0: starting point for minimization
 
-    :type x0: numpy ndarray
-    :param x0: starting point for minimization
+        :type f: python callable mapping something like x0 to a scalar
+        :param f: function to minimize
 
-    :type f: python callable mapping something like x0 to a scalar
-    :param f: function to minimize
+        :type df: python callable mapping something like x0 to the derivative of f at that point
+        :param df: derivative of `f`
 
-    :type df: python callable mapping something like x0 to the derivative of f at that point
-    :param df: derivative of `f`
+        :param opt_algo: one of the functions that implements the
+            `iterative_optimizer` interface.
 
-    :param opt_algo: one of the functions that implements the
-        `iterative_optimizer` interface.
+        :param kwargs: passed through to `opt_algo`
 
-    :param kwargs: passed through to `opt_algo`
+        """
 
-    """
-OD asks: Could it be more convenient for x0 to be a list?
-
-JB replies: Yes, but that's not the interface used by other minimize()
-routines (e.g. in scipy). Maybe another list-based interface is required?
+There is also a numpy-based wrapper to the iterative algorithms.
+This can be more useful than minimize() because it doesn't hog program
+control. Technically minimize() is probably implemented using this
+minimize_iterator interface.
 
-OD replies: I think most people would prefer to use a list-based interface, so
-    they don't have to manually pack / unpack multiple arrrays of parameters. So I
-    would vote in favor or having both (where the main reason to also provide a
-    non-list interface would be to allow one to easily switch e.g. to scipy's
-    minimize).
-    I would guess the reason scipy's interface is like this is because it makes
-    it easier for the optimization algorithm. However, this does not really
-    matter if we are just wrapping a theano-based algorithm (that already has
-    to handle multiple parameters), and avoiding useless data copies on each call
-    to f / df can only help speed-wise.
+    class minimize_iterator(object):
+        """
+        Attributes
+         - x  - the current best estimate of the minimum
+         - f  - the function being minimized
+         - df - f's derivative function
+         - opt_algo - the optimization algorithm at work (a serializable, callable
+           object with the signature of iterative_optimizer above).
 
-OD asks: Why make a difference between iterative and one-shot versions? A one-shot
-    algorithm can be seen as an iterative one that stops after its first
-    iteration. The difference I see between the two interfaces proposed here
-    is mostly that one relies on Theano while the other one does not, but
-    hopefully a non-Theano one can be created by simply wrapping around the
-    Theano one.
+        """
+        def __init__(self, x0, f, df, opt_algo, **kwargs):
+            """Initialize state (arguments as in minimize())
+            """
+        def __iter__(self):
+            return self
+        def next(self):
+            """Take a step of minimization and return self.  Raises StopIteration
+            when the algorithm is finished with minimization.
 
-JB replies: Right, it would make more sense to distinguish them by the fact that
-one works on Theano objects, and the other on general Python callable functions.
-There is room for an iterative numpy interface, but I didn't make it yet. Would
-that answer your question?
+            """
 
-OD replies and asks: Partly. Do we really need a non-iterative interface?
 
 Examples
 --------
+Simple stochastic gradient descent could be called like this:
+
+    sgd([p], gradients=[g], step_size=.1)
+
+and this would return
+
+    {p:p-.1*g}
+
 
 Simple stochastic gradient descent with extra updates:
 
-sgd([p], gradients=[g], updates={a:b}, step_size=.1) will return {a:b, p:p-.1*g}
+    sgd([p], gradients=[g], updates={a:b}, step_size=.1)
+
+will return
+
+    {a:b, p:p-.1*g}
+
+
+If the parameters collide with keys in a given updates dictionary, an exception
+will be raised:
+
+    sgd([p], gradients=[g], updates={p:b}, step_size=.1)
+
+will raise a KeyError.
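
To make the "doesn't hog program control" point concrete, a caller's loop over the
proposed minimize_iterator might look roughly like this (a sketch only: `sgd_optimizer`
is the hypothetical optimizer sketched earlier, and the toy `f` / `df` are placeholders):

    import numpy

    f = lambda x: 0.5 * numpy.sum((x - 3.0) ** 2)   # toy quadratic, minimum at x == 3
    df = lambda x: x - 3.0
    x0 = numpy.zeros(5)

    opt = minimize_iterator(x0, f, df, opt_algo=sgd_optimizer, step_size=0.1)
    for i in xrange(100):          # the caller keeps control between steps
        try:
            opt.next()             # raises StopIteration once the algorithm is done
        except StopIteration:
            break
        print i, opt.f(opt.x)      # inspect progress, checkpoint, etc.
    x_min = opt.x
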
diff -r d7192e52653e -r ae9c94b4ee0b doc/v2_planning/optimization.txt
--- a/doc/v2_planning/optimization.txt Thu Sep 16 17:00:58 2010 -0400
+++ b/doc/v2_planning/optimization.txt Thu Sep 16 17:01:18 2010 -0400
@@ -51,5 +51,44 @@
 
 See api_optimization.txt.
 
-OD: Do we really need a different file? If yes, maybe create a subdirectory to
-    be able to easily find all files related to optimization?
+OD asks: Do we really need a different file? If yes, maybe create a subdirectory
+to be able to easily find all files related to optimization?
+
+JB replies: Yoshua's orders.
+
+
+
+OD asks: Could it be more convenient for x0 to be a list?
+
+JB replies: Yes, but that's not the interface used by other minimize()
+routines (e.g. in scipy). Maybe another list-based interface is required?
+
+OD replies: I think most people would prefer to use a list-based interface, so
+    they don't have to manually pack / unpack multiple arrays of parameters. So I
+    would vote in favor of having both (where the main reason to also provide a
+    non-list interface would be to allow one to easily switch e.g. to scipy's
+    minimize).
+    I would guess the reason scipy's interface is like this is because it makes
+    it easier for the optimization algorithm. However, this does not really
+    matter if we are just wrapping a theano-based algorithm (that already has
+    to handle multiple parameters), and avoiding useless data copies on each call
+    to f / df can only help speed-wise.
+JB replies: Done, I added the possibility that x0 is a list of ndarrays to the
+api doc.
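
To make the packing cost discussed above concrete, here is one sketch of how a
list-based interface could still be adapted to flat-vector routines such as
scipy's; the `pack` / `unpack` helpers are hypothetical:

    import numpy

    def pack(arrays):
        # flatten a list of ndarrays into a single vector (copies the data)
        return numpy.concatenate([a.ravel() for a in arrays])

    def unpack(vector, shapes):
        # split a flat vector back into ndarrays of the given shapes
        arrays, pos = [], 0
        for shape in shapes:
            size = int(numpy.prod(shape))
            arrays.append(vector[pos:pos + size].reshape(shape))
            pos += size
        return arrays

    # adapting a list-based f / df pair to a scipy-style flat interface:
    #   f_flat  = lambda v: f(unpack(v, shapes))
    #   df_flat = lambda v: pack(df(unpack(v, shapes)))
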
+
+
+
+OD asks: Why make a difference between iterative and one-shot versions? A one-shot
+    algorithm can be seen as an iterative one that stops after its first
+    iteration. The difference I see between the two interfaces proposed here
+    is mostly that one relies on Theano while the other one does not, but
+    hopefully a non-Theano one can be created by simply wrapping around the
+    Theano one.
+
+JB replies: Right, it would make more sense to distinguish them by the fact that
+one works on Theano objects, and the other on general Python callable functions.
+There is room for an iterative numpy interface, but I didn't make it yet. Would
+that answer your question?
+
+OD replies and asks: Partly. Do we really need a non-iterative interface?
+
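
If a one-shot interface is kept at all, it could presumably reduce to a thin
wrapper around the iterative one, along these lines (a sketch, assuming the
minimize_iterator interface proposed in api_optimization.txt):

    def minimize(x0, f, df, opt_algo, **kwargs):
        # run the iterator to completion and return the best point found
        opt = minimize_iterator(x0, f, df, opt_algo, **kwargs)
        for _ in opt:          # steps until the algorithm raises StopIteration
            pass
        return opt.x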