# HG changeset patch
# User Olivier Delalleau
# Date 1284670878 14400
# Node ID ae9c94b4ee0b0fe62b80b88ec2925827506fb522
# Parent  d7192e52653e83a6c1eb5b08892e36431698c647
# Parent  7c5dc11c850a2ab5f9cde3190534bf4c40c502cb
Merged

diff -r d7192e52653e -r ae9c94b4ee0b doc/v2_planning/api_optimization.txt
--- a/doc/v2_planning/api_optimization.txt Thu Sep 16 17:00:58 2010 -0400
+++ b/doc/v2_planning/api_optimization.txt Thu Sep 16 17:01:18 2010 -0400
@@ -31,48 +31,48 @@
 optimization algorithms should be global functions with a signature like
 'iterative_optimizer'.
 
-def iterative_optimizer(parameters,
-        cost=None,
-        gradients=None,
-        stop=None,
-        updates=None,
-        **kwargs):
-    """
-    :param parameters: list or tuple of Theano variables
-        that we want to optimize iteratively.  If we're minimizing f(x), then
-        together, these variables represent 'x'.  Typically these are shared
-        variables and their values are the initial values for the minimization
-        algorithm.
+    def iterative_optimizer(parameters,
+            cost=None,
+            gradients=None,
+            stop=None,
+            updates=None,
+            **kwargs):
+        """
+        :param parameters: list or tuple of Theano variables
+            that we want to optimize iteratively.  If we're minimizing f(x), then
+            together, these variables represent 'x'.  Typically these are shared
+            variables and their values are the initial values for the minimization
+            algorithm.
 
-    :param cost: scalar-valued Theano variable that computes an exact or noisy estimate of
-        cost (what are the conditions on the noise?).  Some algorithms might
-        need an exact cost, some algorithms might ignore the cost if the
-        gradients are given.
+        :param cost: scalar-valued Theano variable that computes an exact or noisy estimate of
+            cost (what are the conditions on the noise?).  Some algorithms might
+            need an exact cost, some algorithms might ignore the cost if the
+            gradients are given.
 
-    :param gradients: list or tuple of Theano variables representing the gradients on
-        the corresponding parameters.  These default to tensor.grad(cost,
-        parameters).
+        :param gradients: list or tuple of Theano variables representing the gradients on
+            the corresponding parameters.  These default to tensor.grad(cost,
+            parameters).
 
-    :param stop: a shared variable (scalar integer) that (if provided) will be
-        updated to say when the iterative minimization algorithm has finished
-        (1) or requires more iterations (0).
+        :param stop: a shared variable (scalar integer) that (if provided) will be
+            updated to say when the iterative minimization algorithm has finished
+            (1) or requires more iterations (0).
 
-    :param updates: a dictionary to update with the (var, new_value) items
-        associated with the iterative algorithm.  The default is a new empty
-        dictionary.  A KeyError is raised in case of key collisions.
+        :param updates: a dictionary to update with the (var, new_value) items
+            associated with the iterative algorithm.  The default is a new empty
+            dictionary.  A KeyError is raised in case of key collisions.
 
-    :param kwargs: algorithm-dependent arguments
+        :param kwargs: algorithm-dependent arguments
 
-    :returns: a dictionary mapping each parameter to an expression that it
-        should take in order to carry out the optimization procedure.
+        :returns: a dictionary mapping each parameter to an expression that it
+            should take in order to carry out the optimization procedure.
 
-        If all the parameters are shared variables, then this dictionary may be
-        passed as the ``updates`` argument to theano.function.
+            If all the parameters are shared variables, then this dictionary may be
+            passed as the ``updates`` argument to theano.function.
 
-        There may be more key,value pairs in the dictionary corresponding to
-        internal variables that are part of the optimization algorithm.
+            There may be more key,value pairs in the dictionary corresponding to
+            internal variables that are part of the optimization algorithm.
 
-    """
+        """
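
For concreteness, a minimal sketch of how plain stochastic gradient descent could
satisfy this contract (the helper name `sgd_optimizer` and the `step_size` keyword
are illustrative placeholders, not part of the proposal):

    import theano.tensor as TT

    def sgd_optimizer(parameters, cost=None, gradients=None, stop=None,
                      updates=None, step_size=0.1):
        # gradients default to tensor.grad(cost, parameters), as documented above
        if gradients is None:
            gradients = TT.grad(cost, parameters)
        # the updates dictionary defaults to a new empty one
        if updates is None:
            updates = {}
        for param, grad in zip(parameters, gradients):
            # a KeyError is raised in case of key collisions
            if param in updates:
                raise KeyError('duplicate update for parameter', param)
            updates[param] = param - step_size * grad
        # plain SGD never signals completion, so `stop` is left untouched
        return updates

If the parameters are shared variables, the returned dictionary can be passed
directly as the ``updates`` argument of theano.function, as noted above.
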
 
 
 Numpy Interface
@@ -82,67 +82,82 @@
 scipy's.  Its arguments are numpy arrays, and functions that manipulate numpy
 arrays.
 
-TODO: There is also room for an iterative object (that doesn't hog program
-control) but which nonetheless works on numpy objects. Actually minimize() should
-use this iterative interface under the hood.
+    def minimize(x0, f, df, opt_algo, **kwargs):
+        """
+        Return a point x_new with the same type as x0 that minimizes function `f`
+        with derivative `df`.
 
-def minimize(x0, f, df, opt_algo, **kwargs):
-    """
-    Return a point x_new that minimizes function `f` with derivative `df`.
+        This is supposed to provide an interface similar to scipy's minimize
+        routines, or MATLAB's.
 
-    This is supposed to provide an interface similar to scipy's minimize
-    routines, or MATLAB's.
+        :type x0: numpy ndarray or list of numpy ndarrays.
+        :param x0: starting point for minimization
 
-    :type x0: numpy ndarray
-    :param x0: starting point for minimization
+        :type f: python callable mapping something like x0 to a scalar
+        :param f: function to minimize
 
-    :type f: python callable mapping something like x0 to a scalar
-    :param f: function to minimize
+        :type df: python callable mapping something like x0 to the derivative of f at that point
+        :param df: derivative of `f`
 
-    :type df: python callable mapping something like x0 to the derivative of f at that point
-    :param df: derivative of `f`
+        :param opt_algo: one of the functions that implements the
+            `iterative_optimizer` interface.
 
-    :param opt_algo: one of the functions that implements the
-        `iterative_optimizer` interface.
+        :param kwargs: passed through to `opt_algo`
 
-    :param kwargs: passed through to `opt_algo`
+        """
 
-    """
-OD asks: Could it be more convenient for x0 to be a list?
-
-JB replies: Yes, but that's not the interface used by other minimize()
-routines (e.g. in scipy). Maybe another list-based interface is required?
+There is also a numpy-based wrapper to the iterative algorithms.
+This can be more useful than minimize() because it doesn't hog program
+control. Technically minimize() is probably implemented using this
+minimize_iterator interface.
 
-OD replies: I think most people would prefer to use a list-based interface, so
-    they don't have to manually pack / unpack multiple arrrays of parameters. So I
-    would vote in favor or having both (where the main reason to also provide a
-    non-list interface would be to allow one to easily switch e.g. to scipy's
-    minimize).
-    I would guess the reason scipy's interface is like this is because it makes
-    it easier for the optimization algorithm. However, this does not really
-    matter if we are just wrapping a theano-based algorithm (that already has
-    to handle multiple parameters), and avoiding useless data copies on each call
-    to f / df can only help speed-wise.
+    class minimize_iterator(object):
+        """
+        Attributes
+         - x  - the current best estimate of the minimum
+         - f  - the function being minimized
+         - df - f's derivative function
+         - opt_algo - the optimization algorithm at work (a serializable, callable
+           object with the signature of iterative_optimizer above).
 
-OD asks: Why make a difference between iterative and one-shot versions? A one-shot
-    algorithm can be seen as an iterative one that stops after its first
-    iteration. The difference I see between the two interfaces proposed here
-    is mostly that one relies on Theano while the other one does not, but
-    hopefully a non-Theano one can be created by simply wrapping around the
-    Theano one.
+        """
+        def __init__(self, x0, f, df, opt_algo, **kwargs):
+            """Initialize state (arguments as in minimize())
+            """
+        def __iter__(self):
+            return self
+        def next(self):
+            """Take a step of minimization and return self.  Raises StopIteration
+            when the algorithm is finished with minimization.
 
-JB replies: Right, it would make more sense to distinguish them by the fact that
-one works on Theano objects, and the other on general Python callable functions.
-There is room for an iterative numpy interface, but I didn't make it yet. Would
-that answer your question?
+            """
 
-OD replies and asks: Partly. Do we really need a non-iterative interface?
 
 Examples
 --------
+Simple stochastic gradient descent could be called like this:
+
+    sgd([p], gradients=[g], step_size=.1)
+
+and this would return
+
+    {p:p-.1*g}
+
 
 Simple stochastic gradient descent with extra updates:
 
-sgd([p], gradients=[g], updates={a:b}, step_size=.1) will return {a:b, p:p-.1*g}
+    sgd([p], gradients=[g], updates={a:b}, step_size=.1)
+
+will return
+
+    {a:b, p:p-.1*g}
+
+
+If the parameters collide with keys in a given updates dictionary, an exception
+will be raised:
+
+    sgd([p], gradients=[g], updates={p:b}, step_size=.1)
+
+will raise a KeyError.
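
To make the "doesn't hog program control" point concrete, a caller's loop over the
proposed minimize_iterator might look roughly like this (a sketch only: `sgd_optimizer`
is the hypothetical optimizer sketched earlier, and the toy `f` / `df` are placeholders):

    import numpy

    f = lambda x: 0.5 * numpy.sum((x - 3.0) ** 2)   # toy quadratic, minimum at x == 3
    df = lambda x: x - 3.0
    x0 = numpy.zeros(5)

    opt = minimize_iterator(x0, f, df, opt_algo=sgd_optimizer, step_size=0.1)
    for i in xrange(100):          # the caller keeps control between steps
        try:
            opt.next()             # raises StopIteration once the algorithm is done
        except StopIteration:
            break
        print i, opt.f(opt.x)      # inspect progress, checkpoint, etc.
    x_min = opt.x
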
diff -r d7192e52653e -r ae9c94b4ee0b doc/v2_planning/optimization.txt
--- a/doc/v2_planning/optimization.txt Thu Sep 16 17:00:58 2010 -0400
+++ b/doc/v2_planning/optimization.txt Thu Sep 16 17:01:18 2010 -0400
@@ -51,5 +51,44 @@
 
 See api_optimization.txt.
 
-OD: Do we really need a different file? If yes, maybe create a subdirectory to
-    be able to easily find all files related to optimization?
+OD asks: Do we really need a different file? If yes, maybe create a subdirectory
+to be able to easily find all files related to optimization?
+
+JB replies: Yoshua's orders.
+
+
+
+OD asks: Could it be more convenient for x0 to be a list?
+
+JB replies: Yes, but that's not the interface used by other minimize()
+routines (e.g. in scipy). Maybe another list-based interface is required?
+
+OD replies: I think most people would prefer to use a list-based interface, so
+    they don't have to manually pack / unpack multiple arrays of parameters. So I
+    would vote in favor of having both (where the main reason to also provide a
+    non-list interface would be to allow one to easily switch e.g. to scipy's
+    minimize).
+    I would guess the reason scipy's interface is like this is because it makes
+    it easier for the optimization algorithm. However, this does not really
+    matter if we are just wrapping a theano-based algorithm (that already has
+    to handle multiple parameters), and avoiding useless data copies on each call
+    to f / df can only help speed-wise.
+JB replies: Done, I added the possibility that x0 is a list of ndarrays to the
+api doc.
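
To make the packing cost discussed above concrete, here is one sketch of how a
list-based interface could still be adapted to flat-vector routines such as
scipy's; the `pack` / `unpack` helpers are hypothetical:

    import numpy

    def pack(arrays):
        # flatten a list of ndarrays into a single vector (copies the data)
        return numpy.concatenate([a.ravel() for a in arrays])

    def unpack(vector, shapes):
        # split a flat vector back into ndarrays of the given shapes
        arrays, pos = [], 0
        for shape in shapes:
            size = int(numpy.prod(shape))
            arrays.append(vector[pos:pos + size].reshape(shape))
            pos += size
        return arrays

    # adapting a list-based f / df pair to a scipy-style flat interface:
    #   f_flat  = lambda v: f(unpack(v, shapes))
    #   df_flat = lambda v: pack(df(unpack(v, shapes)))
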
+
+
+
+OD asks: Why make a difference between iterative and one-shot versions? A one-shot
+    algorithm can be seen as an iterative one that stops after its first
+    iteration. The difference I see between the two interfaces proposed here
+    is mostly that one relies on Theano while the other one does not, but
+    hopefully a non-Theano one can be created by simply wrapping around the
+    Theano one.
+
+JB replies: Right, it would make more sense to distinguish them by the fact that
+one works on Theano objects, and the other on general Python callable functions.
+There is room for an iterative numpy interface, but I didn't make it yet. Would
+that answer your question?
+
+OD replies and asks: Partly. Do we really need a non-iterative interface?
+
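
If a one-shot interface is kept at all, it could presumably reduce to a thin
wrapper around the iterative one, along these lines (a sketch, assuming the
minimize_iterator interface proposed in api_optimization.txt):

    def minimize(x0, f, df, opt_algo, **kwargs):
        # run the iterator to completion and return the best point found
        opt = minimize_iterator(x0, f, df, opt_algo, **kwargs)
        for _ in opt:          # steps until the algorithm raises StopIteration
            pass
        return opt.x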