Mercurial > pylearn
view doc/v2_planning/api_optimization.txt @ 1142:4f03a9a743dc
Merged
author | Olivier Delalleau <delallea@iro> |
---|---|
date | Thu, 16 Sep 2010 13:28:50 -0400 |
parents | c5c7ba805a2f |
children | 7c5dc11c850a |
line wrap: on
line source
Optimization API ================ Members: Bergstra, Lamblin, Delalleau, Glorot, Breuleux, Bordes Leader: Bergstra Description ----------- This API is for iterative optimization algorithms, such as: - stochastic gradient descent (incl. momentum, annealing) - delta bar delta - conjugate methods - L-BFGS - "Hessian Free" - SGD-QN - TONGA The API includes an iterative interface based on Theano, and a one-shot interface similar to SciPy and MATLAB that is based on Python and Numpy, that only uses Theano for the implementation. Theano Interface ----------------- The theano interface to optimization algorithms is to ask for a dictionary of updates that can be used in theano.function. Implementations of iterative optimization algorithms should be global functions with a signature like 'iterative_optimizer'. def iterative_optimizer(parameters, cost=None, gradients=None, stop=None, updates=None, **kwargs): """ :param parameters: list or tuple of Theano variables that we want to optimize iteratively. If we're minimizing f(x), then together, these variables represent 'x'. Typically these are shared variables and their values are the initial values for the minimization algorithm. :param cost: scalar-valued Theano variable that computes an exact or noisy estimate of cost (what are the conditions on the noise?). Some algorithms might need an exact cost, some algorithms might ignore the cost if the gradients are given. :param gradients: list or tuple of Theano variables representing the gradients on the corresponding parameters. These default to tensor.grad(cost, parameters). :param stop: a shared variable (scalar integer) that (if provided) will be updated to say when the iterative minimization algorithm has finished (1) or requires more iterations (0). :param updates: a dictionary to update with the (var, new_value) items associated with the iterative algorithm. The default is a new empty dictionary. A KeyError is raised in case of key collisions. :param kwargs: algorithm-dependent arguments :returns: a dictionary mapping each parameter to an expression that it should take in order to carry out the optimization procedure. If all the parameters are shared variables, then this dictionary may be passed as the ``updates`` argument to theano.function. There may be more key,value pairs in the dictionary corresponding to internal variables that are part of the optimization algorithm. """ Numpy Interface --------------- The numpy interface to optimization algorithms is supposed to mimick scipy's. Its arguments are numpy arrays, and functions that manipulate numpy arrays. TODO: There is also room for an iterative object (that doesn't hog program control) but which nonetheless works on numpy objects. Actually minimize() should use this iterative interface under the hood. def minimize(x0, f, df, opt_algo, **kwargs): """ Return a point x_new that minimizes function `f` with derivative `df`. This is supposed to provide an interface similar to scipy's minimize routines, or MATLAB's. :type x0: numpy ndarray :param x0: starting point for minimization :type f: python callable mapping something like x0 to a scalar :param f: function to minimize :type df: python callable mapping something like x0 to the derivative of f at that point :param df: derivative of `f` :param opt_algo: one of the functions that implements the `iterative_optimizer` interface. :param kwargs: passed through to `opt_algo` """ OD asks: Could it be more convenient for x0 to be a list? JB replies: Yes, but that's not the interface used by other minimize() routines (e.g. in scipy). Maybe another list-based interface is required? OD replies: I think most people would prefer to use a list-based interface, so they don't have to manually pack / unpack multiple arrrays of parameters. So I would vote in favor or having both (where the main reason to also provide a non-list interface would be to allow one to easily switch e.g. to scipy's minimize). I would guess the reason scipy's interface is like this is because it makes it easier for the optimization algorithm. However, this does not really matter if we are just wrapping a theano-based algorithm (that already has to handle multiple parameters), and avoiding useless data copies on each call to f / df can only help speed-wise. OD asks: Why make a difference between iterative and one-shot versions? A one-shot algorithm can be seen as an iterative one that stops after its first iteration. The difference I see between the two interfaces proposed here is mostly that one relies on Theano while the other one does not, but hopefully a non-Theano one can be created by simply wrapping around the Theano one. JB replies: Right, it would make more sense to distinguish them by the fact that one works on Theano objects, and the other on general Python callable functions. There is room for an iterative numpy interface, but I didn't make it yet. Would that answer your question? OD replies and asks: Partly. Do we really need a non-iterative interface? Examples -------- Simple stochastic gradient descent with extra updates: sgd([p], gradients=[g], updates={a:b}, step_size=.1) will return {a:b, p:p-.1*g}