========================= Optimization for Learning ========================= Members: Bergstra, Lamblin, Dellaleau, Glorot, Breuleux, Bordes Leader: Bergstra Initial Writeup by James ========================================= Previous work - scikits, openopt, scipy provide function optimization algorithms. These are not currently GPU-enabled but may be in the future. IS PREVIOUS WORK SUFFICIENT? -------------------------------- In many cases it is (I used it for sparse coding, and it was ok). These packages provide batch optimization, whereas we typically need online optimization. It can be faster (to run) and more convenient (to implement) to have optimization algorithms as Theano update expressions. What optimization algorithms do we want/need? --------------------------------------------- - sgd - sgd + momentum - sgd with annealing schedule - TONGA - James Marten's Hessian-free - Conjugate gradients, batch and (large) mini-batch [that is also what Marten's thing does] Do we need anything to make batch algos work better with Pylearn things? - conjugate methods? yes - L-BFGS? maybe, when needed Proposal for API ================ Stick to the same style of API that we've used for SGD so far. I think it has worked well. It takes theano expressions as inputs and returns theano expressions as results. The caller is responsible for building those expressions into a callable function that does the minimization (and other things too maybe). def stochastic_gradientbased_optimization_updates(parameters, cost=None, grads=None, **kwargs): """ :param parameters: list or tuple of Theano variables (typically shared vars) that we want to optimize iteratively algorithm. :param cost: scalar-valued Theano variable that computes noisy estimate of cost (what are the conditions on the noise?). The cost is ignored if grads are given. :param grads: list or tuple of Theano variables representing the gradients on the corresponding parameters. These default to tensor.grad(cost, parameters). :param kwargs: algorithm-dependent arguments :returns: a list of pairs (v, new_v) that indicate the value (new_v) each variable (v) should take in order to carry out the optimization procedure. The first section of the return value list corresponds to the terms in `parameters`, and the optimization algorithm can return additional update expression afterward. This list of pairs can be passed directly to the dict() constructor to create a dictionary such that dct[v] == new_v. """ Why not a class interface with an __init__ that takes the kwargs, and an updates() that returns the updates? It would be wrong for auxiliary shared variables to be involved in two updates, so the interface should not encourage separate methods for those two steps.