Mercurial > pylearn
view doc/v2_planning/optimization.txt @ 1202:7fff3d5c7694
ARCHITECTURE/LAYER: a incomplete story about the plug-ins and way of constructing models
author | pascanur |
---|---|
date | Mon, 20 Sep 2010 20:35:03 -0400 |
parents | 0e12ea6ba661 |
children |
line wrap: on
line source
========================= Optimization for Learning ========================= Members: Bergstra, Lamblin, Delalleau, Glorot, Breuleux, Bordes Leader: Bergstra Initial Writeup by James ========================================= Previous work - scikits, openopt, scipy provide function optimization algorithms. These are not currently GPU-enabled but may be in the future. IS PREVIOUS WORK SUFFICIENT? -------------------------------- In many cases it is (I used it for sparse coding, and it was ok). These packages provide batch optimization, whereas we typically need online optimization. It can be faster (to run) and more convenient (to implement) to have optimization algorithms as Theano update expressions. What optimization algorithms do we want/need? --------------------------------------------- - sgd - sgd + momentum - sgd with annealing schedule - TONGA - James Marten's Hessian-free - Conjugate gradients, batch and (large) mini-batch [that is also what Marten's thing does] Do we need anything to make batch algos work better with Pylearn things? - conjugate methods? yes - L-BFGS? maybe, when needed Discussion ========== OD asks: Could it be more convenient for x0 to be a list? JB replies: Yes, but that's not the interface used by other minimize() routines (e.g. in scipy). Maybe another list-based interface is required? OD replies: I think most people would prefer to use a list-based interface, so they don't have to manually pack / unpack multiple arrrays of parameters. So I would vote in favor or having both (where the main reason to also provide a non-list interface would be to allow one to easily switch e.g. to scipy's minimize). I would guess the reason scipy's interface is like this is because it makes it easier for the optimization algorithm. However, this does not really matter if we are just wrapping a theano-based algorithm (that already has to handle multiple parameters), and avoiding useless data copies on each call to f / df can only help speed-wise. JB replies: Done, I added possibility that x0 is list of ndarrays to the api doc. OD asks: Why make a difference between iterative and one-shot versions? A one-shot algorithm can be seen as an iterative one that stops after its first iteration. The difference I see between the two interfaces proposed here is mostly that one relies on Theano while the other one does not, but hopefully a non-Theano one can be created by simply wrapping around the Theano one. JB replies: Right, it would make more sense to distinguish them by the fact that one works on Theano objects, and the other on general Python callable functions. There is room for an iterative numpy interface, but I didn't make it yet. Would that answer your question? OD replies and asks: Partly. Do we really need a non-iterative interface? OD: I wish we could get closer to each other the Theano and Numpy interfaces. It would be nice if we could do something like: .. code-block:: python # Theano version. updates = sgd([p], gradients=[g], stop=stop, step_size=.1) sgd_step = theano.function([input_var, target_var], [], updates=updates) while not stop.value: input, target = training_iter.next() sgd_step(input, target) # Numpy version (you can replace *.value by regular numpy arrays). sgd_step = sgd([p.value], gradients=g_func, stop=stop.value, step_size=.1) while not stop.value: input, target = training_iter.next() sgd_step(input, target) where sgd would look something like: .. code-block:: python class sgd(...): def __init__(self, parameters, cost=None, gradients=None, stop=None, step_size=None): # Allow for extra arguments to be provided in self.__call__, that # are forwarded to the underlying gradients function. self.gradients = lambda *lst, **kw: gradients(*(parameters + lst), **kw) ... def __call__(*lst, **kw): grads = self.gradients(*lst, **kw) for param, grad in izip(self.parameters, grads): param -= self.step_size * grad Then a wrapper to provide a scipy-like interface could be: .. code-block:: python def minimize(x0, f, df, algo, **kw): stop = numpy.array(0, dtype=numpy.int8) algo_step = eval(algo)([x0], cost=f, gradients=lambda x: (df(x), ), stop=stop, **kw) while not stop: algo_step()