
=========================
Optimization for Learning
=========================

Members: Bergstra, Lamblin, Delalleau, Glorot, Breuleux, Bordes
Leader: Bergstra


Initial Writeup by James
=========================================


Previous work: scikits, openopt, and scipy provide function optimization
algorithms. These are not currently GPU-enabled but may be in the future.


IS PREVIOUS WORK SUFFICIENT?
--------------------------------

In many cases it is (I used it for sparse coding, and it was ok).

These packages provide batch optimization, whereas we typically need online
optimization.

It can be faster (to run) and more convenient (to implement) to have
optimization algorithms as Theano update expressions.
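As a rough illustration of "optimization algorithms as update expressions": in Theano, updates are a list of `(shared_variable, new_value_expression)` pairs passed to `theano.function(..., updates=...)`, so an online optimizer amounts to a rule that builds those pairs. The sketch below only mimics that shape in plain Python (no Theano involved), using plain SGD on a streaming 1-D least-squares problem. The names `sgd_updates`, `apply_updates`, and `grad_squared_error` are hypothetical.

```python
# Plain-Python mimic of Theano-style update pairs (hypothetical names; no Theano).

def sgd_updates(params, grads, lr):
    """Return (name, new_value) pairs, one per parameter, like Theano's `updates`."""
    return [(name, params[name] - lr * grads[name]) for name in params]


def apply_updates(params, updates):
    """Apply all updates at once, as a compiled theano.function would."""
    new_params = dict(params)
    for name, value in updates:
        new_params[name] = value
    return new_params


def grad_squared_error(params, x, y):
    """Gradient of 0.5 * (w * x + b - y) ** 2 w.r.t. w and b for one example."""
    err = params["w"] * x + params["b"] - y
    return {"w": err * x, "b": err}


# Online loop: one update per example as the data streams in (target: y = 2x + 1).
params = {"w": 0.0, "b": 0.0}
stream = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(10)] * 200
for x, y in stream:
    grads = grad_squared_error(params, x, y)
    params = apply_updates(params, sgd_updates(params, grads, lr=0.1))

print(round(params["w"], 3), round(params["b"], 3))
```

Each step touches only one example, which is what distinguishes this online setting from the batch `minimize()`-style routines above.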


What optimization algorithms do we want/need?
---------------------------------------------

 - sgd
 - sgd + momentum
 - sgd with annealing schedule
 - TONGA
 - James Martens's Hessian-free
 - Conjugate gradients, batch and (large) mini-batch [that is also what Martens's method does]
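The first three items differ only in their update rule. A minimal sketch, assuming a scalar parameter and a user-supplied gradient function. Classical (heavy-ball) momentum and the `lr0 / (1 + t / tau)` annealing schedule are common choices picked for illustration, not something this page prescribes. TONGA, Hessian-free, and conjugate gradients are not covered by this sketch.

```python
def sgd(grad, x0, lr=0.1, steps=300):
    """Plain SGD with a constant learning rate."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x


def sgd_momentum(grad, x0, lr=0.1, mu=0.9, steps=300):
    """Classical (heavy-ball) momentum: v <- mu * v - lr * g; x <- x + v."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = mu * v - lr * grad(x)
        x = x + v
    return x


def sgd_annealed(grad, x0, lr0=0.1, tau=50.0, steps=300):
    """SGD whose learning rate decays as lr0 / (1 + t / tau)."""
    x = x0
    for t in range(steps):
        lr = lr0 / (1.0 + t / tau)
        x = x - lr * grad(x)
    return x


def quad_grad(x):
    """Gradient of the toy objective f(x) = (x - 3) ** 2."""
    return 2.0 * (x - 3.0)


# All three variants should approach the minimizer x = 3.
for method in (sgd, sgd_momentum, sgd_annealed):
    print(method.__name__, round(method(quad_grad, 0.0), 4))
```

In a real setting `grad` would be evaluated on a different mini-batch at each step; the deterministic toy gradient just keeps the comparison easy to check.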

Do we need anything to make batch algos work better with Pylearn things?
 - conjugate methods? yes
 - L-BFGS? maybe, when needed
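For the conjugate methods item, the core of conjugate gradients is easiest to see in its linear form. It minimizes the quadratic f(x) = 1/2 x^T A x - b^T x for a symmetric positive-definite A (equivalently, it solves A x = b). Nonlinear CG for general losses replaces the exact step size with a line search and uses a formula such as Fletcher-Reeves or Polak-Ribiere for `beta`. The sketch below covers only the linear case, in pure Python with lists as vectors.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, x0, tol=1e-20, max_iter=None):
    """Minimize 0.5 x^T A x - b^T x for symmetric positive-definite A.

    `tol` is compared against the squared residual norm.
    """
    n = len(b)
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]  # residual = negative gradient
    p = list(r)  # first search direction is steepest descent
    rs_old = dot(r, r)
    for _ in range(n if max_iter is None else max_iter):
        if rs_old < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)  # exact minimizer of f along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old  # makes the next direction A-conjugate to p
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b, [0.0, 0.0])
print([round(v, 6) for v in x])
```

In exact arithmetic, linear CG converges in at most n iterations; here n = 2 and the solution is (1/11, 7/11).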


Proposal for API
================

See api_optimization.txt.

OD asks: Do we really need a different file? If yes, maybe create a subdirectory
to be able to easily find all files related to optimization?

JB replies: Yoshua's orders.


OD asks: Could it be more convenient for x0 to be a list?

JB replies: Yes, but that's not the interface used by other minimize()
routines (e.g. in scipy).  Maybe another list-based interface is required?

OD replies: I think most people would prefer to use a list-based interface, so
    they don't have to manually pack / unpack multiple arrays of parameters. So I
    would vote in favor of having both (where the main reason to also provide a
    non-list interface would be to allow one to easily switch e.g. to scipy's
    minimize).
    I would guess the reason scipy's interface is like this is because it makes
    it easier for the optimization algorithm. However, this does not really
    matter if we are just wrapping a theano-based algorithm (that already has
    to handle multiple parameters), and avoiding useless data copies on each call
    to f / df can only help speed-wise.
JB replies: Done, I added the possibility that x0 is a list of ndarrays to the
API doc.
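The packing and unpacking that OD mentions can be made concrete. A minimal sketch, assuming a scipy-style optimizer that only accepts a flat vector: a thin adapter flattens a list of parameter arrays into one vector on the way in and splits it back on the way out, so the user's `f` and `df` can keep working on lists. The names `pack`, `unpack`, and `flat_adapter` are hypothetical and not part of the API doc. Plain Python lists of floats stand in for ndarrays.

```python
def pack(arrays):
    """Concatenate a list of 1-D parameter lists into one flat list."""
    return [v for a in arrays for v in a]


def unpack(flat, sizes):
    """Split a flat list back into pieces of the given sizes."""
    out, start = [], 0
    for n in sizes:
        out.append(flat[start:start + n])
        start += n
    return out


def flat_adapter(f_list, df_list, sizes):
    """Wrap list-based f / df so they accept and return flat vectors."""
    def f_flat(x):
        return f_list(unpack(x, sizes))

    def df_flat(x):
        return pack(df_list(unpack(x, sizes)))

    return f_flat, df_flat


def f_list(ps):
    """Toy objective: sum of squares over all parameter groups."""
    return sum(v * v for p in ps for v in p)


def df_list(ps):
    """Gradient of f_list, returned in the same list-of-groups layout."""
    return [[2.0 * v for v in p] for p in ps]


# Two parameter groups, e.g. W (3 values) and b (1 value).
x0 = [[1.0, 2.0, 3.0], [4.0]]
sizes = [len(a) for a in x0]
f_flat, df_flat = flat_adapter(f_list, df_list, sizes)
print(f_flat(pack(x0)), df_flat(pack(x0)))
```

Every call through the adapter copies data, which is the overhead a native list-based interface avoids.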


OD asks: Why distinguish between iterative and one-shot versions? A one-shot
    algorithm can be seen as an iterative one that stops after its first
    iteration. The difference I see between the two interfaces proposed here
    is mostly that one relies on Theano while the other one does not, but
    hopefully a non-Theano one can be created by simply wrapping the
    Theano one.

JB replies: Right, it would make more sense to distinguish them by the fact that
one works on Theano objects, and the other on general Python callable functions.
There is room for an iterative numpy interface, but I didn't make it yet.  Would
that answer your question?

OD replies and asks: Partly. Do we really need a non-iterative interface?
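The idea that a one-shot interface can be layered on an iterative one is easy to sketch. Assuming, hypothetically, that the iterative interface is a Python generator yielding the current parameter after each step, a one-shot `minimize` just drives it until a stopping criterion holds. The names `iterative_gd` and `minimize` are illustrative only, and plain gradient descent on a scalar stands in for any iterative algorithm.

```python
from itertools import islice


def iterative_gd(df, x0, lr=0.1):
    """Iterative interface: yield the parameter after every gradient step."""
    x = x0
    while True:
        x = x - lr * df(x)
        yield x


def minimize(df, x0, lr=0.1, tol=1e-8, max_iter=10000):
    """One-shot interface built on the iterative one: stop once steps are tiny."""
    steps = iterative_gd(df, x0, lr)
    prev = x = x0
    for _ in range(max_iter):
        x = next(steps)
        if abs(x - prev) < tol:
            break
        prev = x
    return x


def df(x):
    """Gradient of f(x) = (x - 3) ** 2."""
    return 2.0 * (x - 3.0)


x_star = minimize(df, 0.0)
# The iterative form lets the caller monitor or stop early, e.g. take 3 steps only.
first_three = list(islice(iterative_gd(df, 0.0), 3))
print(round(x_star, 6), [round(v, 3) for v in first_three])
```

With this layering, the one-shot function is a thin convenience wrapper, and only the iterative version needs to be implemented per algorithm.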
author	Razvan Pascanu <r.pascanu@gmail.com>
date	Thu, 16 Sep 2010 17:34:30 -0400