comparison doc/v2_planning/API_optimization.txt @ 1182:14aa0a5bb661

Renamed api_optimization.txt -> API_optimization.txt to be compliant with Yoshua's naming conventions
author Olivier Delalleau <delallea@iro>
date Fri, 17 Sep 2010 16:41:51 -0400
parents doc/v2_planning/api_optimization.txt@7c5dc11c850a
children 4ea46ef9822a
Optimization API
================

Members: Bergstra, Lamblin, Delalleau, Glorot, Breuleux, Bordes
Leader: Bergstra


Description
-----------

This API is for iterative optimization algorithms, such as:

- stochastic gradient descent (incl. momentum, annealing)
- delta bar delta
- conjugate methods
- L-BFGS
- "Hessian Free"
- SGD-QN
- TONGA

The API includes an iterative interface based on Theano, and a one-shot
interface, similar to SciPy's and MATLAB's, that is based on Python and Numpy
and uses Theano only for the implementation.


Theano Interface
----------------

In the Theano interface, an optimization algorithm is asked for a dictionary
of updates that can be passed to theano.function. Implementations of iterative
optimization algorithms should be global functions with a signature like
``iterative_optimizer`` below:

def iterative_optimizer(parameters,
                        cost=None,
                        gradients=None,
                        stop=None,
                        updates=None,
                        **kwargs):
    """
    :param parameters: list or tuple of Theano variables that we want to
        optimize iteratively.  If we're minimizing f(x), then together, these
        variables represent 'x'.  Typically these are shared variables and
        their values are the initial values for the minimization algorithm.

    :param cost: scalar-valued Theano variable that computes an exact or
        noisy estimate of the cost (what are the conditions on the noise?).
        Some algorithms might need an exact cost, some algorithms might
        ignore the cost if the gradients are given.

    :param gradients: list or tuple of Theano variables representing the
        gradients on the corresponding parameters.  These default to
        tensor.grad(cost, parameters).

    :param stop: a shared variable (scalar integer) that (if provided) will
        be updated to say when the iterative minimization algorithm has
        finished (1) or requires more iterations (0).

    :param updates: a dictionary to update with the (var, new_value) items
        associated with the iterative algorithm.  The default is a new empty
        dictionary.  A KeyError is raised in case of key collisions.

    :param kwargs: algorithm-dependent arguments

    :returns: a dictionary mapping each parameter to an expression that it
        should take in order to carry out the optimization procedure.

        If all the parameters are shared variables, then this dictionary may
        be passed as the ``updates`` argument to theano.function.

        There may be more (key, value) pairs in the dictionary corresponding
        to internal variables that are part of the optimization algorithm.

    """
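
For illustration, a minimal sketch of plain SGD written against this
interface might look as follows (the ``step_size`` keyword is a hypothetical
algorithm-dependent argument, not part of the interface):

import theano.tensor as TT

def sgd(parameters, cost=None, gradients=None, stop=None, updates=None,
        step_size=0.1):
    """Sketch of plain SGD under the iterative_optimizer interface.

    ``step_size`` is a hypothetical algorithm-dependent keyword argument.
    ``stop`` is ignored: plain SGD never signals convergence on its own.
    """
    if updates is None:
        updates = {}
    if gradients is None:
        gradients = TT.grad(cost, parameters)
    for param, grad in zip(parameters, gradients):
        if param in updates:
            # the interface requires a KeyError on key collisions
            raise KeyError('duplicate update for variable', param)
        # one SGD step: move each parameter down its gradient
        updates[param] = param - step_size * grad
    return updates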


Numpy Interface
---------------

The numpy interface to optimization algorithms is supposed to mimic scipy's.
Its arguments are numpy arrays, and functions that manipulate numpy arrays.

def minimize(x0, f, df, opt_algo, **kwargs):
    """
    Return a point x_new with the same type as x0 that minimizes function
    `f` with derivative `df`.

    This is supposed to provide an interface similar to scipy's minimize
    routines, or MATLAB's.

    :type x0: numpy ndarray or list of numpy ndarrays.
    :param x0: starting point for minimization

    :type f: python callable mapping something like x0 to a scalar
    :param f: function to minimize

    :type df: python callable mapping something like x0 to the derivative
        of f at that point
    :param df: derivative of `f`

    :param opt_algo: one of the functions that implements the
        `iterative_optimizer` interface.

    :param kwargs: passed through to `opt_algo`

    """
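
A usage sketch on a toy quadratic, assuming a hypothetical ``sgd`` optimizer
implementing ``iterative_optimizer`` (and its hypothetical ``step_size``
argument):

import numpy

def f(x):
    # toy quadratic bowl with its minimum at x = 3
    return ((x - 3.0) ** 2).sum()

def df(x):
    return 2.0 * (x - 3.0)

x_new = minimize(numpy.zeros(5), f, df, sgd, step_size=0.1)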


There is also a numpy-based wrapper to the iterative algorithms.
This can be more useful than minimize() because it doesn't hog program
control.  Technically, minimize() is probably implemented using this
minimize_iterator interface.

class minimize_iterator(object):
    """
    Attributes
     - x        - the current best estimate of the minimum
     - f        - the function being minimized
     - df       - f's derivative function
     - opt_algo - the optimization algorithm at work (a serializable,
                  callable object with the signature of iterative_optimizer
                  above).

    """
    def __init__(self, x0, f, df, opt_algo, **kwargs):
        """Initialize state (arguments as in minimize())."""
    def __iter__(self):
        return self
    def next(self):
        """Take a step of minimization and return self.

        Raises StopIteration when the algorithm has finished minimizing.
        """
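
For example, minimize() could itself be a thin wrapper around this iterator,
which is also the natural place for a caller to insert monitoring or early
stopping; a sketch, assuming only the attributes listed above:

def minimize(x0, f, df, opt_algo, **kwargs):
    """Sketch: drive minimize_iterator to completion and return the result."""
    opt = minimize_iterator(x0, f, df, opt_algo, **kwargs)
    for _ in opt:
        # the caller keeps program control here: it could log opt.f(opt.x),
        # checkpoint opt.x, or break out early
        pass
    return opt.x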


Examples
--------

Simple stochastic gradient descent could be called like this:

    sgd([p], gradients=[g], step_size=.1)

and this would return

    {p: p - .1*g}


Simple stochastic gradient descent with extra updates:

    sgd([p], gradients=[g], updates={a: b}, step_size=.1)

will return

    {a: b, p: p - .1*g}


If the parameters collide with keys in a given updates dictionary, an
exception will be raised:

    sgd([p], gradients=[g], updates={p: b}, step_size=.1)

will raise a KeyError.
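

Putting the pieces together: when the parameters are shared variables, the
returned dictionary can be compiled directly into a Theano function. An
end-to-end sketch, again assuming the hypothetical ``sgd`` and ``step_size``
from above:

import numpy
import theano

p = theano.shared(numpy.zeros(5))     # the parameter to optimize
cost = ((p - 3.0) ** 2).sum()         # a toy quadratic cost
train = theano.function([], cost,
                        updates=sgd([p], cost=cost, step_size=0.1))
for i in range(100):                  # each call performs one SGD step
    train()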