Optimization API
================

Members: Bergstra, Lamblin, Delalleau, Glorot, Breuleux, Bordes
Leader: Bergstra


Description
-----------

This API is for iterative optimization algorithms, such as:

- stochastic gradient descent (incl. momentum, annealing)
- delta bar delta
- conjugate methods
- L-BFGS
- "Hessian Free"
- SGD-QN
- TONGA

The API includes an iterative interface based on Theano, and a one-shot
interface, similar to SciPy's and MATLAB's, that is based on Python and NumPy
and uses Theano only for the implementation.


Theano Interface
----------------

The Theano interface to an optimization algorithm is a function that returns
a dictionary of updates suitable for use with theano.function. Implementations
of iterative optimization algorithms should be global functions with a
signature like 'iterative_optimizer'.

def iterative_optimizer(parameters,
                        cost=None,
                        gradients=None,
                        stop=None,
                        updates=None,
                        **kwargs):
    """
    :param parameters: list or tuple of Theano variables
        that we want to optimize iteratively.  If we're minimizing f(x), then
        together, these variables represent 'x'.  Typically these are shared
        variables and their values are the initial values for the minimization
        algorithm.

    :param cost: scalar-valued Theano variable that computes an exact or noisy
        estimate of the cost (what are the conditions on the noise?).  Some
        algorithms might need an exact cost, some algorithms might ignore the
        cost if the gradients are given.

    :param gradients: list or tuple of Theano variables representing the
        gradients on the corresponding parameters.  These default to
        tensor.grad(cost, parameters).

    :param stop: a shared variable (scalar integer) that (if provided) will be
        updated to say when the iterative minimization algorithm has finished
        (1) or requires more iterations (0).

    :param updates: a dictionary to update with the (var, new_value) items
        associated with the iterative algorithm.  The default is a new empty
        dictionary.  A KeyError is raised in case of key collisions.

    :param kwargs: algorithm-dependent arguments

    :returns: a dictionary mapping each parameter to an expression that it
        should take in order to carry out the optimization procedure.

        If all the parameters are shared variables, then this dictionary may
        be passed as the ``updates`` argument to theano.function.

        There may be more (key, value) pairs in the dictionary corresponding
        to internal variables that are part of the optimization algorithm.

    """

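As an illustration only (not part of the proposal), a plain SGD implementation
of this interface might look roughly like the sketch below.  The `step_size`
keyword argument is a hypothetical example of an algorithm-dependent kwarg.

import theano.tensor as TT

def sgd_optimizer(parameters, cost=None, gradients=None, stop=None,
                  updates=None, step_size=0.01):
    # Default the gradients to tensor.grad(cost, parameters), as in the spec.
    if gradients is None:
        gradients = TT.grad(cost, parameters)
    # Start from a copy of any caller-supplied updates.
    rval = dict(updates) if updates is not None else {}
    for p, g in zip(parameters, gradients):
        if p in rval:
            # Key collisions raise KeyError, as required above.
            raise KeyError('parameter already present in updates', p)
        rval[p] = p - step_size * g
    return rval
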

Numpy Interface
---------------

The numpy interface to optimization algorithms is supposed to mimic SciPy's.
Its arguments are numpy arrays, and functions that manipulate numpy arrays.

def minimize(x0, f, df, opt_algo, **kwargs):
    """
    Return a point x_new with the same type as x0 that minimizes function `f`
    with derivative `df`.

    This is supposed to provide an interface similar to scipy's minimize
    routines, or MATLAB's.

    :type x0: numpy ndarray or list of numpy ndarrays
    :param x0: starting point for minimization

    :type f: python callable mapping something like x0 to a scalar
    :param f: function to minimize

    :type df: python callable mapping something like x0 to the derivative of
        f at that point
    :param df: derivative of `f`

    :param opt_algo: one of the functions that implements the
        `iterative_optimizer` interface.

    :param kwargs: passed through to `opt_algo`

    """

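For instance (a usage sketch only, reusing the hypothetical `sgd_optimizer`
and `step_size` from the sketch above), minimizing a simple quadratic might
look like:

import numpy

x0 = numpy.zeros(5)
f = lambda x: ((x - 3.0) ** 2).sum()      # f(x) = ||x - 3||^2
df = lambda x: 2.0 * (x - 3.0)            # its derivative
x_new = minimize(x0, f, df, sgd_optimizer, step_size=0.1)
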

There is also a numpy-based wrapper to the iterative algorithms.
This can be more useful than minimize() because it doesn't hog program
control.  Technically minimize() is probably implemented using this
minimize_iterator interface.

class minimize_iterator(object):
    """
    Attributes
    - x - the current best estimate of the minimum
    - f - the function being minimized
    - df - f's derivative function
    - opt_algo - the optimization algorithm at work (a serializable, callable
      object with the signature of iterative_optimizer above).

    """
    def __init__(self, x0, f, df, opt_algo, **kwargs):
        """Initialize state (arguments as in minimize())."""
    def __iter__(self):
        return self
    def next(self):
        """Take one step of minimization and return self.  Raises
        StopIteration when the algorithm is finished minimizing.

        """

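A usage sketch (again assuming the hypothetical `sgd_optimizer` and
`step_size`): stepping through the minimization by hand makes it easy to
monitor progress or stop early.

opt = minimize_iterator(x0, f, df, sgd_optimizer, step_size=0.1)
for state in opt:
    print state.f(state.x)    # current value of the objective
x_new = opt.x                 # best estimate when iteration stops
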

Examples
--------

Simple stochastic gradient descent could be called like this:

    sgd([p], gradients=[g], step_size=.1)

and this would return

    {p: p - .1*g}


Simple stochastic gradient descent with extra updates:

    sgd([p], gradients=[g], updates={a: b}, step_size=.1)

will return

    {a: b, p: p - .1*g}


If the parameters collide with keys in a given updates dictionary an exception
will be raised:

    sgd([p], gradients=[g], updates={p: b}, step_size=.1)

will raise a KeyError.
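

For completeness, a sketch (illustrative only; `sgd`, `step_size`, and the
variable names are assumptions, not part of the spec) of how the returned
dictionary plugs into theano.function:

import numpy
import theano
import theano.tensor as TT

p = theano.shared(numpy.zeros(10), name='p')    # parameter being optimized
x = TT.vector('x')                              # one input example
cost = ((p - x) ** 2).sum()                     # noisy per-example cost
g = TT.grad(cost, p)

# The update dictionary returned by the optimizer becomes the `updates`
# argument of theano.function; each call to train() then takes one SGD step.
train = theano.function([x], cost, updates=sgd([p], gradients=[g], step_size=.1))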