doc/v2_planning/api_optimization.txt @ 1149:7c5dc11c850a
"cleaning up api_optimization"
author: James Bergstra <bergstrj@iro.umontreal.ca>
date:   Thu, 16 Sep 2010 16:26:21 -0400
parent: c5c7ba805a2f

The Theano interface to optimization algorithms is to ask for a dictionary of
updates that can be used in theano.function.  Implementations of iterative
optimization algorithms should be global functions with a signature like
'iterative_optimizer'.

def iterative_optimizer(parameters,
        cost=None,
        gradients=None,
        stop=None,
        updates=None,
        **kwargs):
    """
    :param parameters: list or tuple of Theano variables
        that we want to optimize iteratively.  If we're minimizing f(x), then
        together, these variables represent 'x'.  Typically these are shared
        variables and their values are the initial values for the minimization
        algorithm.

    :param cost: scalar-valued Theano variable that computes an exact or noisy
        estimate of the cost (what are the conditions on the noise?).  Some
        algorithms might need an exact cost, some algorithms might ignore the
        cost if the gradients are given.

    :param gradients: list or tuple of Theano variables representing the
        gradients on the corresponding parameters.  These default to
        tensor.grad(cost, parameters).

    :param stop: a shared variable (scalar integer) that (if provided) will be
        updated to say when the iterative minimization algorithm has finished
        (1) or requires more iterations (0).

    :param updates: a dictionary to update with the (var, new_value) items
        associated with the iterative algorithm.  The default is a new empty
        dictionary.  A KeyError is raised in case of key collisions.

    :param kwargs: algorithm-dependent arguments

    :returns: a dictionary mapping each parameter to the expression that it
        should take in order to carry out the optimization procedure.

        If all the parameters are shared variables, then this dictionary may be
        passed as the ``updates`` argument to theano.function.

        There may be more (key, value) pairs in the dictionary corresponding to
        internal variables that are part of the optimization algorithm.

    """

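For concreteness, a minimal stochastic gradient descent optimizer conforming to
this interface might look like the sketch below.  This is only an illustration,
not pylearn's actual implementation; the `step_size` keyword stands in for an
algorithm-dependent argument.

from theano import tensor

def sgd(parameters, cost=None, gradients=None, stop=None, updates=None,
        step_size=0.1):
    # Default documented above: differentiate the cost w.r.t. each parameter.
    if gradients is None:
        gradients = tensor.grad(cost, parameters)
    if updates is None:
        updates = {}
    for p, g in zip(parameters, gradients):
        if p in updates:
            # The interface requires a KeyError on key collisions.
            raise KeyError('parameter already has an update rule', p)
        updates[p] = p - step_size * g
    # `stop` is ignored here: plain SGD never declares convergence on its own.
    return updates

Called as sgd([p], gradients=[g], step_size=.1), this sketch returns
{p:p-.1*g}, which is the behaviour described in the Examples section below.
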
Numpy Interface
---------------

The numpy interface to optimization algorithms is supposed to mimic
scipy's.  Its arguments are numpy arrays, and functions that manipulate numpy
arrays.

def minimize(x0, f, df, opt_algo, **kwargs):
    """
    Return a point x_new, with the same type as x0, that minimizes function `f`
    with derivative `df`.

    This is supposed to provide an interface similar to scipy's minimize
    routines, or MATLAB's.

    :type x0: numpy ndarray or list of numpy ndarrays
    :param x0: starting point for minimization

    :type f: python callable mapping something like x0 to a scalar
    :param f: function to minimize

    :type df: python callable mapping something like x0 to the derivative of f
        at that point
    :param df: derivative of `f`

    :param opt_algo: one of the functions that implements the
        `iterative_optimizer` interface.

    :param kwargs: passed through to `opt_algo`

    """

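A hypothetical call, assuming a toy quadratic cost and an `sgd` optimizer like
the sketch above:

    import numpy

    x0 = numpy.zeros(5)                      # starting point
    f = lambda x: 0.5 * numpy.sum(x ** 2)    # function to minimize
    df = lambda x: x                         # its derivative
    x_new = minimize(x0, f, df, sgd, step_size=0.1)
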
There is also a numpy-based wrapper to the iterative algorithms.
This can be more useful than minimize() because it doesn't hog program
control.  Technically minimize() is probably implemented using this
minimize_iterator interface.

class minimize_iterator(object):
    """
    Attributes
     - x        - the current best estimate of the minimum
     - f        - the function being minimized
     - df       - f's derivative function
     - opt_algo - the optimization algorithm at work (a serializable, callable
       object with the signature of iterative_optimizer above).

    """
    def __init__(self, x0, f, df, opt_algo, **kwargs):
        """Initialize state (arguments as in minimize())"""
    def __iter__(self):
        return self
    def next(self):
        """Take a step of minimization and return self.

        Raises StopIteration when the algorithm is finished with minimization.
        """

Examples
--------

Simple stochastic gradient descent could be called like this:

    sgd([p], gradients=[g], step_size=.1)

and this would return

    {p:p-.1*g}

Simple stochastic gradient descent with extra updates:

    sgd([p], gradients=[g], updates={a:b}, step_size=.1)

will return

    {a:b, p:p-.1*g}


If the parameters collide with keys in a given updates dictionary an exception
will be raised:

    sgd([p], gradients=[g], updates={p:b}, step_size=.1)

will raise a KeyError.
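
To tie the two interfaces together: when p is a shared variable, the dictionary
returned by sgd can be passed directly to theano.function, as noted in the
iterative_optimizer spec.  A hypothetical end-to-end use (the toy cost and the
variable names are illustrative only, and sgd refers to the sketch above):

import numpy
import theano
from theano import tensor

p = theano.shared(numpy.zeros(5), name='p')    # parameter to optimize
x = tensor.vector('x')
y = tensor.scalar('y')
cost = (tensor.dot(x, p) - y) ** 2             # toy squared-error cost

# Each call to train() performs one SGD update of p in place.
train = theano.function([x, y], cost,
                        updates=sgd([p], cost=cost, step_size=.1))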