doc/v2_planning/api_optimization.txt @ 1149:7c5dc11c850a

summary:  cleaning up api_optimization
author:   James Bergstra <bergstrj@iro.umontreal.ca>
date:     Thu, 16 Sep 2010 16:26:21 -0400
parent:   c5c7ba805a2f

The theano interface to optimization algorithms is to ask for a dictionary of
updates that can be used in theano.function.  Implementations of iterative
optimization algorithms should be global functions with a signature like
'iterative_optimizer'.

def iterative_optimizer(parameters,
        cost=None,
        gradients=None,
        stop=None,
        updates=None,
        **kwargs):
    """
    :param parameters: list or tuple of Theano variables
        that we want to optimize iteratively.  If we're minimizing f(x), then
        together, these variables represent 'x'.  Typically these are shared
        variables and their values are the initial values for the minimization
        algorithm.

    :param cost: scalar-valued Theano variable that computes an exact or noisy
        estimate of the cost (what are the conditions on the noise?).  Some
        algorithms might need an exact cost, some algorithms might ignore the
        cost if the gradients are given.

    :param gradients: list or tuple of Theano variables representing the
        gradients on the corresponding parameters.  These default to
        tensor.grad(cost, parameters).

    :param stop: a shared variable (scalar integer) that (if provided) will be
        updated to say when the iterative minimization algorithm has finished
        (1) or requires more iterations (0).

    :param updates: a dictionary to update with the (var, new_value) items
        associated with the iterative algorithm.  The default is a new empty
        dictionary.  A KeyError is raised in case of key collisions.

    :param kwargs: algorithm-dependent arguments

    :returns: a dictionary mapping each parameter to an expression that it
        should take in order to carry out the optimization procedure.

        If all the parameters are shared variables, then this dictionary may be
        passed as the ``updates`` argument to theano.function.

        There may be more key,value pairs in the dictionary corresponding to
        internal variables that are part of the optimization algorithm.

    """

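For concreteness, here is a minimal sketch of what one implementation of this
signature could look like.  It is illustrative only (plain stochastic gradient
descent, matching the `sgd` used in the Examples section below); `step_size` is
an algorithm-dependent keyword argument, and `stop` is ignored because plain
SGD has no termination criterion of its own.

    # Illustrative sketch only -- not a proposed implementation.
    from theano import tensor

    def sgd(parameters, cost=None, gradients=None, stop=None, updates=None,
            step_size=0.1, **kwargs):
        if gradients is None:
            # default documented above: derive gradients from the cost
            gradients = tensor.grad(cost, parameters)
        if updates is None:
            updates = {}
        for p, g in zip(parameters, gradients):
            if p in updates:
                # collision with a caller-supplied update
                raise KeyError(p)
            updates[p] = p - step_size * g
        return updates
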
Numpy Interface
---------------

The numpy interface to optimization algorithms is supposed to mimic
scipy's.  Its arguments are numpy arrays, and functions that manipulate numpy
arrays.

def minimize(x0, f, df, opt_algo, **kwargs):
    """
    Return a point x_new with the same type as x0 that minimizes function `f`
    with derivative `df`.

    This is supposed to provide an interface similar to scipy's minimize
    routines, or MATLAB's.

    :type x0: numpy ndarray or list of numpy ndarrays.
    :param x0: starting point for minimization

    :type f: python callable mapping something like x0 to a scalar
    :param f: function to minimize

    :type df: python callable mapping something like x0 to the derivative of f at that point
    :param df: derivative of `f`

    :param opt_algo: one of the functions that implements the
        `iterative_optimizer` interface.

    :param kwargs: passed through to `opt_algo`

    """

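A usage sketch, assuming the illustrative `sgd` from the Theano section
implements the `iterative_optimizer` interface; the quadratic `f`/`df` pair
below exists only for the example.

    import numpy as np

    def f(x):
        return 0.5 * np.sum(x ** 2)    # simple quadratic bowl

    def df(x):
        return x                       # gradient of f

    x_new = minimize(np.ones(5), f, df, sgd, step_size=0.1)
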
There is also a numpy-based wrapper to the iterative algorithms.
This can be more useful than minimize() because it doesn't hog program
control.  Technically minimize() is probably implemented using this
minimize_iterator interface.

class minimize_iterator(object):
    """
    Attributes
     - x        - the current best estimate of the minimum
     - f        - the function being minimized
     - df       - f's derivative function
     - opt_algo - the optimization algorithm at work (a serializable, callable
       object with the signature of iterative_optimizer above).

    """
    def __init__(self, x0, f, df, opt_algo, **kwargs):
        """Initialize state (arguments as in minimize())"""
    def __iter__(self):
        return self
    def next(self):
        """Take a step of minimization and return self; raises StopIteration
        when the algorithm is finished with minimization.
        """

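A driving-loop sketch (reusing `f`, `df`, and the illustrative `sgd` from
above) showing how a caller keeps program control between steps:

    import numpy as np

    opt = minimize_iterator(np.ones(5), f, df, sgd, step_size=0.1)
    for state in opt:                   # next() returns self at every step
        print 'best so far:', state.x   # caller may log, checkpoint, or break
    # once StopIteration is raised, opt.x holds the final estimate
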
Examples
--------

Simple stochastic gradient descent could be called like this:

    sgd([p], gradients=[g], step_size=.1)

and this would return

    {p:p-.1*g}


Simple stochastic gradient descent with extra updates:

    sgd([p], gradients=[g], updates={a:b}, step_size=.1)

will return

    {a:b, p:p-.1*g}


If the parameters collide with keys in a given updates dictionary an exception
will be raised:

    sgd([p], gradients=[g], updates={p:b}, step_size=.1)

will raise a KeyError.
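
To close the loop with the Theano interface, here is a hedged sketch of how the
returned dictionary is typically consumed: it is passed as ``updates`` to
theano.function (again assuming the illustrative `sgd` sketched earlier).

    import numpy
    import theano
    from theano import tensor

    x = tensor.dvector('x')
    p = theano.shared(numpy.zeros(3), name='p')   # parameters to optimize
    cost = ((p - x) ** 2).sum()
    g = tensor.grad(cost, p)

    train = theano.function([x], cost,
                            updates=sgd([p], gradients=[g], step_size=.1))
    for i in range(100):
        train(numpy.array([1., 2., 3.]))          # each call applies one SGD update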