annotate pylearn/gd/sgd.py @ 1472:ddda8d93c162 ("dtype tweaks in sgd")

author:   James Bergstra <bergstrj@iro.umontreal.ca>
date:     Wed, 18 May 2011 10:51:50 -0400
parents:  86bf03990aad
children: a57f4839a9d8

"""A stochastic gradient descent minimizer.
"""
import numpy
import theano

def sgd_updates(params, grads, stepsizes):
    """Return a list of pairs that can be used as updates in theano.function to implement
    stochastic gradient descent.

    :param params: variables to adjust in order to minimize some cost
    :type params: a list of variables (theano.function will require shared variables)
    :param grads: the gradient on each param (with respect to some cost)
    :type grads: list of theano expressions
    :param stepsizes: step by this amount times the negative gradient on each iteration
    :type stepsizes: [symbolic] scalar or list of one [symbolic] scalar per param
    """
    # if stepsizes is just a scalar, expand it to match params
    try:
        iter(stepsizes)
    except TypeError:
        stepsizes = [stepsizes for p in params]
    if len(params) != len(grads):
        raise ValueError('params and grads have different lengths')
    updates = [(p, p - step * gp) for (step, p, gp) in zip(stepsizes, params, grads)]
    return updates

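# A hedged usage sketch (not part of the original file): one gradient step on a
# tiny least-squares cost. The names _demo_sgd_updates, w, and x are
# hypothetical, introduced only for illustration.
def _demo_sgd_updates():
    import numpy
    import theano
    import theano.tensor as TT
    w = theano.shared(numpy.zeros(3), name='w')
    x = TT.dvector('x')
    cost = TT.sum((TT.dot(x, w) - 1.0) ** 2)
    # sgd_updates returns [(w, w - 0.1 * grad)], which theano.function accepts
    step = theano.function([x], cost,
            updates=sgd_updates([w], TT.grad(cost, [w]), stepsizes=0.1))
    step(numpy.ones(3))  # one step: modifies w in place
    return w.get_value()
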
def sgd_momentum_updates(params, grads, stepsizes, momentum=0.9):
    # if stepsizes is just a scalar, expand it to match params
    try:
        iter(stepsizes)
    except TypeError:
        stepsizes = [stepsizes for p in params]
    # likewise, if momentum is just a scalar, expand it to match params
    try:
        iter(momentum)
    except TypeError:
        momentum = [momentum for p in params]
    if len(params) != len(grads):
        raise ValueError('params and grads have different lengths')
    # one 'heading' per parameter: a shared variable holding the smoothed
    # descent direction, with the same shape and dtype as the parameter
    headings = [theano.shared(numpy.zeros_like(p.get_value(borrow=True))) for p in params]
    updates = []
    for s, p, gp, m, h in zip(stepsizes, params, grads, momentum, headings):
        updates.append((p, p + s * h))
        updates.append((h, m * h - (1.0 - m) * gp))
    return updates


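# A hedged sketch (not in the original module) of what the updates above do:
# each parameter p follows a heading h that exponentially averages the
# negative gradient, i.e. h <- m*h - (1-m)*grad and then p <- p + stepsize*h.
# The names _demo_sgd_momentum_updates, w, and x are hypothetical.
def _demo_sgd_momentum_updates():
    import numpy
    import theano
    import theano.tensor as TT
    w = theano.shared(numpy.zeros(3), name='w')
    x = TT.dmatrix('x')
    cost = TT.sum((TT.dot(x, w) - 1.0) ** 2)
    step = theano.function([x], cost,
            updates=sgd_momentum_updates([w], TT.grad(cost, [w]),
                                         stepsizes=0.01, momentum=0.9))
    for i in range(10):  # a few momentum steps on a fixed batch
        step(numpy.ones((4, 3)))
    return w.get_value()
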
class StochasticGradientDescent(theano.Module):
    """Fixed-stepsize gradient descent.

    Methods for gradient descent are:
    - step(arg_vals), which returns None and updates the params
    - step_cost(arg_vals), which returns the cost value and updates the params
    """
    def __init__(self, args, cost, params,
                 gradients=None, stepsize=None,
                 updates=None, auxout=None, methods=True):
        """
        :param stepsize: the step to take in the (negative) gradient direction
        :type stepsize: None, scalar value, or scalar TensorVariable

        :param updates: extra symbolic updates to make when evaluating either step or step_cost
            (these override the gradients if necessary)
        :type updates: dict Variable -> Variable
        :param auxout: auxiliary outputs, a list containing output symbols to
            compute at the same time as cost (for efficiency)
        :param methods: should this module define the step and step_cost methods?
        """
        super(StochasticGradientDescent, self).__init__()
        self.stepsize_init = None

        if stepsize is None:
            self.stepsize = theano.tensor.dscalar()
        elif isinstance(stepsize, theano.tensor.TensorVariable):
            self.stepsize = stepsize
        else:
            self.stepsize = theano.tensor.as_tensor_variable(stepsize)

        if self.stepsize.ndim != 0:
            raise TypeError('stepsize must be a scalar', stepsize)

        self.params = params
        self.gparams = theano.tensor.grad(cost, self.params) if gradients is None else gradients

        self._updates = dict((p, p - self.stepsize * g)
                             for p, g in zip(self.params, self.gparams))
        if updates is not None:
            self._updates.update(updates)

        if methods:
            if auxout is None:
                self.step = theano.Method(args, [], updates=self._updates)
                self.step_cost = theano.Method(args, cost, updates=self._updates)
            else:
                # step_cost always returns a list when auxout is given
                self.step = theano.Method(
                    args, [] + auxout,
                    updates=self._updates)
                self.step_cost = theano.Method(
                    args, [cost] + auxout,
                    updates=self._updates)


    updates = property(lambda self: self._updates.copy())

    def _instance_initialize(self, obj):
        pass

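# A hedged usage sketch, assuming the old theano.Module/theano.Method API that
# this revision targets (removed from later Theano releases); x, w, and some_x
# are hypothetical names:
#
#   sgd = StochasticGradientDescent([x], cost, [w], stepsize=0.01)
#   minimizer = sgd.make()           # compile the module into an instance
#   c = minimizer.step_cost(some_x)  # one descent step; returns the cost
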
def sgd_minimizer(stepsize=None):
    """Curry the stepsize argument to StochasticGradientDescent, providing the standard
    minimizer interface.

    :returns: standard minimizer constructor f(args, cost, params, gradients=None)
    """
    def f(args, cost, params, gradients=None, updates=None, auxout=None):
        return StochasticGradientDescent(args, cost, params, gradients=gradients,
                stepsize=stepsize, updates=updates, auxout=auxout)
    return f
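
# A hedged sketch of the intended call pattern (x and w are hypothetical):
# binding the stepsize up front lets callers use the uniform constructor
# signature f(args, cost, params, gradients=None) without knowing about it.
#
#   make_minimizer = sgd_minimizer(stepsize=0.01)
#   sgd = make_minimizer([x], cost, [w])  # a StochasticGradientDescent module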