annotate pylearn/gd/sgd.py @ 1474:a57f4839a9d8 (merge)
author: James Bergstra <bergstrj@iro.umontreal.ca>
date: Wed, 18 May 2011 10:52:42 -0400
parents: ddda8d93c162 cac29ca79a74
children: 0e6ca7eecc72
"""A stochastic gradient descent minimizer.
"""
import numpy
import theano


def sgd_updates(params, grads, stepsizes):
    """Return a list of (param, new_value) pairs that can be used as updates in
    theano.function to implement stochastic gradient descent.

    :param params: variables to adjust in order to minimize some cost
    :type params: a list of variables (theano.function will require shared variables)
    :param grads: the gradient on each param (with respect to some cost)
    :type grads: list of theano expressions
    :param stepsizes: step by this amount times the negative gradient on each iteration
    :type stepsizes: [symbolic] scalar or list of one [symbolic] scalar per param
    """
    try:
        iter(stepsizes)
    except Exception:
        # a single stepsize was given: reuse it for every param
        # (catch Exception, not BaseException, so Ctrl-C still works)
        stepsizes = [stepsizes for p in params]
    if len(params) != len(grads):
        raise ValueError('params and grads have different lengths')
    updates = [(p, p - step * gp) for (step, p, gp) in zip(stepsizes, params, grads)]
    return updates
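The pairs built above are symbolic; as a concrete, theano-free sketch of the same rule, one SGD step moves each parameter by minus its stepsize times its gradient. The name `apply_sgd_step` is illustrative only, not part of the pylearn API:

```python
# Concrete (non-symbolic) sketch of the rule sgd_updates encodes: p <- p - step * g.
def apply_sgd_step(params, grads, stepsizes):
    try:
        iter(stepsizes)
    except TypeError:
        # a single stepsize: reuse it for every parameter
        stepsizes = [stepsizes] * len(params)
    return [p - s * g for p, s, g in zip(params, stepsizes, grads)]
```

With params `[1.0, 2.0]`, grads `[0.5, -1.0]`, and a shared stepsize of `0.1`, this yields approximately `[0.95, 2.1]`.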
def sgd_momentum_updates(params, grads, stepsizes, momentum=0.9):
    # if stepsizes is just a scalar, expand it to match params
    try:
        iter(stepsizes)
    except Exception:
        stepsizes = [stepsizes for p in params]
    # likewise for momentum
    try:
        iter(momentum)
    except Exception:
        momentum = [momentum for p in params]
    if len(params) != len(grads):
        raise ValueError('params and grads have different lengths')
    headings = [theano.shared(numpy.zeros_like(p.get_value(borrow=True))) for p in params]
    updates = []
    for s, p, gp, m, h in zip(stepsizes, params, grads, momentum, headings):
        updates.append((p, p + s * h))
        updates.append((h, m * h - (1.0 - m) * gp))
    return updates
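Because theano.function applies all updates simultaneously, both update expressions above read the *old* heading `h`. A concrete one-parameter sketch of the recurrence (with an illustrative name, not part of the pylearn API):

```python
def momentum_step(p, h, g, step, m=0.9):
    # both lines read the old heading h, mirroring theano's simultaneous updates
    p_new = p + step * h             # move along the current heading
    h_new = m * h - (1.0 - m) * g    # blend old heading with the fresh gradient
    return p_new, h_new
```

Starting from a zero heading, the first step leaves the parameter unchanged and only seeds the heading from the gradient.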
class StochasticGradientDescent(theano.Module):
    """Fixed-stepsize gradient descent

    Methods for gradient descent are:
    - step(arg_vals), which returns None and updates the params
    - step_cost(arg_vals), which returns the cost value and updates the params
    """
    def __init__(self, args, cost, params,
                 gradients=None, stepsize=None,
                 updates=None, auxout=None, methods=True):
        """
        :param stepsize: the step to take in the (negative) gradient direction
        :type stepsize: None, scalar value, or scalar TensorVariable

        :param updates: extra symbolic updates to make when evaluating either step
            or step_cost (these override the gradient updates if necessary)
        :type updates: dict Variable -> Variable
        :param auxout: auxiliary outputs, list containing output symbols to
            compute at the same time as cost (for efficiency)
        :param methods: Should this module define the step and step_cost methods?
        """
        super(StochasticGradientDescent, self).__init__()
        self.stepsize_init = None

        if stepsize is None:
            self.stepsize = theano.tensor.dscalar()
        elif isinstance(stepsize, theano.tensor.TensorVariable):
            self.stepsize = stepsize
        else:
            self.stepsize = theano.tensor.as_tensor_variable(stepsize)

        if self.stepsize.ndim != 0:
            raise TypeError('stepsize must be a scalar', stepsize)

        self.params = params
        self.gparams = theano.tensor.grad(cost, self.params) if gradients is None else gradients
        assert len(self.params) == len(self.gparams)

        self._updates = dict((p, p - self.stepsize * g)
                             for p, g in zip(self.params, self.gparams))
        if updates is not None:
            self._updates.update(updates)

        if methods:
            if auxout is None:
                self.step = theano.Method(args, [], updates=self._updates)
                self.step_cost = theano.Method(args, cost, updates=self._updates)
            else:
                # step_cost always returns a list if auxout is given
                self.step = theano.Method(
                    args, [] + auxout,
                    updates=self._updates)
                self.step_cost = theano.Method(
                    args, [cost] + auxout,
                    updates=self._updates)

    updates = property(lambda self: self._updates.copy())

    def _instance_initialize(self, obj):
        pass
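The `_updates` construction in `__init__` relies on `dict.update` overwriting existing keys: that is how a user-supplied `updates` dict overrides the default gradient steps for the parameters it mentions. A minimal sketch, with strings standing in for theano variables and expressions:

```python
# default gradient-descent updates, keyed by parameter
base_updates = {'w': 'w - step*gw', 'b': 'b - step*gb'}
# user-supplied override for one parameter
user_updates = {'b': 'custom rule for b'}
base_updates.update(user_updates)  # the user's entry wins for 'b'
```

Parameters not mentioned in the override keep their default gradient update.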
def sgd_minimizer(stepsize=None):
    """Curry the stepsize argument to StochasticGradientDescent, providing the
    standard minimizer interface.

    :returns: standard minimizer constructor f(args, cost, params, gradients=None)
    """
    def f(args, cost, params, gradients=None, updates=None, auxout=None):
        return StochasticGradientDescent(args, cost, params, gradients=gradients, stepsize=stepsize,
                                         updates=updates, auxout=auxout)
    return f
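The currying in sgd_minimizer can be sketched in miniature: fix one argument now, and return a constructor that supplies it later. The names below are illustrative stand-ins, not pylearn API:

```python
def curry_stepsize(constructor, stepsize):
    # return a wrapper that fills in stepsize on the caller's behalf
    def f(*args, **kwargs):
        kwargs['stepsize'] = stepsize
        return constructor(*args, **kwargs)
    return f

def fake_minimizer(cost, stepsize=None):
    # stand-in for StochasticGradientDescent: just records its arguments
    return (cost, stepsize)

make = curry_stepsize(fake_minimizer, 0.01)
```

Calling `make('some cost')` now behaves like `fake_minimizer('some cost', stepsize=0.01)`, which is exactly the relationship between `sgd_minimizer(stepsize)(...)` and `StochasticGradientDescent(...)`.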