=========================
Optimization for Learning
=========================

Members: Bergstra, Lamblin, Delalleau, Glorot, Breuleux, Bordes
Leader: Bergstra



Initial Writeup by James
=========================================


Previous work: scikits, openopt, and scipy provide function optimization
algorithms. These are not currently GPU-enabled, but may be in the future.

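For reference, the batch interface these packages expose looks roughly like
this (a minimal sketch using scipy.optimize; the loss and grad_loss functions
are hypothetical, just to make the call shape concrete):

    import numpy
    from scipy import optimize

    def loss(x):
        # hypothetical objective: a simple quadratic bowl centered at 3
        return 0.5 * numpy.sum((x - 3.0) ** 2)

    def grad_loss(x):
        return x - 3.0

    x0 = numpy.zeros(5)
    x_min = optimize.fmin_cg(loss, x0, fprime=grad_loss)
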
IS PREVIOUS WORK SUFFICIENT?
--------------------------------

In many cases it is (I used it for sparse coding, and it was OK).

These packages provide batch optimization, whereas we typically need online
optimization.

It can be faster (to run) and more convenient (to implement) to have
optimization algorithms as Theano update expressions.
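To make "update expressions" concrete, here is a minimal sketch (the linear
model and squared-error cost are assumptions for illustration, not part of
any proposed API):

    import numpy
    import theano
    import theano.tensor as T

    x = T.matrix('x')  # minibatch of inputs
    y = T.vector('y')  # minibatch of targets
    w = theano.shared(numpy.zeros(10), name='w')  # parameters to learn

    cost = T.sum((T.dot(x, w) - y) ** 2)  # assumed squared-error cost
    g_w = T.grad(cost, w)

    # The whole optimization algorithm is the update expression w - 0.1 * g_w;
    # each call to sgd_step performs one online step on a fresh minibatch.
    sgd_step = theano.function([x, y], cost, updates={w: w - 0.1 * g_w})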


What optimization algorithms do we want/need?
---------------------------------------------

- sgd
- sgd + momentum (see the sketch after this list)
- sgd with annealing schedule
- TONGA
- James Martens' Hessian-free
- Conjugate gradients, batch and (large) mini-batch [that is also what
  Martens' method does]
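A rough sketch of how the first three items could look as Theano update
expressions (the momentum coefficient, 1/t annealing schedule, and cost are
assumptions for illustration):

    import numpy
    import theano
    import theano.tensor as T

    x = T.matrix('x')
    w = theano.shared(numpy.zeros(10), name='w')
    v = theano.shared(numpy.zeros(10), name='v')     # momentum buffer
    t = theano.shared(numpy.asarray(0.0), name='t')  # step counter

    cost = T.sum(T.dot(x, w) ** 2)  # assumed cost, for illustration only
    g_w = T.grad(cost, w)

    lr = 0.1 / (1.0 + t / 1000.0)   # assumed 1/t annealing schedule
    new_v = 0.9 * v - lr * g_w      # sgd + momentum
    step = theano.function([x], cost,
                           updates={v: new_v, w: w + new_v, t: t + 1.0})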

Do we need anything to make batch algos work better with Pylearn things?
- conjugate methods? yes
- L-BFGS? maybe, when needed




Discussion
==========

OD asks: Could it be more convenient for x0 to be a list?

JB replies: Yes, but that's not the interface used by other minimize()
routines (e.g. in scipy). Maybe another list-based interface is required?

OD replies: I think most people would prefer to use a list-based interface, so
they don't have to manually pack / unpack multiple arrays of parameters. So I
would vote in favor of having both (where the main reason to also provide a
non-list interface would be to allow one to easily switch e.g. to scipy's
minimize).
I would guess the reason scipy's interface is like this is that it makes
things easier for the optimization algorithm. However, this does not really
matter if we are just wrapping a Theano-based algorithm (that already has
to handle multiple parameters), and avoiding useless data copies on each call
to f / df can only help speed-wise.

JB replies: Done, I added the possibility that x0 is a list of ndarrays to the
API doc.

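For concreteness, the pack / unpack step that a flat-array interface such as
scipy's pushes onto the user looks roughly like this (a hypothetical sketch,
not part of the proposed API):

    import numpy

    def pack(params):
        # Flatten a list of ndarrays into a single 1-D vector for scipy.
        return numpy.concatenate([p.ravel() for p in params])

    def unpack(x, shapes):
        # Split the flat vector back into ndarrays of the given shapes.
        params, i = [], 0
        for shape in shapes:
            size = int(numpy.prod(shape))
            params.append(x[i:i + size].reshape(shape))
            i += size
        return params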


OD asks: Why make a difference between iterative and one-shot versions? A
one-shot algorithm can be seen as an iterative one that stops after its first
iteration. The difference I see between the two interfaces proposed here is
mostly that one relies on Theano while the other one does not, but hopefully
a non-Theano one can be created by simply wrapping around the Theano one.

JB replies: Right, it would make more sense to distinguish them by the fact
that one works on Theano objects, and the other on general Python callable
functions. There is room for an iterative numpy interface, but I didn't make
it yet. Would that answer your question?

OD replies and asks: Partly. Do we really need a non-iterative interface?

OD: I wish we could bring the Theano and Numpy interfaces closer to each
other. It would be nice if we could do something like:

    # Theano version.
    updates = sgd([p], gradients=[g], stop=stop, step_size=.1)
    sgd_step = theano.function([input_var, target_var], [], updates=updates)
    while not stop.value:
        input, target = training_iter.next()
        sgd_step(input, target)

    # Numpy version (you can replace *.value by regular numpy arrays).
    sgd_step = sgd([p.value], gradients=g_func, stop=stop.value, step_size=.1)
    while not stop.value:
        input, target = training_iter.next()
        sgd_step(input, target)

where sgd would look something like:

    from itertools import izip

    class sgd(object):
        def __init__(self, parameters, cost=None, gradients=None, stop=None,
                     step_size=None):
            self.parameters = parameters
            self.step_size = step_size
            # Allow for extra arguments to be provided in self.__call__, that
            # are forwarded to the underlying gradients function.
            self.gradients = lambda *lst, **kw: gradients(*(parameters + lst),
                                                          **kw)
            ...

        def __call__(self, *lst, **kw):
            grads = self.gradients(*lst, **kw)
            for param, grad in izip(self.parameters, grads):
                param -= self.step_size * grad

Then a wrapper to provide a scipy-like interface could be:

    def minimize(x0, f, df, algo, **kw):
        stop = numpy.array(0, dtype=numpy.int8)
        algo_step = eval(algo)([x0], cost=f, gradients=lambda x: (df(x), ),
                               stop=stop, **kw)
        while not stop:
            algo_step()
        return x0  # x0 is updated in place by algo_step

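A hypothetical call to this wrapper (loss and grad_loss as in the scipy
sketch above, and assuming the chosen algorithm eventually sets stop to a
nonzero value):

    w0 = numpy.zeros(5)
    w_min = minimize(w0, f=loss, df=grad_loss, algo='sgd', step_size=0.1)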