comparison doc/v2_planning/learner.txt @ 1043:3f528656855b

v2planning learner.txt - updated API recommendation
author James Bergstra <bergstrj@iro.umontreal.ca>
date Wed, 08 Sep 2010 11:33:33 -0400
parents 38cc6e075d9b
children 3b1fd599bafd
[unchanged]
straightforward to write a meta-ExperimentGraph around it that implements
AdaBoost. A meta-meta-ExperimentGraph around that, adding early stopping,
would complete the picture and make a useful boosting implementation.

[inserted]
Using External Hyper-Parameter Optimization Software
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TODO: use-case - show how we could use the optimizer from
http://www.cs.ubc.ca/labs/beta/Projects/ParamILS/

[unchanged]
Implementation Details / API
----------------------------

[deleted]
TODO: PUT IN TERMINOLOGY OF LEARNER, HYPER-LEARNER.

TODO: SEPARATE DISCUSSION OF PERSISTENT STORAGE FROM LEARNER INTERFACE.

TODO: API describing hyperparameters (categorical, integer, bounds on values, etc.)

TODO: use-case - show how we could use the optimizer from
http://www.cs.ubc.ca/labs/beta/Projects/ParamILS/

ExperimentGraph
~~~~~~~~~~~~~~~

One API that needs to be defined for this perspective to be practical is the
ExperimentGraph. I'll present it in terms of global functions, but an
object-oriented design probably makes more sense in the code itself.

    def explored_nodes(graph):
        """Return an iterator over explored nodes (ints? objects?)"""

    def forget_nodes(graph, nodes):
        """Clear the nodes from memory (to save space)"""

    def all_edges_from(graph, node):
        """Return an iterator over all possible edges.

        Edges might be parametric - like "set learn_rate to (float)".

        Edges might contain a reference to their 'from' end... not sure.

        """

    def explored_edges_from(graph, node):
        """Return the edges that have been explored."""

    def add_node(graph, new_node):
        """Add a node. It may be serialized."""

    def add_edge(graph, edge):
        """Add an edge. It may be serialized."""

    def connect(graph, from_node, to_node, edge):
        """
        to_node = None for an un-explored edge
        """

It makes sense to have one ExperimentGraph implementation for each storage
mechanism - Memory, JobMan, sqlite, couchdb, mongodb, etc.

The nodes should be serializable objects (like the 'learner' objects in
Yoshua's text above), so that you can do node.learner.predict() if the edge
leading to `node` trained something new.

The nodes could also contain the various costs (train, valid, test) and other
experiment statistics that are node-specific.

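To make the container role concrete, here is a minimal sketch of what a
Memory-backed implementation might look like. The class name, the id scheme,
and the use of pickle are illustrative assumptions, not settled choices:

    import pickle

    class MemoryExperimentGraph(object):
        """Sketch of an in-memory backend for the container API above."""

        def __init__(self):
            self.nodes = {}       # node id -> serialized node
            self.edges = []       # (from_id, to_id or None, edge) triples
            self._next_id = 0

        def explored_nodes(self):
            return iter(self.nodes.keys())

        def forget_nodes(self, node_ids):
            for nid in node_ids:
                del self.nodes[nid]   # save space; edges keep only the ids

        def explored_edges_from(self, node_id):
            return (e for (f, t, e) in self.edges
                    if f == node_id and t is not None)

        def add_node(self, new_node):
            nid = self._next_id
            self._next_id += 1
            self.nodes[nid] = pickle.dumps(new_node)  # nodes may be serialized
            return nid

        def connect(self, from_id, to_id, edge):
            # to_id is None for an un-explored edge
            self.edges.append((from_id, to_id, edge))
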
Some implementations might also include functions for asynchronous updating of
the ExperimentGraph.

ExperimentGraphEdge
~~~~~~~~~~~~~~~~~~~

The ExperimentGraph is primarily a dictionary container for nodes and edges.
An ExperimentGraphEdge implementation is the model-dependent component that
actually interprets the edges as computations.

    def estimate_compute_time(graph, node, edge):
        """Return an estimated walltime expense for the computation"""

    def compute_edge(graph, node, edge, async=False, priority=1):
        """Run the computations associated with this graph edge, and store the
        resulting 'to_node' in the graph when complete.

        If async is False, the function doesn't return until the graph is
        updated with `to_node`.

        The priority is used by implementations that use cluster software or
        similar tools to manage a worker pool that computes highest-priority
        edges first.

        """

    def list_compute_queue(graph):
        """Return edges scheduled for exploration (and maybe a handle for
        where/when they started running and other backend details).
        """

Different implementations of ExperimentGraphExplorer will correspond to
different experiments. There can also be ExperimentGraphExplorer
implementations that are proxies, performing the computations in different
threads, across ssh, or through cluster software.

[unchanged]
Learner
~~~~~~~
[deleted]
A learner is a program that implements a policy for graph exploration by
exploiting the ExperimentGraph and ExperimentGraphEdge interfaces.

The convenience of the API hinges on the extent to which we can implement
policies that work on different experiment-graphs (where the labels on the
edges and their semantics differ). The use-cases above make me optimistic that
it will work sufficiently well to be worth doing in the absence of better
ideas.

[inserted]
An object that allows us to explore the graph discussed above. Specifically,
it represents an explored node in that graph.

    def active_instructions():
        """ Return a list/set of Instruction instances (see below) that the
        Learner is prepared to handle.
        """

    def copy(), deepcopy()
        """ Learners should be serializable """

To make the implementation easier, I found it helpful to introduce a
string-valued `fsa_state` member attribute and to associate methods with these
states. That made it syntactically easy to build relatively complex
finite-state transition graphs describing which instructions are active at
which points in the life-cycle of a learner.

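A minimal sketch of the `fsa_state` idea (the learner class, its states, and
its instruction names are all illustrative, and bound methods stand in for the
Instruction instances the real API would return):

    class ExampleLearner(object):
        """Sketch: a finite-state machine gates the active instructions."""

        # state name -> instruction names active in that state
        _transitions = {
            'allocated': ['set_dataset'],
            'ready':     ['train'],
            'trained':   ['predict', 'forget'],
        }

        def __init__(self):
            self.fsa_state = 'allocated'

        def active_instructions(self):
            return [getattr(self, name)
                    for name in self._transitions[self.fsa_state]]

        def set_dataset(self, dataset):
            self.dataset = dataset
            self.fsa_state = 'ready'

        def train(self):
            self.params = sum(self.dataset)  # stand-in for real fitting
            self.fsa_state = 'trained'

        def predict(self, x):
            return self.params * x           # stand-in

        def forget(self):
            del self.params                  # keep the dataset, drop the fit
            self.fsa_state = 'ready'

    # usage sketch: the active set changes as instructions are executed
    learner = ExampleLearner()
    learner.active_instructions()[0]([1.0, 2.0, 3.0])   # set_dataset
    learner.active_instructions()[0]()                  # train
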
Instruction
~~~~~~~~~~~
An object that represents a potential edge in the graph discussed above. It is
an operation that a learner can perform.

    arg_types
        """a list of Type objects (see below) indicating what args are
        required by execute"""

    def execute(learner, args, kwargs):
        """ Perform some operation on the learner (follow an edge in the graph
        discussed above) and modify the learner in-place. Calling execute
        'moves' the learner from one node in the graph along an edge. To keep
        the old learner as well, it must be copied prior to calling execute().
        """

    def expense(learner, args, kwargs, resource_type='CPUtime'):
        """ Return an estimated cost of performing this instruction (calling
        execute) in time, space, number of computers, disk requirement, etc.
        """

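As a concrete (hypothetical) instance, an instruction that sets a learning
rate might look like the following, using a `Float` Type of the kind described
in the next section; the class name and attribute are invented for
illustration:

    class SetLearnRate(object):
        """Hypothetical instruction: follow a 'set learn_rate' edge."""
        arg_types = [Float(min=0.0, max=1.0, default=0.01)]

        def execute(self, learner, args, kwargs):
            (new_rate,) = args
            learner.learn_rate = new_rate   # modifies the learner in-place

        def expense(self, learner, args, kwargs, resource_type='CPUtime'):
            return 0.0                      # setting a scalar is nearly free
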
Type
~~~~
An object that describes a parameter domain for a call to Instruction.execute.
It is not necessary that a Type specify exactly which arguments are legal, but
it should `include` all legal arguments and exclude as many illegal ones as
possible.

    def includes(value):
        """return True if value is a legal argument"""

To make things a bit more practical, there are some Type subclasses like Int,
Float, Str, ImageDataset, and SgdOptimizer that include additional attributes
(e.g. min, max, default) so that automatic graph exploration algorithms can
generate legal arguments with reasonable efficiency.

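For instance, a `Float` subclass might carry bounds plus a sampling helper, so
that exploration algorithms can draw legal values directly. This is a sketch
under the assumptions above (the `sample` helper is an invention, and it
assumes both bounds are set):

    import random

    class Float(object):
        """Hypothetical Type: a bounded floating-point parameter domain."""

        def __init__(self, min=None, max=None, default=None):
            self.min, self.max, self.default = min, max, default

        def includes(self, value):
            """Return True if value is a legal argument."""
            return (isinstance(value, float)
                    and (self.min is None or value >= self.min)
                    and (self.max is None or value <= self.max))

        def sample(self, rng=random):
            # lets automatic exploration generate legal arguments efficiently
            return rng.uniform(self.min, self.max)
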
The proxy pattern is a powerful way to combine learners, especially when proxy
Learner instances also introduce proxy Instruction classes.

For example, it is straightforward to implement a hyper-learner by
implementing a Learner with another learner (the sub-learner) as a member
attribute. The hyper-learner makes some modifications to the
instruction_set() return value of the sub-learner, typically to introduce more
powerful instructions and hide simpler ones.

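A sketch of that wrapping, with invented instruction names (and the details of
the more powerful instruction elided):

    class TrainWithEarlyStopping(object):
        """Hypothetical proxy Instruction built on the sub-learner's steps."""
        def __init__(self, sub_learner):
            self.sub = sub_learner

        def execute(self, learner, args, kwargs):
            # would repeatedly run the sub-learner's cheap training
            # instruction until validation cost stops improving
            pass

    class HyperLearner(object):
        """Hypothetical proxy learner wrapping a sub-learner."""
        def __init__(self, sub_learner):
            self.sub = sub_learner

        def instruction_set(self):
            # hide the sub-learner's plain 'Train' instruction...
            base = [i for i in self.sub.instruction_set()
                    if i.__class__.__name__ != 'Train']
            # ...and expose a more powerful replacement in its place
            return base + [TrainWithEarlyStopping(self.sub)]
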
It is less straightforward, but consistent with the design, to implement a
Learner that encompasses job management. Such a learner would retain the
semantics of the instruction_set of the sub-learner, but would replace the
Instruction objects themselves with Instructions that arrange for remote
procedure calls (e.g. jobman, multiprocessing, bqtools, etc.). Such a learner
would replace synchronous instructions (return on completion) with
asynchronous ones (return after scheduling), and the active instruction set
would also change asynchronously, but neither of these things is inconsistent
with the Learner API.

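One possible shape for that substitution, sketched with the standard
multiprocessing module (the class is invented, it relies on the learner being
serializable as required above, and a real version would still need to ship
the modified learner back to the parent process):

    import multiprocessing

    class AsyncInstruction(object):
        """Hypothetical proxy: returns after scheduling, not completion."""

        def __init__(self, inner):
            self.inner = inner
            self.arg_types = inner.arg_types

        def execute(self, learner, args, kwargs):
            worker = multiprocessing.Process(
                target=self.inner.execute, args=(learner, args, kwargs))
            worker.start()      # schedule and return immediately
            return worker       # a handle the caller can join() later
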

TODO
~~~~

I feel like something is missing from the API - and that is an interface to
the graph structure discussed above. The nodes in this graph are natural
places to store meta-information for visualization, statistics-gathering, etc.
But none of the APIs above corresponds to the graph itself. In other words,
there is no API through which to attach information to nodes. It is not good
to say that the Learner instance *is* the node, because (a) learner instances
change during graph exploration and (b) learner instances are big, and we
don't want to have to keep a whole saved model just to attach meta-info, e.g.
a validation score. Choosing this API spills over into other committees, so we
should get their feedback about how to resolve it.