pylearn: comparison of doc/v2_planning/learner.txt @ 1043:3f528656855b
v2planning learner.txt - updated API recommendation

author    James Bergstra <bergstrj@iro.umontreal.ca>
date      Wed, 08 Sep 2010 11:33:33 -0400
parents   38cc6e075d9b
children  3b1fd599bafd
comparing 1042:4eaf576c3e9a with 1043:3f528656855b

straightforward to write a meta-ExperimentGraph around it that implements AdaBoost.
A meta-meta-ExperimentGraph around that, which does early stopping, would complete
the picture and make a useful boosting implementation.


Using External Hyper-Parameter Optimization Software
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TODO: use-case - show how we could use the optimizer from
http://www.cs.ubc.ca/labs/beta/Projects/ParamILS/

Implementation Details / API
----------------------------

TODO: PUT IN TERMINOLOGY OF LEARNER, HYPER-LEARNER.

TODO: SEPARATE DISCUSSION OF PERSISTENT STORAGE FROM LEARNER INTERFACE.

TODO: API describing hyperparameters (categorical, integer, bounds on values, etc.)

ExperimentGraph
~~~~~~~~~~~~~~~

One API that needs to be defined for this perspective to be practical is the
ExperimentGraph. I'll present it in terms of global functions, but an
object-oriented interface probably makes more sense in the code itself.

    def explored_nodes(graph):
        """Return an iterator over explored nodes (ints? objects?)"""

    def forget_nodes(graph, nodes):
        """Clear the nodes from memory (to save space)."""

    def all_edges_from(graph, node):
        """Return an iterator over all possible edges.

        Edges might be parametric - like "set learn_rate to (float)".

        Edges might contain a reference to their 'from' end... not sure.
        """

    def explored_edges_from(graph, node):
        """Return the edges that have already been explored."""

    def add_node(graph, new_node):
        """Add a node. It may be serialized."""

    def add_edge(graph, edge):
        """Add an edge. It may be serialized."""

    def connect(graph, from_node, to_node, edge):
        """Record that `edge` leads from `from_node` to `to_node`.

        Pass to_node = None for an un-explored edge.
        """

It makes sense to have one ExperimentGraph implementation for each storage
mechanism - Memory, JobMan, sqlite, couchdb, mongodb, etc.

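For concreteness, here is one way a Memory implementation of a subset of this API
might look. The class name, the storage layout, and the integer node ids below are
illustrative guesses, not part of the proposal; all_edges_from is omitted because
enumerating *possible* edges is the model-dependent job of the ExperimentGraphEdge
component described below:

    class MemoryExperimentGraph(object):
        """Toy in-memory backend for the ExperimentGraph API sketched above."""

        def __init__(self):
            self._nodes = {}   # node id -> node object (or None if forgotten)
            self._edges = []   # [from_id, edge, to_id-or-None] triples

        def explored_nodes(self):
            return iter(self._nodes)

        def forget_nodes(self, nodes):
            # Keep the ids (so edges stay valid) but drop the payloads.
            for n in nodes:
                self._nodes[n] = None

        def explored_edges_from(self, node):
            return (e for e in self._edges if e[0] == node)

        def add_node(self, new_node):
            node_id = len(self._nodes)
            self._nodes[node_id] = new_node
            return node_id

        def connect(self, from_node, to_node, edge):
            # to_node = None records an un-explored edge.
            self._edges.append([from_node, edge, to_node])
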
The nodes should be serializable objects (like the 'learner' objects in Yoshua's
text above), so that you can do node.learner.predict() if the edge leading to
`node` trained something new.

The nodes could also contain the various costs (train, valid, test), and other
experiment statistics that are node-specific.


Some implementations might also include functions for asynchronous updating of
the ExperimentGraph.

ExperimentGraphEdge
~~~~~~~~~~~~~~~~~~~

The ExperimentGraph is primarily a dictionary container for nodes and edges.
An ExperimentGraphEdge implementation is the model-dependent component that
actually interprets the edges as computations.

    def estimate_compute_time(graph, node, edge):
        """Return an estimated walltime expense for the computation."""

    def compute_edge(graph, node, edge, async=False, priority=1):
        """Run the computations associated with this graph edge, and store the
        resulting 'to_node' in the graph when complete.

        If async is False, the function doesn't return until the graph is
        updated with `to_node`.

        The priority is used by implementations that use cluster software or
        something similar to manage a worker pool that computes
        highest-priority edges first.
        """

    def list_compute_queue(graph):
        """Return edges scheduled for exploration (and maybe a handle for
        where/when they started running and other backend details).
        """

Different implementations of ExperimentGraphExplorer will correspond to
different experiments. There can also be ExperimentGraphExplorer
implementations that are proxies, and perform the computations in different
threads, across ssh, or via cluster software.

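For example, a toy compute_edge might look like the following. The `edge.perform`
hook is an invented name for the model-specific computation, and the `async` flag
is renamed `run_async` here only because `async` is a reserved word in modern
Python:

    import threading

    def compute_edge(graph, node, edge, run_async=False, priority=1):
        """Run the edge's computation and record the resulting node."""
        def work():
            to_node = edge.perform(node)     # model-specific computation
            to_id = graph.add_node(to_node)
            graph.connect(node, to_id, edge)
        if run_async:
            # A real backend would also honor `priority` when scheduling.
            threading.Thread(target=work).start()
        else:
            work()
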

Learner
~~~~~~~

A learner is a program that implements a policy for graph exploration by
exploiting the ExperimentGraph and ExperimentGraphEdge interfaces.

The convenience of the API hinges on the extent to which we can implement
policies that work on different experiment-graphs (where the labels on the edges
and semantics are different). The use-cases above make me optimistic that it
will work sufficiently well to be worth doing in the absence of better ideas.

Concretely, a Learner is an object that allows us to explore the graph discussed
above. Specifically, it represents an explored node in that graph.

    def active_instructions():
        """Return a list/set of Instruction instances (see below) that the
        Learner is prepared to handle.
        """

    def copy(), deepcopy()
        """Learners should be serializable."""

To make the implementation easier, I found it was helpful to introduce a string-valued
`fsa_state` member attribute and to associate methods with these states. That made it
syntactically easy to build relatively complex finite-state transition graphs to describe
which instructions were active at which times in the life-cycle of a learner.

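A toy version of that pattern, with invented states and instruction names:

    class StubLearner(object):
        # Which instruction names are active in each fsa_state.
        _active = {
            'allocated':   ['initialize'],
            'initialized': ['train', 'save'],
            'trained':     ['train', 'save', 'predict'],
        }

        def __init__(self):
            self.fsa_state = 'allocated'

        def active_instructions(self):
            # Real code would return Instruction instances; names suffice here.
            return list(self._active[self.fsa_state])
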
Instruction
~~~~~~~~~~~
An object that represents a potential edge in the graph discussed above. It is an
operation that a learner can perform.

    arg_types
        """A list of Type objects (see below) indicating what args are required
        by execute."""

    def execute(learner, args, kwargs):
        """Perform some operation on the learner (follow an edge in the graph
        discussed above) and modify the learner in-place. Calling execute
        'moves' the learner from one node in the graph along an edge. To keep
        the old learner as well, it must be copied prior to calling execute().
        """

    def expense(learner, args, kwargs, resource_type='CPUtime'):
        """Return an estimated cost of performing this instruction (calling
        execute), in time, space, number of computers, disk requirement, etc.
        """

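As a concrete (and invented) example, here is a toy Instruction for the edge
"set learn_rate to (float)" from earlier:

    import copy

    class SetLearnRate(object):
        """Toy Instruction: follow the edge 'set learn_rate to (float)'."""
        arg_types = []   # e.g. [Float(min=1e-6, max=1.0)]; see Type, below

        def execute(self, learner, args, kwargs):
            (learn_rate,) = args
            learner.learn_rate = learn_rate   # modify the learner in-place

        def expense(self, learner, args, kwargs, resource_type='CPUtime'):
            return 0.0   # no training happens along this edge

    # To keep the old node as well, copy before executing:
    #   old = copy.deepcopy(learner)
    #   SetLearnRate().execute(learner, [0.05], {})
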
Type
~~~~
An object that describes a parameter domain for a call to Instruction.execute.
It is not necessary that a Type specify exactly which arguments are legal, but it
should `include` all legal arguments and exclude as many illegal ones as possible.

    def includes(value):
        """Return True if value is a legal argument."""


To make things a bit more practical, there are some Type subclasses like Int, Float, Str,
ImageDataset, SgdOptimizer, that include additional attributes (e.g. min, max, default) so
that automatic graph exploration algorithms can generate legal arguments with reasonable
efficiency.

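A minimal Float subclass might look like this; the `sample` helper is my own
addition, suggesting how an automatic explorer could draw legal arguments:

    class Type(object):
        def includes(self, value):
            raise NotImplementedError

    class Float(Type):
        """Type whose legal arguments are floats in [min, max]."""

        def __init__(self, min=None, max=None, default=None):
            self.min, self.max, self.default = min, max, default

        def includes(self, value):
            if not isinstance(value, float):
                return False
            if self.min is not None and value < self.min:
                return False
            if self.max is not None and value > self.max:
                return False
            return True

        def sample(self, rng):
            # e.g. rng = random.Random(0)
            return rng.uniform(self.min, self.max)
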
The proxy pattern is a powerful way to combine learners, especially when proxy Learner
instances also introduce proxy Instruction classes.

For example, it is straightforward to implement a hyper-learner by implementing a Learner
with another learner (the sub-learner) as a member attribute. The hyper-learner makes some
modifications to the active_instructions() return value of the sub-learner, typically to
introduce more powerful instructions and hide simpler ones.

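A sketch of such a hyper-learner, reusing the name-based instruction convention
from the StubLearner example (all names invented):

    class EarlyStoppingLearner(object):
        """Proxy Learner: rewrites the sub-learner's active instruction set."""

        def __init__(self, sub_learner):
            self.sub = sub_learner

        def active_instructions(self):
            # Hide the sub-learner's raw 'train' edge and offer a more
            # powerful early-stopped training instruction instead.
            base = [i for i in self.sub.active_instructions() if i != 'train']
            return base + ['train_with_early_stopping']
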
It is less straightforward, but consistent with the design, to implement a Learner that
encompasses job management. Such a learner would retain the semantics of the sub-learner's
active_instructions(), but would replace the Instruction objects themselves with
Instructions that arrange for remote procedure calls (e.g. jobman, multiprocessing,
bqtools, etc.). Such a learner would replace synchronous instructions (return on
completion) with asynchronous ones (return after scheduling), and the active instruction
set would also change asynchronously, but neither of these things is inconsistent with the
Learner API.

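For instance, such a proxy might wrap each of the sub-learner's Instructions as
follows, with a thread pool standing in for a jobman/bqtools-style scheduler:

    from concurrent.futures import ThreadPoolExecutor

    class AsyncInstruction(object):
        """Proxy Instruction: execute() returns after scheduling, not completion."""

        _pool = ThreadPoolExecutor(max_workers=4)

        def __init__(self, instruction):
            self.instruction = instruction
            self.arg_types = instruction.arg_types

        def execute(self, learner, args, kwargs):
            # Schedule the wrapped instruction; return a handle immediately.
            return self._pool.submit(self.instruction.execute,
                                     learner, args, kwargs)

        def expense(self, learner, args, kwargs, resource_type='CPUtime'):
            return self.instruction.expense(learner, args, kwargs, resource_type)
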
TODO
~~~~

I feel like something is missing from the API - and that is an interface to the graph
structure discussed above. The nodes in this graph are natural places to store
meta-information for visualization, statistics-gathering, etc. But none of the APIs above
corresponds to the graph itself. In other words, there is no API through which to attach
information to nodes. It is not good to say that the Learner instance *is* the node,
because (a) learner instances change during graph exploration, and (b) learner instances
are big, and we don't want to have to keep a whole saved model just to attach meta-info,
e.g. a validation score. Choosing this API spills over into other committees, so we should
get their feedback about how to resolve it.