pylearn: doc/v2_planning/learner.txt
v2_planning - thoughts on learner
author: James Bergstra <bergstrj@iro.umontreal.ca>
date: Fri, 03 Sep 2010 18:30:21 -0400
James's idea for Learner Interface
==================================

Theory:
-------

Think about the unfolding of a learning algorithm as exploring a path in a vast
directed graph.

There are some source nodes, which are potential initial conditions for the
learning algorithm.

At any node, there are a number of outgoing labeled edges that represent
distinct directions of exploration: like "allocate a model with N hidden units",
or "set the l1 weight decay on such-and-such units to 0.1" or "adapt for T
iterations" or "refresh the GPU dataset memory with the next batch of data".

Not all nodes have the same outgoing edge labels. The dataset, model, and
optimization algorithm implementations may each have their various
hyper-parameters, with various restrictions on what values they can take and
when they can be changed.

Every move in this graph incurs some storage and computational expense, and
extends the explored portion of the graph.

Learners typically engage in goal-directed exploration of this graph - for
example, to find the node with the best validation-set performance given a
certain computational budget. We are often interested in the best node found.

The predict(), log_probability(), free_energy(), etc. methods correspond to
costs that we can measure at any particular node (at some computational
expense) to see how we are doing in our exploration.

Many semantically distinct components come into the definition of this graph:
the model (e.g. a DAA), the dataset (e.g. an online one), and the inference and
learning strategy. I'm not sure what to call this graph other than an
'experiment graph'... so I'll go with that for now.


Use Cases
---------

Early stopping
~~~~~~~~~~~~~~

Early stopping can be implemented as a learner that progresses along a
particular kind of edge (e.g. "train more") until a stopping criterion (in
terms of a cost computed from nodes along the path) is met.


Grid Search
~~~~~~~~~~~

Grid search is a learner policy that can be implemented in an experiment graph
where all paths have the form:

    ( "set param 0 to X",
      "set param 1 to Y",
      ... ,
      "set param N to Z",
      adapt,
      [early stop...],
      test )

It would explore all paths of this form and then return the best node.


Stagewise learning of DBNs combined with early stopping and grid search
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This would be a learner that is effective for experiment graphs that reflect
the greedy-stagewise optimization of DBNs.


Boosting
~~~~~~~~

Given an ExperimentGraph that permits re-weighting of examples, it is
straightforward to write a meta-ExperimentGraph around it that implements
AdaBoost. A meta-meta-ExperimentGraph around that one, implementing early
stopping, would complete the picture and make a useful boosting
implementation.


Implementation Details / API
----------------------------

ExperimentGraph
~~~~~~~~~~~~~~~

One API that needs to be defined for this perspective to be practical is the
ExperimentGraph. I'll present it in terms of global functions, but an
object-oriented interface probably makes more sense in the code itself.

    def explored_nodes(graph):
        """Return iterator over explored nodes (ints? objects?)"""

    def forget_nodes(graph, nodes):
        """Clear the nodes from memory (to save space)"""

    def all_edges_from(graph, node):
        """Return iterator over all possible edges.

        Edges might be parametric - like "set learn_rate to (float)".

        Edges might contain a reference to their 'from' end... not sure.

        """

    def explored_edges_from(graph, node):
        """Return the edges that have been explored"""

    def add_node(graph, new_node):
        """Add a node. It may be serialized."""

    def add_edge(graph, edge):
        """Add an edge. It may be serialized."""

    def connect(graph, from_node, to_node, edge):
        """Connect from_node to to_node via edge.

        to_node = None for an un-explored edge
        """

It makes sense to have one ExperimentGraph implementation for each storage
mechanism - Memory, JobMan, sqlite, couchdb, mongodb, etc.
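To make the storage-mechanism point concrete, here is a minimal in-memory
sketch of the container side of this API. The class name and representation
choices are illustrative assumptions rather than a settled design: nodes are
assumed hashable, add_edge is folded into connect for brevity, and the
model-dependent all_edges_from is omitted.

    class MemExperimentGraph(object):
        """Hypothetical in-memory ExperimentGraph (dictionary container only)."""

        def __init__(self):
            self.nodes = set()
            self.out_edges = {}  # maps from_node -> list of (edge, to_node)

        def explored_nodes(self):
            return iter(self.nodes)

        def forget_nodes(self, nodes):
            # drop the nodes and their outgoing edges to save space
            for node in nodes:
                self.nodes.discard(node)
                self.out_edges.pop(node, None)

        def explored_edges_from(self, node):
            return iter(self.out_edges.get(node, []))

        def add_node(self, new_node):
            self.nodes.add(new_node)

        def connect(self, from_node, to_node, edge):
            # to_node may be None for an un-explored edge
            self.out_edges.setdefault(from_node, []).append((edge, to_node))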
The nodes should be serializable objects (like the 'learner' objects in
Yoshua's text above), so that you can do node.learner.predict() if the edge
leading to `node` trained something new.

The nodes could also contain the various costs (train, valid, test) and other
node-specific experiment statistics.

Some implementations might also include functions for asynchronous updating of
the ExperimentGraph.


ExperimentGraphEdge
~~~~~~~~~~~~~~~~~~~

The ExperimentGraph is primarily a dictionary container for nodes and edges.
An ExperimentGraphEdge implementation is the model-dependent component that
actually interprets the edges as computations.

    def estimate_compute_time(graph, node, edge):
        """Return an estimated walltime expense for the computation"""

    def compute_edge(graph, node, edge, async=False, priority=1):
        """Run the computations associated with this graph edge, and store the
        resulting 'to_node' to the graph when complete.

        If async is False, the function doesn't return until the graph is
        updated with `to_node`.

        The priority is used by implementations that use cluster software or
        something similar to manage a worker pool that computes
        highest-priority edges first.

        """

    def list_compute_queue(graph):
        """Return edges scheduled for exploration (and maybe a handle for
        where/when they started running and other backend details)
        """

Different implementations of ExperimentGraphEdge will correspond to different
experiments. There can also be ExperimentGraphEdge implementations that are
proxies, performing the computations in different threads, across ssh, or via
cluster software.


Learner
~~~~~~~

A learner is a program that implements a policy for graph exploration by
exploiting the ExperimentGraph and ExperimentGraphEdge interfaces.

The convenience of the API hinges on the extent to which we can implement
policies that work on different experiment graphs (where the labels on the
edges and their semantics differ). The use cases above make me optimistic that
it will work sufficiently well to be worth doing in the absence of better
ideas.
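As one hedged illustration of such a policy, here is how the early-stopping
use case might look when written against these interfaces. The hook points
`train_more_edge` (the label of the "train more" edge) and `validation_cost`
(which reads a node-specific cost, lower being better) are assumptions, and
`compute_edge` is assumed, for brevity, to return the resulting node rather
than only storing it in the graph.

    def early_stopping_learner(graph, start_node, train_more_edge,
                               validation_cost, compute_edge, patience=5):
        """Follow "train more" edges until `patience` consecutive steps fail
        to improve the validation cost; return the best node found.
        """
        best_node = current = start_node
        best_cost = validation_cost(start_node)
        bad_steps = 0
        while bad_steps < patience:
            current = compute_edge(graph, current, train_more_edge)
            cost = validation_cost(current)
            if cost < best_cost:
                best_node, best_cost, bad_steps = current, cost, 0
            else:
                bad_steps += 1
        return best_node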