# HG changeset patch
# User James Bergstra <bergstrj@iro.umontreal.ca>
# Date 1283553021 14400
# Node ID 38f799f8b6cd366023a2bc980ded13a6c9c5eab7
# Parent  1c96e7ad95c3428c60371283cf8252a08f76b83f
v2_planning - thoughts on learner

diff -r 1c96e7ad95c3 -r 38f799f8b6cd doc/v2_planning/learner.txt
--- a/doc/v2_planning/learner.txt	Fri Sep 03 16:58:25 2010 -0400
+++ b/doc/v2_planning/learner.txt	Fri Sep 03 18:30:21 2010 -0400
@@ -75,4 +75,202 @@
 or other variable-size object (e.g. an image, or a video)
 
 
+
+
+
+
+
         
+
+
+James's idea for Learner Interface
+===================================
+
+Theory:
+-------
+
+Think about the unfolding of a learning algorithm as exploring a path in a vast
+directed graph.
+
+There are some source nodes, which are potential initial conditions for the
+learning algorithm.
+
+At any node, there are a number of outgoing labeled edges that represent
+distinct directions of exploration: like "allocate a model with N hidden units",
+or "set the l1 weight decay on such-and-such units to 0.1" or "adapt for T
+iterations" or "refresh the GPU dataset memory with the next batch of data".
+
+Not all nodes have the same outgoing edge labels.  The dataset, model, and
+optimization algorithm implementations may each have their various
+hyper-parameters with various restrictions on what values they can take, and
+when they can be changed.
+
+Every move in this graph incurs some storage and computational expense, and
+explores the graph.
+
+Learners typically engage in goal-directed exploration of this graph - for
+example to find the node with the best validation-set performance given a
+certain computational budget.  We might often be interested in the best node
+found.
+
+The predict(), log_probability(), free_energy() etc correspond to costs that we
+can measure at any particular node (at some computational expense) to see how we
+are doing in our exploration.
+
+Many semantically distinct components come into the definition of this graph:
+the model (e.g. DAA) the dataset (e.g. an online one), the inference and
+learning strategy.   I'm not sure what to call this graph than an 'experiment
+graph'... so I'll go with that for now.
+
+
+
+
+
+Use Cases
+----------
+
+Early stopping
+~~~~~~~~~~~~~~
+
+Early stopping can be implemented as a learner that progresses along a
+particular kind of edge (e.g. "train more") until a stopping criterion (in terms
+of a cost computed from nodes along the path) is met.
+
+
+Grid Search
+~~~~~~~~~~~
+
+Grid search is a learner policy that can be implemented in an experiment graph
+where all paths have the form:
+
+(   "set param 0 to X",
+    "set param 1 to Y", 
+    ... , 
+    "set param N to Z", 
+    adapt, 
+    [early stop...],
+    test)
+
+It would explore all paths of this form and then return the best node.
+
+
+Stagewise learning of DBNs combined with early stopping and grid search
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This would be a learner that is effective for experiment graphs that reflect the
+greedy-stagewise optimization of DBNs.
+
+
+Boosting
+~~~~~~~~
+
+Given an ExperimentGraph that permits re-weighting of examples, it is
+straightforward to write a meta-ExperimentGraph around it that implements AdaBoost.
+A meta-meta-ExperimentGraph around that that does early-stopping would complete
+the picture and make a useful boosting implementation.
+
+
+
+Implementation Details / API
+----------------------------
+
+ExperimentGraph
+~~~~~~~~~~~~~~~
+
+One API that needs to be defined for this perspective to be practical is the
+ExperimentGraph.  I'll present it in terms of global functions, but an
+object-oriented things probably makes more sense in the code itself.
+
+
+    def explored_nodes(graph):
+       """Return iterator over explored nodes (ints? objects?)"""
+
+    def forget_nodes(graph, nodes):
+       """Clear the nodes from memory (save space)"""
+
+    def all_edges_from(graph, node):
+       """Return iterator over all possible edges
+
+       Edges might be parametric - like "set learn_rate to (float)"
+
+       Edges might contain a reference to their 'from' end... not sure.
+       
+       """
+    def explored_edges_from(graph, node):
+        """Return the edges that have been explored
+        """
+
+    def add_node(graph, new_node):
+        """add a node.  It may be serialized."""
+
+    def add_edge(graph, edge):
+        """add edge, it may be serialize"""
+
+    def connect(graph, from_node, to_node, edge):
+        """
+        to_node = None for un-explored edge
+        """
+
+It makes sense to have one ExperimentGraph implementation for each storage
+mechanism - Memory, JobMan, sqlite, couchdb, mongodb, etc.
+
+The nodes should be serializable objects (like the 'learner' objects in Yoshua's
+text above, so that you can do node.learner.predict() if the edge leading to
+`node` trained something new).
+
+The nodes could also contain the various costs (train, valid, test), and other
+experiment statistics that are node-specific.
+
+
+Some implementations might also include functions for asynchronous updating of
+the ExperimentGraph:
+
+
+ExperimentGraphEdge
+~~~~~~~~~~~~~~~~~~~
+
+The ExperimentGraph is primarily a dictionary container for nodes and edges.
+An ExperimentGraphEdge implementation is the model-dependent component that
+actually interprets the edges as computations.
+
+    def estimate_compute_time(graph, node, edge):
+       """Return an estimated walltime expense for the computation"""
+
+    def compute_edge(graph, node, edge, async=False, priority=1):
+       """Run the computations assocated with this graph edge, and store the
+       resulting 'to_node' to the graph when complete.
+
+       If async is True, the function doesn't return until the graph is updated
+       with `to_node`.
+
+       The priority is used by implementations that use cluster software or
+       something to manage a worker pool that computes highest-priority edges
+       first.
+
+       """
+
+    def list_compute_queue(graph):
+        """Return edges scheduled for exploration (and maybe a handle for
+        where/when they started running and other backend details)
+        """
+
+Different implementations of ExperimentGraphExplorer will correspond to
+different experiments.  There can also be ExperimentGraphExplorer
+implementations that are proxies, and perform the computations in different
+threads, or across ssh, or cluster software.
+
+
+Learner
+~~~~~~~
+
+A learner is a program that implements a policy for graph exploration by
+exploiting the ExperimentGraph and ExperimentGraphEdge interfaces.
+
+The convenience of the API hinges on the extent to which we can implement
+policies that work on different experiment-graphs (where the labels on the edges
+and semantics are different).  The use-cases above make me optimistic that it
+will work sufficiently well to be worth doing in the absence of better ideas.
+
+
+
+