diff doc/v2_planning/learner.txt @ 1043:3f528656855b

v2planning learner.txt - updated API recommendation
author James Bergstra <bergstrj@iro.umontreal.ca>
date Wed, 08 Sep 2010 11:33:33 -0400
parents 38cc6e075d9b
children 3b1fd599bafd
--- a/doc/v2_planning/learner.txt	Wed Sep 08 11:18:00 2010 -0400
+++ b/doc/v2_planning/learner.txt	Wed Sep 08 11:33:33 2010 -0400
@@ -173,116 +173,97 @@
 the picture and make a useful boosting implementation.
 
 
+Using External Hyper-Parameter Optimization Software
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+TODO: use-case - show how we could use the optimizer from
+http://www.cs.ubc.ca/labs/beta/Projects/ParamILS/
+
 
 Implementation Details / API
 ----------------------------
 
-TODO: PUT IN TERMINOLOGY OF LEARNER, HYPER-LEARNER.
-
-TODO: SEPARATE DISCUSSION OF PERSISTENT STORAGE FROM LEARNER INTERFACE.
-
-TODO: API describing hyperparameters (categorical, integer, bounds on values, etc.)
-
-TODO: use-case - show how we could use the optimizer from
-      http://www.cs.ubc.ca/labs/beta/Projects/ParamILS/
-
-ExperimentGraph
-~~~~~~~~~~~~~~~
-
-One API that needs to be defined for this perspective to be practical is the
-ExperimentGraph.  I'll present it in terms of global functions, but an
-object-oriented things probably makes more sense in the code itself.
-
-
-    def explored_nodes(graph):
-       """Return iterator over explored nodes (ints? objects?)"""
+Learner
+~~~~~~~
+    An object that allows us to explore the graph discussed above.  Specifically, it represents
+    an explored node in that graph.
 
-    def forget_nodes(graph, nodes):
-       """Clear the nodes from memory (save space)"""
-
-    def all_edges_from(graph, node):
-       """Return iterator over all possible edges
-
-       Edges might be parametric - like "set learn_rate to (float)"
-
-       Edges might contain a reference to their 'from' end... not sure.
-       
-       """
-    def explored_edges_from(graph, node):
-        """Return the edges that have been explored
-        """
-
-    def add_node(graph, new_node):
-        """add a node.  It may be serialized."""
-
-    def add_edge(graph, edge):
-        """add edge, it may be serialize"""
-
-    def connect(graph, from_node, to_node, edge):
-        """
-        to_node = None for un-explored edge
+    def active_instructions():
+        """ Return a list/set of Instruction instances (see below) that the Learner is prepared
+        to handle.
         """
 
-It makes sense to have one ExperimentGraph implementation for each storage
-mechanism - Memory, JobMan, sqlite, couchdb, mongodb, etc.
-
-The nodes should be serializable objects (like the 'learner' objects in Yoshua's
-text above, so that you can do node.learner.predict() if the edge leading to
-`node` trained something new).
-
-The nodes could also contain the various costs (train, valid, test), and other
-experiment statistics that are node-specific.
+    def copy(), deepcopy()
+        """ Learners should be serializable """
 
 
-Some implementations might also include functions for asynchronous updating of
-the ExperimentGraph:
+    To make the implementation easier, I found it was helpful to introduce a string-valued
+    `fsa_state` member attribute and associate methods with these states.  That made it
+    syntactically easy to build relatively complex finite-state transition graphs to describe
+    which instructions were active at which times in the life-cycle of a learner.
 
 
-ExperimentGraphEdge
-~~~~~~~~~~~~~~~~~~~
-
-The ExperimentGraph is primarily a dictionary container for nodes and edges.
-An ExperimentGraphEdge implementation is the model-dependent component that
-actually interprets the edges as computations.
-
-    def estimate_compute_time(graph, node, edge):
-       """Return an estimated walltime expense for the computation"""
+Instruction
+~~~~~~~~~~~
+    An object that represents a potential edge in the graph discussed above.  It is an
+    operation that a learner can perform.
 
-    def compute_edge(graph, node, edge, async=False, priority=1):
-       """Run the computations assocated with this graph edge, and store the
-       resulting 'to_node' to the graph when complete.
-
-       If async is True, the function doesn't return until the graph is updated
-       with `to_node`.
+    arg_types
+        """a list of Type objects (see below) indicating what args are required by execute"""
 
-       The priority is used by implementations that use cluster software or
-       something to manage a worker pool that computes highest-priority edges
-       first.
-
-       """
-
-    def list_compute_queue(graph):
-        """Return edges scheduled for exploration (and maybe a handle for
-        where/when they started running and other backend details)
+    def execute(learner, args, kwargs):
+        """ Perform some operation on the learner (follow an edge in the graph discussed above)
+        and modify the learner in-place.  Calling execute 'moves' the learner from one node in
+        the graph along an edge.  To keep the old learner as well, copy it before calling
+        execute().
         """
 
-Different implementations of ExperimentGraphExplorer will correspond to
-different experiments.  There can also be ExperimentGraphExplorer
-implementations that are proxies, and perform the computations in different
-threads, or across ssh, or cluster software.
+    def expense(learner, args, kwargs, resource_type='CPUtime'):
+        """ Return an estimated cost of performing this instruction (calling execute), in time,
+        space, number of computers, disk requirement, etc.
+        """
+
+Type
+~~~~
+    An object that describes a parameter domain for a call to Instruction.execute.
+    It is not necessary that a Type specifies exactly which arguments are legal, but it should
+    `include` all legal arguments, and exclude as many illegal ones as possible.
+
+    def includes(value):
+        """return True if value is a legal argument"""
 
 
-Learner
-~~~~~~~
-
-A learner is a program that implements a policy for graph exploration by
-exploiting the ExperimentGraph and ExperimentGraphEdge interfaces.
-
-The convenience of the API hinges on the extent to which we can implement
-policies that work on different experiment-graphs (where the labels on the edges
-and semantics are different).  The use-cases above make me optimistic that it
-will work sufficiently well to be worth doing in the absence of better ideas.
+    To make things a bit more practical, there are some Type subclasses like Int, Float, Str,
+    ImageDataset, SgdOptimizer, that include additional attributes (e.g. min, max, default) so
+    that automatic graph exploration algorithms can generate legal arguments with reasonable
+    efficiency.
 
 
 
+The proxy pattern is a powerful way to combine learners, especially when proxy Learner
+instances also introduce proxy Instruction classes.
 
+For example, it is straightforward to implement a hyper-learner by implementing a Learner with
+another learner (sub-learner) as a member attribute.  The hyper-learner makes some
+modifications to the active_instructions() return value of the sub-learner, typically to
+introduce more powerful instructions and hide simpler ones.
+
+It is less straightforward, but consistent with the design, to implement a Learner that
+encompasses job management.  Such a learner would retain the semantics of the
+active_instructions() of the sub-learner, but would replace the Instruction objects themselves
+with Instructions that arrange for remote procedure calls (e.g. jobman, multiprocessing,
+bqtools, etc.).  Such a learner would replace synchronous instructions (return on completion)
+with asynchronous ones (return after scheduling), and the active instruction set would also
+change asynchronously, but neither of these things is inconsistent with the Learner API.
+
+
+TODO
+~~~~
+
+I feel like something is missing from the API: an interface to the graph structure
+discussed above.  The nodes in this graph are natural places to store meta-information for
+visualization, statistics-gathering, etc.  But none of the APIs above corresponds to the graph
+itself.  In other words, there is no API through which to attach information to nodes.  It is
+not good to say that the Learner instance *is* the node because (a) learner instances change
+during graph exploration and (b) learner instances are big, and we don't want to have to keep a
+whole saved model just to attach meta-info, e.g. a validation score.  Choosing this API spills
+over into other committees, so we should get their feedback about how to resolve it.
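
Note on the proposed API: the following is a minimal sketch of how the Learner, Instruction
and Type pieces introduced in this change might fit together.  It is not code from the
repository.  ToyLearner, SetLearnRate, Train, the state names ('new', 'configured',
'trained'), the `_instructions_<state>` naming scheme and the expense values are all
hypothetical; only active_instructions(), execute(), expense(), arg_types, includes() and
fsa_state come from the text of the diff.

```python
import copy


class Type(object):
    """Describes a parameter domain for a call to Instruction.execute."""

    def includes(self, value):
        raise NotImplementedError


class Float(Type):
    """Assumed shape of a Type subclass carrying min, max and default."""

    def __init__(self, min, max, default):
        self.min, self.max, self.default = min, max, default

    def includes(self, value):
        return isinstance(value, float) and self.min <= value <= self.max


class Instruction(object):
    """A potential edge in the graph: an operation a learner can perform."""
    arg_types = []

    def execute(self, learner, args, kwargs):
        raise NotImplementedError

    def expense(self, learner, args, kwargs, resource_type='CPUtime'):
        raise NotImplementedError


class SetLearnRate(Instruction):  # hypothetical instruction
    arg_types = [Float(min=0.0, max=1.0, default=0.01)]

    def execute(self, learner, args, kwargs):
        (rate,) = args
        if not self.arg_types[0].includes(rate):
            raise ValueError('illegal learn_rate: %r' % (rate,))
        learner.learn_rate = rate          # modifies the learner in-place
        learner.fsa_state = 'configured'

    def expense(self, learner, args, kwargs, resource_type='CPUtime'):
        return 0.0                         # made-up estimate


class Train(Instruction):  # hypothetical instruction
    def execute(self, learner, args, kwargs):
        learner.n_updates += 1             # stand-in for real training
        learner.fsa_state = 'trained'

    def expense(self, learner, args, kwargs, resource_type='CPUtime'):
        return 1.0                         # made-up estimate


class ToyLearner(object):
    """Represents one explored node; fsa_state selects the active instructions."""

    def __init__(self):
        self.fsa_state = 'new'
        self.learn_rate = None
        self.n_updates = 0

    # One method per fsa_state value (naming scheme is an assumption).
    def _instructions_new(self):
        return [SetLearnRate()]

    def _instructions_configured(self):
        return [SetLearnRate(), Train()]

    _instructions_trained = _instructions_configured

    def active_instructions(self):
        return getattr(self, '_instructions_' + self.fsa_state)()


learner = ToyLearner()
learner.active_instructions()[0].execute(learner, (0.1,), {})
before = copy.deepcopy(learner)            # keep the old node before moving on
learner.active_instructions()[1].execute(learner, (), {})
```

Because execute() changes the learner in-place, `before` is the only remaining handle on the
node that was left behind.  This is the same observation that motivates the TODO section:
the Learner instance on its own is an awkward place to hang per-node meta-information.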