# HG changeset patch # User Dumitru Erhan # Date 1283969642 14400 # Node ID 0464f891129b9aee5402fbca9ee36a49461a7d25 # Parent 1b61cbe0810ba5aef0f894406cdef72269d7e207# Parent f1732269bce8ef7f07670b74e4cfe8bd2b546204 merge diff -r 1b61cbe0810b -r 0464f891129b doc/v2_planning/learner.txt --- a/doc/v2_planning/learner.txt Wed Sep 08 14:13:43 2010 -0400 +++ b/doc/v2_planning/learner.txt Wed Sep 08 14:14:02 2010 -0400 @@ -173,116 +173,231 @@ the picture and make a useful boosting implementation. +Using External Hyper-Parameter Optimization Software +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +TODO: use-case - show how we could use the optimizer from +http://www.cs.ubc.ca/labs/beta/Projects/ParamILS/ + Implementation Details / API ---------------------------- -TODO: PUT IN TERMINOLOGY OF LEARNER, HYPER-LEARNER. - -TODO: SEPARATE DISCUSSION OF PERSISTENT STORAGE FROM LEARNER INTERFACE. - -TODO: API describing hyperparameters (categorical, integer, bounds on values, etc.) - -TODO: use-case - show how we could use the optimizer from - http://www.cs.ubc.ca/labs/beta/Projects/ParamILS/ - -ExperimentGraph -~~~~~~~~~~~~~~~ - -One API that needs to be defined for this perspective to be practical is the -ExperimentGraph. I'll present it in terms of global functions, but an -object-oriented things probably makes more sense in the code itself. - - - def explored_nodes(graph): - """Return iterator over explored nodes (ints? objects?)""" +Learner +~~~~~~~ + An object that allows us to explore the graph discussed above. Specifically, it represents + an explored node in that graph. - def forget_nodes(graph, nodes): - """Clear the nodes from memory (save space)""" - - def all_edges_from(graph, node): - """Return iterator over all possible edges - - Edges might be parametric - like "set learn_rate to (float)" - - Edges might contain a reference to their 'from' end... not sure. 
- - """ - def explored_edges_from(graph, node): - """Return the edges that have been explored - """ - - def add_node(graph, new_node): - """add a node. It may be serialized.""" - - def add_edge(graph, edge): - """add edge, it may be serialize""" - - def connect(graph, from_node, to_node, edge): - """ - to_node = None for un-explored edge + def active_instructions() + """ Return a list/set of Instruction instances (see below) that the Learner is prepared + to handle. """ -It makes sense to have one ExperimentGraph implementation for each storage -mechanism - Memory, JobMan, sqlite, couchdb, mongodb, etc. - -The nodes should be serializable objects (like the 'learner' objects in Yoshua's -text above, so that you can do node.learner.predict() if the edge leading to -`node` trained something new). - -The nodes could also contain the various costs (train, valid, test), and other -experiment statistics that are node-specific. + def copy(), deepcopy() + """ Learners should be serializable """ -Some implementations might also include functions for asynchronous updating of -the ExperimentGraph: + To make the implementation easier, I found it was helpful to introduce a string-valued + `fsa_state` member attribute and associate methods to these states. That made it + syntactically easy to build relatively complex finite-state transition graphs to describe + which instructions were active at which times in the life-cycle of a learner. -ExperimentGraphEdge -~~~~~~~~~~~~~~~~~~~ - -The ExperimentGraph is primarily a dictionary container for nodes and edges. -An ExperimentGraphEdge implementation is the model-dependent component that -actually interprets the edges as computations. - - def estimate_compute_time(graph, node, edge): - """Return an estimated walltime expense for the computation""" +Instruction +~~~~~~~~~~~ + An object that represents a potential edge in the graph discussed above. It is an + operation that a learner can perform. 
- def compute_edge(graph, node, edge, async=False, priority=1): - """Run the computations assocated with this graph edge, and store the - resulting 'to_node' to the graph when complete. - - If async is True, the function doesn't return until the graph is updated - with `to_node`. + arg_types + """a list of Type objects (see below) indicating what args are required by execute""" - The priority is used by implementations that use cluster software or - something to manage a worker pool that computes highest-priority edges - first. - - """ - - def list_compute_queue(graph): - """Return edges scheduled for exploration (and maybe a handle for - where/when they started running and other backend details) + def execute(learner, args, kwargs): + """ Perform some operation on the learner (follow an edge in the graph discussed above) + and modify the learner in-place. Calling execute 'moves' the learner from one node in + the graph along an edge. To have the old learner as well, it must be copied prior to + calling execute(). """ -Different implementations of ExperimentGraphExplorer will correspond to -different experiments. There can also be ExperimentGraphExplorer -implementations that are proxies, and perform the computations in different -threads, or across ssh, or cluster software. + def expense(learner, args, kwargs, resource_type='CPUtime'): + """ Return an estimated cost of performing this instruction (calling execute), in time, + space, number of computers, disk requirement, etc. + """ + +Type +~~~~ + An object that describes a parameter domain for a call to Instruction.execute. + It is not necessary that a Type specifies exactly which arguments are legal, but it should + `include` all legal arguments, and exclude as many illegal ones as possible.
+ + def includes(value): + """return True if value is a legal argument""" -Learner -~~~~~~~ - -A learner is a program that implements a policy for graph exploration by -exploiting the ExperimentGraph and ExperimentGraphEdge interfaces. - -The convenience of the API hinges on the extent to which we can implement -policies that work on different experiment-graphs (where the labels on the edges -and semantics are different). The use-cases above make me optimistic that it -will work sufficiently well to be worth doing in the absence of better ideas. + To make things a bit more practical, there are some Type subclasses like Int, Float, Str, + ImageDataset, SgdOptimizer, that include additional attributes (e.g. min, max, default) so + that automatic graph exploration algorithms can generate legal arguments with reasonable + efficiency. +The proxy pattern is a powerful way to combine learners. Especially when proxy Learner +instances also introduce Proxy Instruction classes. +For example, it is straightforward to implement a hyper-learner by implementing a Learner with +another learner (sub-learner) as a member attribute. The hyper-learner makes some +modifications to the instruction_set() return value of the sub-learner, typically to introduce +more powerful instructions and hide simpler ones. + +It is less straightforward, but consistent with the design to implement a Learner that +encompasses job management. Such a learner would retain the semantics of the +instruction_set of the sub-learner, but would replace the Instruction objects themselves with +Instructions that arranged for remote procedure calls (e.g. jobman, multiprocessing, bqtools, +etc.) Such a learner would replace synchronous instructions (return on completion) with +asynchronous ones (return after scheduling) and the active instruction set would also change +asynchronously, but neither of these things is inconsistent with the Learner API. 
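The Learner / Instruction / Type description and the proxy pattern above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the committee's implementation: `ToyLearner`, `SetLearnRate`, `SweepLearnRate`, `HyperLearner`, the `candidates` attribute and the `'new'` / `'configured'` state names are all hypothetical, and `Float` is reduced to the min/max/default attributes mentioned in the text.

```python
import copy


class Type:
    """A parameter domain; includes() may accept more than the legal values."""

    def includes(self, value):
        raise NotImplementedError


class Float(Type):
    """Hypothetical Type subclass carrying min/max/default attributes."""

    def __init__(self, min, max, default):
        self.min, self.max, self.default = min, max, default

    def includes(self, value):
        return isinstance(value, float) and self.min <= value <= self.max


class Instruction:
    """A potential edge in the graph: an operation a learner can perform."""

    arg_types = []

    def execute(self, learner, args, kwargs):
        raise NotImplementedError

    def expense(self, learner, args, kwargs, resource_type='CPUtime'):
        return 0.0  # placeholder estimate, not a real cost model


class SetLearnRate(Instruction):
    """Hypothetical instruction that sets a learning rate in place."""

    arg_types = [Float(min=0.0, max=1.0, default=0.01)]

    def execute(self, learner, args, kwargs):
        (rate,) = args
        if not self.arg_types[0].includes(rate):
            raise ValueError("illegal learning rate: %r" % (rate,))
        learner.learn_rate = rate
        learner.fsa_state = 'configured'


class ToyLearner:
    """An explored node; fsa_state could select the active instructions."""

    def __init__(self):
        self.fsa_state = 'new'
        self.learn_rate = None

    def active_instructions(self):
        return [SetLearnRate()]


class SweepLearnRate(Instruction):
    """Hypothetical macro instruction offered only by the hyper-learner."""

    def execute(self, learner, args, kwargs):
        for rate in args:
            # Copy first: execute() moves a learner along an edge in place.
            candidate = copy.deepcopy(learner.sub_learner)
            SetLearnRate().execute(candidate, (rate,), {})
            learner.candidates.append(candidate)


class HyperLearner:
    """Proxy learner: hides SetLearnRate and exposes SweepLearnRate instead."""

    def __init__(self, sub_learner):
        self.sub_learner = sub_learner
        self.candidates = []

    def active_instructions(self):
        visible = [i for i in self.sub_learner.active_instructions()
                   if not isinstance(i, SetLearnRate)]
        return visible + [SweepLearnRate()]


hyper = HyperLearner(ToyLearner())
(sweep,) = hyper.active_instructions()
sweep.execute(hyper, (0.1, 0.01), {})
print([c.learn_rate for c in hyper.candidates])
print(hyper.sub_learner.fsa_state)
```

In this sketch the sub-learner is never modified by the sweep, because each candidate is a deep copy taken before `execute` is called. A job-management proxy would follow the same shape, with `execute` scheduling work instead of running it.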
+ + +TODO +~~~~ + +I feel like something is missing from the API - and that is an interface to the graph structure +discussed above. The nodes in this graph are natural places to store meta-information for +visualization, statistics-gathering etc. But none of the APIs above corresponds to the graph +itself. In other words, there is no API through which to attach information to nodes. It is +not good to say that the Learner instance *is* the node because (a) learner instances change +during graph exploration and (b) learner instances are big, and we don't want to have to keep a +whole saved model just to attach meta-info e.g. validation score. Choosing this API spills +over into other committees, so we should get their feedback about how to resolve it. + +Comment by OD +~~~~~~~~~~~~~ +(I hope it's ok to leave comments even though I'm not in committee... I'm +interested to see how the learner interface is shaping up so I'll be keeping +an eye on this file) +I'm wondering what's the benefit of such an API compared to simply defining a +new method for each instruction. It seems to me that typically, the 'execute' +method would end up being something like + if instruction == 'do_x': + self.do_x(..) + elif instruction == 'do_y': + self.do_y(..) + ... +so why not directly call do_x / do_y instead? + + +Comment by RP +~~~~~~~~~~~~~ + +James, correct me if I'm wrong, but I think each instruction has an execute +command. The job of the learner is to traverse the graph and for each edge +that it decides to cross to call the execute of that edge. Maybe James has +something else in mind, but this was my understanding. + + + +Just another view/spin on the same idea (Razvan) +================================================ + + +My idea is probably just a spin-off from what James wrote. It is an extension +of what I sent on the mailing list some time ago. + +Big Picture +----------- + +What do we care about ?
+~~~~~~~~~~~~~~~~~~~~~~~ + +This is the list of the main points that I have in mind : + + * Re-usability + * Extensibility + * Simplicity or easily readable code ( connected to re-usability ) + * Modularity ( connected to extensibility ) + * Fast to write code ( - sort of comes out of simplicity) + * Efficient code + + +Composition +~~~~~~~~~~~ + +To me this reads as code generated by composing pieces. Imagine this : +you start off with something primitive that I will call a "variable", which +probably is a very unsuitable name. And then you compose those initial +"variables" or transform them through several "functions". Each such +"function" hides some logic, that you as the user don't care about. +You can have low-level or micro "functions" and high-level or macro +"functions", where a high-level function is just a certain compositional +pattern of low-level "functions". There are several classes of "functions" +and "variables" that can be interchangeable. This is how modularity is +obtained, by changing between functions from a certain class. + +Now when you want to research something, what you do is first select +the step you want to look into. If you are lucky you can re-write this +step as a certain decomposition of low-level transformations ( there can be +multiple such decompositions). If not you have to implement such a +decomposition according to your needs. Pick the low-level transformations you want +to change and write new versions that implement your logic. + +I think the code will be easy to read, because it is just applying a fixed +set of transformations, one after the other. The one who writes the code can +decide how explicit he wants to write things by switching between high-level +and low-level functions. + +I think the code this way is re-usable, because you can just take this chain +of transformations and replace the one you care about, without looking into +the rest. + +You get this fractal property of the code.
Zooming in, you always get just +a set of functions applied to a set of variables. In the beginning those might +not be there, and you would have to create new "low level" decompositions, +maybe even new "variables" that get data between those decompositions. + +The thing with variables here, is that I don't want these "functions" to have +a state. All the information is passed along through these variables. This +way understanding the graph is easy, debugging it is also easier ( than having +all these hidden states ..) + +Note that while doing so we might ( and I strongly think we should) create +a (symbolic) DAG of operations. ( this is where it becomes what James was saying). +In such a DAG the "variables" will be the nodes and the functions will be edges. +I think having a DAG is useful in many ways (all these are things that one +might think about implementing in a far future, I'm not proposing to implement +them unless we want to use them - like the reconstruction ): + * there exists the possibility of writing optimizations ( theano style ) + * there exists the possibility to add global view utility functions ( like + a reconstruction function for SdA - extremely low level here), or global + view diagnostic tools + * the possibility of creating a GUI ( where you just create the Graph by + picking transforms and variables from a list ) or working interactively + and then generating code that will reproduce the graph + * you can view the graph at different granularity levels to understand + things ( global diagnostics) + +We should have a taxonomy of possible classes of functions and possible +classes of variables, but those should not be exclusive. We can work at a high +level for now, and decompose those high level functions to lower level when +we need to. We can introduce new classes of functions or intermediate +variables between those low level functions.
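One way to picture the composition idea above is a tiny symbolic DAG in which "variables" are nodes and stateless "functions" are edges. This is a rough sketch under assumptions, not a proposed design: the names `Variable`, `Apply`, `apply`, `evaluate`, `describe`, `spread` and the toy low-level functions are all hypothetical.

```python
class Variable:
    """A node in the DAG: either an input or the result of an Apply."""

    def __init__(self, name, owner=None):
        self.name = name
        self.owner = owner  # None for input variables


class Apply:
    """An edge in the DAG: a stateless function applied to input variables."""

    def __init__(self, fn, inputs):
        self.fn = fn
        self.inputs = inputs


def apply(fn, *inputs):
    """Record fn(*inputs) symbolically instead of computing it."""
    return Variable(fn.__name__, owner=Apply(fn, inputs))


# Low-level "functions": no hidden state, everything arrives as arguments.
def center(xs):
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


def square(xs):
    return [x * x for x in xs]


def absolute(xs):
    return [abs(x) for x in xs]


def total(xs):
    return sum(xs)


def spread(xs_var, pointwise=square):
    """High-level "function": a fixed composition of low-level ones.

    Swapping `pointwise` replaces a single step and leaves the rest alone.
    """
    return apply(total, apply(pointwise, apply(center, xs_var)))


def evaluate(var, inputs):
    """Walk the DAG from the requested variable back to its inputs."""
    if var.owner is None:
        return inputs[var.name]
    args = [evaluate(v, inputs) for v in var.owner.inputs]
    return var.owner.fn(*args)


def describe(var):
    """Global view of the graph, usable for diagnostics."""
    if var.owner is None:
        return var.name
    inner = ", ".join(describe(v) for v in var.owner.inputs)
    return "%s(%s)" % (var.owner.fn.__name__, inner)


data = Variable('data')
cost = spread(data)
alt_cost = spread(data, pointwise=absolute)
print(describe(cost))
print(evaluate(cost, {'data': [1.0, 2.0, 6.0]}))
print(evaluate(alt_cost, {'data': [1.0, 2.0, 6.0]}))
```

Because the graph is built before anything is computed, tools such as `describe` (or an optimizer, or a GUI) can inspect it at whatever granularity they like, which is the property the list above argues for.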
+ + +Similarities with James' idea +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + As I said before, this is I think just another view on what James proposed. + The learner in his case is the module that traverses the graph of these + operations, which makes sense here as well. + + The 'execute' command in his API is just applying a function to some variables in + my case. + + The learner keeps track of the graph that is formed I think in both cases. + + His view is a bit more general. I see the graph as fully created by the user, + and the learner just has to go from the start to the end. In his case the + traversal is conditioned on some policies. I think these ideas can be mixed / + united. What I would see in my case to have this functionality is something + similar to the lazy linker for Theano. + + + +