view doc/v2_planning/plugin_RP.py @ 1178:10bc5ebb5823

coding_style: Added note about the need to provide guidelines for serialization-friendly code
author Olivier Delalleau <delallea@iro>
date Fri, 17 Sep 2010 16:21:55 -0400
parents 6993fef088d1
children fe6c25eb1e37

'''
==================================================
Plugin system for iterative algorithms, Version B
==================================================

After the meeting (September 16) we ended up with
two possible versions of the plug-in system. This document describes
the second version. It underwent a few changes after I saw
Olivier's code and talked to him.

Concept
=======

The basic idea behind this version is not to have a list of all
possible events, but rather to have plugins register to events. By
specifying which plugin listens to which event, produced by which
plugin, you define a sort of dependency graph. Structuring things
as such a graph might make the script more intuitive to read.

I will first go through pseudo-code for two examples and then enumerate
my insights and concepts on the matter.
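
To make the idea concrete, here is a minimal sketch (Python 3) of plugins
registering to events. It is only an illustration under my own assumptions,
not the proposed implementation: ``MiniScheduler``, ``listen`` and the
string event names are hypothetical and do not appear in the examples below.

```python
from collections import defaultdict, deque


class MiniScheduler:
    """Hypothetical stand-in for the scheduler: maps event names to listeners."""

    def __init__(self):
        self.listeners = defaultdict(list)  # event name -> list of callbacks
        self.queue = deque()                # pending (event name, payload) pairs

    def listen(self, event_name, callback):
        # Registering a listener adds one edge to the dependency graph.
        self.listeners[event_name].append(callback)

    def fire(self, event_name, **payload):
        # Events are queued rather than delivered immediately.
        self.queue.append((event_name, payload))

    def run(self):
        # Deliver events in FIFO order until no plugin fires anything new.
        while self.queue:
            event_name, payload = self.queue.popleft()
            for callback in self.listeners[event_name]:
                callback(self, **payload)


log = []
sched = MiniScheduler()
# Dependency graph: 'begin' -> producer -> 'stuff' -> consumer
sched.listen("begin", lambda s: s.fire("stuff", value="some text"))
sched.listen("stuff", lambda s, value: log.append(value))
sched.fire("begin")
sched.run()
print(log)
```

There is no global list of events here: an event exists only because some
plugin fires it and some other plugin listens to it.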


Example : Producer - Consumer that Guillaume described
======================================================


.. code-block::
'''
    sch = Schedular()

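    # A plugin that reacts to an event by firing a new 'stuff' event.
    @FnPlugin(sch)
    def producer(self, event):
        self.fire('stuff', value = 'some text')

    # A plugin that only consumes: it prints the payload of the event.
    @FnPlugin(sch)
    def consumer(self, event):
        print(event.value)

    # A plugin that both consumes an event and fires a new one.
    @FnPlugin(sch)
    def prod_consumer(self, event):
        print(event.value)
        self.fire('stuff2', value = 'stuff')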

    producer.act( on = Event('begin'), when = once() )
    producer.act( on = Event('stuff'), when = always() )
    consumer.act( on = Event('stuff'), when = always() )
    prod_consumer.act( on = Event('stuff'), when = always() )

    sch.run()



'''
Example : Logistic regression
=============================

Task description
----------------

Apply a logistic regression network to some dataset. Use early stopping.
Save the weights every time a new best score is obtained. Print the training
score after each epoch.
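
Independently of the plugin system, the early-stopping logic itself could
look roughly like the sketch below (Python 3). The patience-based stopping
criterion, the ``EarlyStopper`` class and the error values are assumptions
made for illustration; the task description does not fix any of them.

```python
class EarlyStopper:
    """Hypothetical helper: tracks the best validation error seen so far."""

    def __init__(self, patience=3):
        # Assumed criterion: number of checks tolerated without improvement.
        self.patience = patience
        self.best_error = float("inf")
        self.bad_checks = 0

    def update(self, valid_error):
        """Return (is_new_best, should_stop) for one validation result."""
        if valid_error < self.best_error:
            self.best_error = valid_error
            self.bad_checks = 0
            return True, False
        self.bad_checks += 1
        return False, self.bad_checks >= self.patience


stopper = EarlyStopper(patience=2)
saved = []  # stands in for writing the weights to disk
# Made-up validation errors, one per epoch.
for epoch, err in enumerate([0.9, 0.7, 0.8, 0.6, 0.65, 0.7, 0.5]):
    is_best, stop = stopper.update(err)
    if is_best:
        saved.append((epoch, err))  # "save the weights" on every new best
    if stop:
        break
print(saved, epoch)
```

In the plugin version, ``is_best`` would correspond to firing a "new best
error" event and ``stop`` to asking the scheduler to end the run.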


Possible script
---------------

Note: this would look the same for any other architecture that does not
involve pre-training (as deep networks do), for example the MLP.

.. code-block::
'''

import pickle

import numpy

sched = Schedular()

# Data / model building:
# I skipped over how to design this part,
# though I have some ideas.
real_train_data, real_valid_data = load_mnist()
model = logreg()

# Main plugins (already provided in the library).
# These wrappers also register the plugin with the scheduler.
train_data = create_data_plugin(sched, data = real_train_data)
valid_data = create_data_plugin(sched, data = real_valid_data)
train_model    = create_train_model(sched, model = model)
validate_model = create_valid_model(sched, model = model, data = valid_data)
early_stopper  = create_early_stopper(sched)


# On-the-fly plugins (print random stuff). The main difference between my
# FnPlugin and Olivier's version is that mine also registers the plugin in sched.
@FnPlugin(sched)
def print_error(self, event):
    if event.type == Event('begin'):
        # Start with an empty buffer of training errors.
        self.value = []
    elif event.type == train_model.error():
        self.value += [event.value]
    elif event.type == train_data.eod():
        # End of data: report the mean training error.
        print('Error :', numpy.mean(self.value))

@FnPlugin(sched)
def save_model(self, event):
    if event.type == early_stopper.new_best_error():
        # Use a context manager so the file is closed after writing.
        with open('best_params.pkl', 'wb') as f:
            pickle.dump(model.parameters(), f)


# Create the dependency graph describing what does what 
train_model.act(on = train_data.batch(), when = always())
validate_model.act(on = train_model.done(), when = every(n=10000)) 
early_stopper.act(on = validate_model.error(), when = always())
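# print_error initializes its buffer on 'begin', so it has to listen to it.
print_error.act( on = Event('begin'), when = once() )
print_error.act( on = train_model.error(), when = always() )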
print_error.act( on = train_data.eod(), when = always() )
save_model.act( on = early_stopper.new_best_error(), when = always() )

# Run the entire thing
sched.run()


'''
Notes
=====

 * I think we should have a FnPlugin decorator (exactly like Olivier's),
   except that it also attaches the newly created plugin to the scheduler.
   This way you can create plugins on the fly (as long as they are simple
   functions that print stuff or compute simple statistics).
 * I added a method act to Plugin. You use it to create the dependency
   graph (it could also be named listen, to look more like a plugin
   interface).
 * Plugins are obtained in 3 ways:

     - by wrapping a dataset / model or something similar
     - by a function that constructs it from nothing
     - by decorating a function

   In all cases I would suggest that when creating them you should provide
   the scheduler as well, and the constructor should also register the plugin.

 * The plugin concept works well as long as the plugins lean towards
   heavy-duty computation (disregarding printing plugins and such). If you
   have many small plugins, this system might only introduce overhead. I would
   argue that Theano can only be used within each plugin. Therefore I would
   strongly suggest that the architecture be built outside the scheduler,
   with a different approach.

 * I would suggest that the framework be used only for the training loop
   (after you get the adapt function and the compute-error function), so it
   is more about the meta-learner, hyper-learner, learner level.

 * A general remark that I guess everyone will agree on: we should make
   sure that implementing a new plugin is as easy/simple as possible. We
   have to hide all the complexity in the scheduler (it is the part of the
   code that we will never, or only rarely, need to work on).

 * I have not gone into how to implement the different components, but
   following Olivier's code I think that part would be more or less
   straightforward.
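
The first three notes (a decorator that registers the plugin, the act
method, and the when conditions) can be sketched as follows in Python 3.
This is a simplified illustration under my own assumptions, not Olivier's
code: ``Sched``, ``dispatch`` and ``handle`` are hypothetical names, events
are plain strings instead of Event objects, and the plugin function receives
the value directly.

```python
import itertools


def always():
    return lambda: True


def once():
    # True for the first matching event only.
    fired = []

    def check():
        if fired:
            return False
        fired.append(True)
        return True
    return check


def every(n):
    # True on every n-th matching event.
    counter = itertools.count(1)
    return lambda: next(counter) % n == 0


class Sched:
    def __init__(self):
        self.plugins = []

    def dispatch(self, event_name, value=None):
        for plugin in self.plugins:
            plugin.handle(event_name, value)


class Plugin:
    def __init__(self, sched, fn):
        self.fn = fn
        self.rules = []             # (event name, `when` predicate) pairs
        sched.plugins.append(self)  # the constructor registers the plugin

    def act(self, on, when):
        # Each call adds one edge to the dependency graph.
        self.rules.append((on, when))

    def handle(self, event_name, value):
        for on, when in self.rules:
            if on == event_name and when():
                self.fn(self, value)


def FnPlugin(sched):
    # Decorator factory: turns a plain function into a registered plugin.
    def decorate(fn):
        return Plugin(sched, fn)
    return decorate


sched = Sched()
seen = []


@FnPlugin(sched)
def recorder(self, value):
    seen.append(value)


recorder.act(on="tick", when=every(n=2))
recorder.act(on="begin", when=once())

sched.dispatch("begin", "start")
sched.dispatch("begin", "again")   # ignored: once() has already fired
for i in range(1, 5):
    sched.dispatch("tick", i)      # only every second tick is recorded
print(seen)
```

Because the decorator returns a Plugin object, the decorated name can be
used directly in act calls, which is what the example scripts rely on.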

 '''


'''