view doc/v2_planning/plugin_RP.py @ 1153:ae5ba6206fd3

a first draft of pseudo-code for logreg .. using version B (?) approach
author Razvan Pascanu <r.pascanu@gmail.com>
date Thu, 16 Sep 2010 17:34:30 -0400
parents
children f923dddf0bf7

'''
=================================================
Plugin system for iterative algorithms, Version B
=================================================

After the meeting (September 16) we ended up with two possible
versions of the plug-in system. This document describes the second
version. It underwent a few changes after I saw Olivier's code and
talked to him.

Concept
=======

The basic idea behind this version is not to keep a list of all
possible events, but rather to have plugins register to events. By
specifying which plugin listens to which event produced by which
plugin, you define a sort of dependency graph. Structuring things
in such a graph might make the script more intuitive to read.
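
To make the idea concrete, here is a minimal runnable sketch of plugins
registering to events. Everything in it is an assumption made for
illustration (the class ``MiniScheduler``, its methods, and the use of plain
strings as event keys); it is not the interface proposed here.

```python
from collections import defaultdict, deque


class MiniScheduler:
    # Hypothetical stand-in for the scheduler, for illustration only.
    def __init__(self):
        self.listeners = defaultdict(list)  # event key -> list of callbacks
        self.queue = deque()                # pending (event, payload) pairs

    def register(self, event, callback):
        # An edge of the dependency graph: `callback` listens to `event`.
        self.listeners[event].append(callback)
        return callback

    def emit(self, event, payload=None):
        self.queue.append((event, payload))

    def run(self):
        # Start with "begin" and dispatch until no events are left.
        self.emit("begin")
        while self.queue:
            event, payload = self.queue.popleft()
            for callback in self.listeners[event]:
                callback(self, payload)


log = []


def producer(sched, _payload):
    # Reacts to "begin" and produces three items.
    for i in range(3):
        sched.emit("producer.output", i)


def consumer(sched, item):
    # Reacts to every item produced by the producer.
    log.append(item * 10)


sched = MiniScheduler()
sched.register("begin", producer)
sched.register("producer.output", consumer)
sched.run()
print(log)
```

The ``listeners`` table is the dependency graph: each entry records which
callbacks depend on which event.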

I will first go through pseudo-code for two examples and then list
my insights and thoughts on the matter.


Example : Producer - Consumer that Guillaume described
======================================================


.. code-block::
'''
    sched = Scheduler()
    p = ProducerFactory()
    # The plugin is passed by keyword (`plugin=`) because Python does not
    # allow a positional argument after a keyword argument.
    p = sched.schedule_plugin(event=every(p.outputStuffs()), plugin=p)
    p = sched.schedule_plugin(event=Event("begin"), plugin=p)
    c = sched.schedule_plugin(event=every(p.outputStuffs()), plugin=ConsumerFactory())
    pc = sched.schedule_plugin(event=every(p.outputStuffs()), plugin=ProducerConsumerFactory())

    sched.run()



'''
Example : Logistic regression
=============================

Task description
----------------

Apply a logistic regression network to some dataset. Use early stopping.
Save the weights every time a new best score is obtained. Print the training
score after each epoch.
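
The early-stopping part of the task can be sketched on its own. The snippet
below is only an illustration under stated assumptions: the class
``EarlyStopper``, its ``update`` method, the returned signal names, and the
reading of patience as "number of validation checks without improvement" are
all hypothetical.

```python
class EarlyStopper:
    # Hypothetical helper; patience is counted in validation checks.
    def __init__(self, initial_patience=10):
        self.initial_patience = initial_patience
        self.patience = initial_patience
        self.best_error = float("inf")

    def update(self, valid_error):
        # Returns "new_best", "continue" or "terminate".
        if valid_error < self.best_error:
            self.best_error = valid_error
            self.patience = self.initial_patience  # reset on improvement
            return "new_best"
        self.patience -= 1
        return "terminate" if self.patience <= 0 else "continue"


stopper = EarlyStopper(initial_patience=2)
validation_errors = [0.30, 0.25, 0.26, 0.27, 0.20]  # made-up values
signals = []
for err in validation_errors:
    signal = stopper.update(err)
    signals.append(signal)
    if signal == "new_best":
        pass  # this is where the weights would be saved
    elif signal == "terminate":
        break
print(signals, stopper.best_error)
```

With a patience of 2, the run stops after two checks in a row fail to improve
on the best error, so the last value in the list is never looked at.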


Possible script
---------------

 Sorry for the long variable names; I wanted to make it clear what things are.

.. code-block::
'''
    sched = Scheduler()
    # This is a shortcut. I have been to the dataset committee and they have
    # something fancier in mind; I totally agree with their ideas and only
    # wrote it like this for brevity.
    train_data, valid_data, test_data = load_mnist()

    # This part was not actually discussed in detail. I have my own
    # opinions on how it should be done, but for now I decomposed it
    # into two functions for convenience.
    logreg = generate_logreg_model()


    
    # Note that this is not meant to replace Olivier's string idea. I
    # actually think that is a cool idea; when writing things down I realized
    # it might be a bit more intuitive to get the event object by calling
    # a meaningfully named method on the plugin instance.
    # I added wrapping functions that say on which occurrences of such an
    # event to fire, similar to what Olivier wrote { every, at .. }.
    doOneTrainingStepPlugin = ModelPluginFactory(model=logreg)
    trainDataPlugin = sched.schedule_plugin(
                       event=every(doOneTrainingStepPlugin.new_train_error()),
                       plugin=DatasetsPluginFactory(data=train_data))

    trainDataPlugin = sched.schedule_plugin(
                       event=Event('begin'), plugin=trainDataPlugin)

    clock = sched.schedule_plugin(event=all_events, plugin=ClockFactory())

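    # Schedule the plugin created above rather than building a new one, so
    # that trainDataPlugin keeps listening to the same instance.
    doOneTrainingStepPlugin = sched.schedule_plugin(
                             event=every(trainDataPlugin.new_batch()),
                             plugin=doOneTrainingStepPlugin)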




    # Arguably we wouldn't need such a plugin. I added it just to show how to
    # deal with multiple events from the same plugin; the plugin is supposed to
    # reset the index of the dataset to 0, so that you start a new epoch.
    resetDataset = sched.schedule_plugin(
                           event=every(trainDataPlugin.end_of_dataset()),
                           plugin=ResetDatasetFactory(data=train_data))


    checkValidationPlugin = sched.schedule_plugin(
                             event=every_nth(doOneTrainingStepPlugin.done(), n=1000),
                             plugin=ValidationFactory(model=logreg, data=valid_data))

    # You could also use either of these instead:
    #
    # checkValidationPlugin = sched.schedule_plugin(
    #                         event=every(trainDataPlugin.end_of_dataset()),
    #                         plugin=ValidationFactory(model=logreg, data=valid_data))
    # checkValidationPlugin = sched.schedule_plugin(
    #                         event=every(clock.hour()),
    #                         plugin=ValidationFactory(model=logreg, data=valid_data))

    # This plugin would be responsible for sending Event("terminate") when the
    # patience runs out.
    earlyStopperPlugin = sched.schedule_plugin(
                            event=every(checkValidationPlugin.new_validation_error()),
                            plugin=earlyStopperFactory(initial_patience=10))

    # Printing & Saving plugins

    printTrainingError = sched.schedule_plugin(
                            event=every(doOneTrainingStepPlugin.new_train_error()),
                            plugin=AggregateAndPrintFactory())

    printTrainingError = sched.schedule_plugin(
                            event=every(trainDataPlugin.end_of_dataset()),
                            plugin=printTrainingError)

    saveWeightsPlugin = sched.schedule_plugin(
                            event=every(earlyStopperPlugin.new_best_valid_error()),
                            plugin=saveWeightsFactory(model=logreg))

    sched.run()

'''
Notes
=====

 In my code schedule_plugin returns the plugin that it registers. I think that
 writing something like
   x = f( .. )
   y = f(x)

 is more readable than writing f( .., event_belongs_to = x) or, even worse,
 identifying events by strings, where you only see text and would have to go
 to the plugins to see which events they actually produce.
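
A small sketch of why returning the registered plugin helps. The names
``SketchPlugin`` and ``SketchScheduler``, the ``done`` method and the tuple
used as an event handle are assumptions for illustration only.

```python
class SketchPlugin:
    # Hypothetical plugin that exposes its events through named methods.
    def __init__(self, name):
        self.name = name

    def done(self):
        # Event handle that names this plugin as the source of the event.
        return (self.name, "done")


class SketchScheduler:
    # Hypothetical scheduler that only records who listens to what.
    def __init__(self):
        self.edges = []  # (source plugin, event kind, listening plugin)

    def schedule_plugin(self, event, plugin):
        source, kind = event
        self.edges.append((source, kind, plugin.name))
        return plugin  # returning the plugin is what allows y = f(x)


sched = SketchScheduler()
train = sched.schedule_plugin(event=("scheduler", "begin"),
                              plugin=SketchPlugin("train"))
valid = sched.schedule_plugin(event=train.done(),
                              plugin=SketchPlugin("valid"))
print(sched.edges)
```

Reading the script top to bottom, the variable ``train`` ties the second call
to the first, so the dependency is visible without opening either plugin.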

 At this point I am more concerned with how the scripts will look (the cognitive
 load needed to understand them) and how easy it is to hack on them. From this
 point of view I would make the following suggestions:

   * Dataset and model creation should happen outside the scheduler, possibly
     through other mechanisms.
   * There are two types of plugins: those that do not affect the experiment
     (they just compute statistics and print them, or save various data) and
     those that change the state of the model (like training) or influence the
     life of the experiment. There should be as few plugins of the second
     category as possible, to keep the code readable. (When trying to
     understand a script, you only need to understand that part; the rest you
     can assume is just printing stuff.)
     The different categories should also be grouped.
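
One possible way to make the two categories explicit, sketched under
assumptions: the ``mutates_experiment`` flag and the class names below are
hypothetical and only illustrate the grouping.

```python
class BasePlugin:
    # Hypothetical base class; by default a plugin only observes.
    mutates_experiment = False

    def __init__(self, name):
        self.name = name


class TrainStep(BasePlugin):
    mutates_experiment = True  # changes the state of the model


class EarlyStop(BasePlugin):
    mutates_experiment = True  # influences the life of the experiment


class PrintError(BasePlugin):
    pass  # only reports statistics


class SaveWeights(BasePlugin):
    pass  # only saves data


plugins = [PrintError("print"), TrainStep("train"),
           SaveWeights("save"), EarlyStop("stop")]

# Group the plugins so a reader can focus on the first list.
core = [p.name for p in plugins if p.mutates_experiment]
observers = [p.name for p in plugins if not p.mutates_experiment]
print(core, observers)
```

With such a split, a reader only has to study the plugins in ``core`` to
understand what the experiment does.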


'''