Mercurial > pylearn
view doc/v2_planning/plugin_RP.py @ 1153:ae5ba6206fd3
a first draft of pseudo-code for logreg .. using version B (?) approach
author | Razvan Pascanu <r.pascanu@gmail.com> |
---|---|
date | Thu, 16 Sep 2010 17:34:30 -0400 |
parents | |
children | f923dddf0bf7 |
line wrap: on
line source
''' ================================================= Plugin system for interative algortithm Version B ================================================= After the meeting (September 16) we sort of stumbled on two possible versions of the plug-in system. This represents the second version. It suffered a few changes after seeing Olivier's code and talking to him. Concept ======= The basic idea behind this version is not to have a list of all possible events, but rather have plugin register to events.By specifying what plugin listens to which event produced by what plugin you define a sort of dependency graph. Structuring things in such a graph might make the script more intuitive when reading. I will first go through pseudo-code for two example and then enumerate my insights and concepts on the matter Example : Producer - Consumer that Guillaume described ====================================================== .. code-block:: ''' sch = Schedular() p = ProducerFactory() p = sched.schedule_plugin(event = every(p.outputStuffs()), p ) p = sched.schedule_plugin(event = Event("begin"), p) c = sched.schedule_plugin(event = every(p.outputStuffs()), ConsumerFactory ) pc= sched.schedule_plugin(event = every(p.outputStuffs()), ProducerConsumerFactory ) sched.run() ''' Example : Logistic regression ============================= Task description ---------------- Apply a logistic regression network to some dataset. Use early stopping. Save the weights everytime a new best score is obtained. Print trainnig score after each epoch. Possible script --------------- Sorry for long variable names, I wanted to make it clear what things are .. .. code-block:: ''' sched = Schedular() # This is a shortcut .. I've been to the dataset committee and they have # something else in mind, a bit more fancy; I totally agree with their # ideas I just wrote it like this for brevity; train_data, valid_data, test_data = load_mnist() # This part was not actually discussed into details ; I have my own # opinions of how this part should be done .. but for now I decomposed it # in two functions for convinience logreg = generate_logreg_model() # Note that this is not meant to replace the string idea of Olivier. I # actually think that is a cool idea, when writing things down I realized # it might be a bit more intuitive if you would get that object by calling # a method of the instance of the plugin with a significant name # I added a warpping function that sort of tells on which such events # you can have similar to what Olivier wrote { every, at .. } doOneTrainingStepPlugin =ModelPluginFactory( model = logreg ) trainDataPlugin = sched.schedule_plugin( event = every(doOneTrainingStepPlugin.new_train_error), DatasetsPluginFactory( data = train_data) ) trainDataPlugin = sched.schedule_plugin( event = Event('begin'), trainDataPlugin ) clock = sched.schedule_plugin( event = all_events, ClockFactory()) doOneTrainingStepPlugin = sched.schedule_plugin( event = every(trainDataPlugin.new_batch()), ModelFactory( model = logreg)) # Arguably we wouldn't need such a plugin. I added just to show how to # deal with multiple events from same plugin; the plugin is suppose to # reset the index of the dataset to 0, so that you start a new epoch resetDataset = sched.schedule_plugin( event = every(trainDataPlugin.end_of_dataset()), ResetDatasetFactory( data = train_data) ) checkValidationPlugin = sched.schedule_plugin( event =every_nth(doOneTrainingStepPlugin.done(), n=1000), ValidationFactory( model = logreg data = valid_data)) # You have the options to also do : # # checkValidationPlugin = sched.schedule_plugin( # event =every(trainDataPlugin.end_of_dataset()), # ValidationFactory( model = logreg, data = valid_data)) # checkValidationPlugin = sched.schedule_plugin( # event =every(clock.hour()), # ValidationFactory( model = logreg, data = valid_data)) # This plugin would be responsible to send the Event("terminate") when the # patience expired. earlyStopperPlugin = sched.schedule_plugin( event = every(checkValidationPlugin.new_validation_error()), earlyStopperFactory(initial_patience = 10) ) # Printing & Saving plugins printTrainingError = sched.schedule_plugin( event = every(doOneTrainingStepPlugin.new_train_error()), AggregateAndPrintFactory()) printTrainingError = sched.schedule_plugin( event = every(trainDataPlugin.end_of_dataset()), printTrainingError) saveWeightsPlugin = sched.schedule_plugin( event = every(earlyStopperPlugin.new_best_valid_error()), saveWeightsFactory( model = logreg) ) sched.run() ''' Notes ===== In my code schedule_plugin returns the plugin that it regsiters. I think that writing something like x = f( .. ) y = f(x) makes more readable then writing f( .., event_belongs_to = x), or even worse, you only see text, and you would have to go to the plugins to see what events they actually produce. At this point I am more concern with how the scripts will look ( the cognitive load to understand them) and how easy is to go to hack into them. From this point of view I would have the following suggestions : * dataset and model creation should create outside the schedular with possibly other mechanisms * there are two types of plugins, those that do not affect the experiment, they just compute statistics and print them, or save different data and those plugin that change the state of the model, like train, or influence the life of the experiment. There should be a minimum of plugins of the second category, to still have the code readable. ( When understanding a script, you only need to understand that part, the rest you assume is just printing stuff). The different categories should also be grouped. '''