pylearn: doc/v2_planning/plugin

comparison doc/v2_planning/plugin_RP.py @ 1154:f923dddf0bf7

a better version of the script

author	pascanur
date	Thu, 16 Sep 2010 23:42:26 -0400
parents	ae5ba6206fd3
children	6993fef088d1 3c2d7c5f0cf7


-:ae5ba6206fd3
+:f923dddf0bf7
 .. code-block::
 '''
 sch = Schedular()
-p = ProducerFactory()
-p = sched.schedule_plugin(event = every(p.outputStuffs()), p )
-p = sched.schedule_plugin(event = Event("begin"), p)
-c = sched.schedule_plugin(event = every(p.outputStuffs()), ConsumerFactory )
-pc= sched.schedule_plugin(event = every(p.outputStuffs()), ProducerConsumerFactory )
-sched.run()
+@FnPlugin(sch)
+def producer(self, event):
+    self.fire('stuff', value = 'some text')
+@FnPlugin(sch)
+def consumer(self, event):
+    print event.value
+@FnPlugin(sch)
+def prod_consumer(self, event):
+    print event.value
+    self.fire('stuff2', value = 'stuff')
+producer.act( on = Event('begin'), when = once() )
+producer.act( on = Event('stuff'), when = always() )
+consumer.act( on = Event('stuff'), when = always() )
+prod_consumer.act( on = Event('stuff'), when = always() )
+sch.run()
 '''
 Example : Logistic regression
 Possible script
 ---------------
-Sorry for long variable names, I wanted to make it clear what things are ..
+Notes: This would look the same for any other architecture that does not
+involve pre-training (as deep networks do), for example the MLP.
 .. code-block::
 '''
-sched = Schedular()
-# This is a shortcut .. I've been to the dataset committee and they have
-# something else in mind, a bit more fancy; I totally agree with their
-# ideas I just wrote it like this for brevity;
-train_data, valid_data, test_data = load_mnist()
-# This part was not actually discussed into details ; I have my own
-# opinions of how this part should be done .. but for now I decomposed it
-# in two functions for convinience
-logreg = generate_logreg_model()
-# Note that this is not meant to replace the string idea of Olivier. I
-# actually think that is a cool idea, when writing things down I realized
-# it might be a bit more intuitive if you would get that object by calling
-# a method of the instance of the plugin with a significant name
-# I added a warpping function that sort of tells on which such events
-# you can have similar to what Olivier wrote { every, at .. }
-doOneTrainingStepPlugin =ModelPluginFactory( model = logreg )
-trainDataPlugin = sched.schedule_plugin(
-        event = every(doOneTrainingStepPlugin.new_train_error),
-        DatasetsPluginFactory( data = train_data) )
-trainDataPlugin = sched.schedule_plugin(
-        event = Event('begin'), trainDataPlugin )
-clock = sched.schedule_plugin( event = all_events, ClockFactory())
-doOneTrainingStepPlugin = sched.schedule_plugin(
-        event = every(trainDataPlugin.new_batch()),
-        ModelFactory( model = logreg))
-# Arguably we wouldn't need such a plugin. I added just to show how to
-# deal with multiple events from same plugin; the plugin is suppose to
-# reset the index of the dataset to 0, so that you start a new epoch
-resetDataset = sched.schedule_plugin(
-        event = every(trainDataPlugin.end_of_dataset()),
-        ResetDatasetFactory( data = train_data) )
-checkValidationPlugin = sched.schedule_plugin(
-        event =every_nth(doOneTrainingStepPlugin.done(), n=1000),
-        ValidationFactory( model = logreg data = valid_data))
-# You have the options to also do :
-#
-# checkValidationPlugin = sched.schedule_plugin(
-#                         event =every(trainDataPlugin.end_of_dataset()),
-#                         ValidationFactory( model = logreg, data = valid_data))
-# checkValidationPlugin = sched.schedule_plugin(
-#                         event =every(clock.hour()),
-#                         ValidationFactory( model = logreg, data = valid_data))
-# This plugin would be responsible to send the Event("terminate") when the
-# patience expired.
-earlyStopperPlugin = sched.schedule_plugin(
-        event = every(checkValidationPlugin.new_validation_error()),
-        earlyStopperFactory(initial_patience = 10) )
-# Printing & Saving plugins
-printTrainingError = sched.schedule_plugin(
-        event = every(doOneTrainingStepPlugin.new_train_error()),
-        AggregateAndPrintFactory())
-printTrainingError = sched.schedule_plugin(
-        event = every(trainDataPlugin.end_of_dataset()),
-        printTrainingError)
-saveWeightsPlugin = sched.schedule_plugin(
-        event = every(earlyStopperPlugin.new_best_valid_error()),
-        saveWeightsFactory( model = logreg) )
-sched.run()
+sched = Schedular()
+# Data / model building:
+# I skipped over how to design this part,
+# though I have some ideas
+real_train_data, real_valid_data = load_mnist()
+model = logreg()
+# Main plugins (already provided in the library);
+# these wrappers also register the plugin with sched
+train_data = create_data_plugin( sched, data = real_train_data)
+valid_data = create_data_plugin( sched, data = real_valid_data)
+train_model    = create_train_model(sched, model = model)
+validate_model = create_valid_model(sched, model = model, data = valid_data)
+early_stopper  = create_early_stopper(sched)
+# On-the-fly plugins (print random stuff); the main difference between my
+# FnPlugin and Olivier's version is that mine also registers the plugin in sched
+@FnPlugin(sched)
+def print_error(self, event):
+    if event.type == Event('begin'):
+        self.value = []
+    elif event.type == train_model.error():
+        self.value += [event.value]
+    elif event.type == train_data.eod():
+        print 'Error :', numpy.mean(self.value)
+@FnPlugin(sched)
+def save_model(self, event):
+    if event.type == early_stopper.new_best_error():
+        cPickle.dump(model.parameters(), open('best_params.pkl','wb'))
+# Create the dependency graph describing what does what
+train_model.act(on = train_data.batch(), when = always())
+validate_model.act(on = train_model.done(), when = every(n=10000))
+early_stopper.act(on = validate_model.error(), when = always())
+print_error.act( on = Event('begin'), when = once() )
+print_error.act( on = train_model.error(), when = always() )
+print_error.act( on = train_data.eod(), when = always() )
+save_model.act( on = early_stopper.new_best_error(), when = always() )
+# Run the entire thing
+sched.run()
 '''
 Notes
 =====
-In my code schedule_plugin returns the plugin that it regsiters. I think that
-writing something like
-x = f( .. )
-y = f(x)
-makes more readable then writing f( .., event_belongs_to = x), or even worse,
-you only see text, and you would have to go to the plugins to see what events
-they actually produce.
-At this point I am more concern with how the scripts will look ( the cognitive
-load to understand them) and how easy is to go to hack into them. From this point
-of view I would have the following suggestions :
-* dataset and model creation should create outside the schedular with possibly
-other mechanisms
-* there are two types of plugins, those that do not affect the experiment,
-they just compute statistics and print them, or save different data and those
-plugin that change the state of the model, like train, or influence the life
-of the experiment. There should be a minimum of plugins of the second category,
-to still have the code readable. ( When understanding a script, you only need
-to understand that part, the rest you assume is just printing stuff).
-The different categories should also be grouped.
+* I think we should have a FnPlugin decorator (exactly like Olivier's) that
+  also attaches the newly created plugin to the scheduler. This way you can
+  create plugins on the fly (as long as they are simple functions that print
+  stuff or compute simple statistics).
+* I added an act method to Plugin. You use it to create the dependency
+  graph (it could also be named listen, to give a more plugin-like interface).
+* Plugins are obtained in 3 ways:
+
+  - by wrapping a dataset / model or something similar
+  - by a function that constructs it from nothing
+  - by decorating a function
+
+  In all cases I would suggest that you also provide the scheduler when
+  creating them, and that the constructor registers the plugin.
+* The plugin concept works well as long as the plugins lean towards
+  heavy-duty computation (printing plugins and the like aside). If you have
+  many small plugins, this system might only introduce overhead. I would
+  argue that the use of Theano is restricted to each individual plugin.
+  Therefore I would strongly suggest that the architecture be built outside
+  the scheduler, with a different approach.
+* I would suggest that the framework be used only for the training loop
+  (once you have the adapt function and the compute-error function), so it
+  is more about the learner / meta-learner / hyper-learner level.
+* A general remark that I guess everyone will agree on: we should make
+  sure that implementing a new plugin is as easy as possible. We have to
+  hide all the complexity in the scheduler (the part of the code we will
+  rarely, if ever, need to work on).
+* I have not gone into how to implement the different components, but
+  following Olivier's code I think that part would be more or less
+  straightforward.
+'''
 '''
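
The proposal does not specify the scheduler itself. The following is a minimal, hypothetical Python 3 sketch of how the pieces used in the scripts (Schedular, Event, FnPlugin, act, once, always, every) could fit together. It assumes a single FIFO event queue. Under that assumption, act(on=..., when=...) subscribes a plugin to an event type, and when=... filters on how many times that subscription has seen the event. The helper names register and subscribe, the max_events guard on run, and this counting rule are all assumptions of the sketch, not part of the proposal.

```python
from collections import deque


class Event:
    """An event is identified by its type name; keyword payload becomes attributes."""

    def __init__(self, name, **payload):
        self.type = name
        self.__dict__.update(payload)


# Conditions for `when=`: each one receives how many times its subscription
# has seen the event (1, 2, 3, ...) and decides whether the plugin runs.
def always():
    return lambda count: True


def once():
    return lambda count: count == 1


def every(n):
    return lambda count: count % n == 0


class Schedular:
    """Minimal FIFO event loop (the proposal's spelling is kept)."""

    def __init__(self):
        self.plugins = []
        self.listeners = {}  # event type -> list of [plugin, condition, count]
        self.queue = deque()

    def register(self, plugin):
        self.plugins.append(plugin)

    def subscribe(self, plugin, event_type, condition):
        self.listeners.setdefault(event_type, []).append([plugin, condition, 0])

    def fire(self, event):
        self.queue.append(event)

    def run(self, max_events=10000):
        # Start from Event('begin'); stop on 'terminate', on an empty queue, or
        # after max_events (a guard against plugins that re-trigger each other
        # forever). Returns the number of events taken off the queue.
        self.fire(Event('begin'))
        processed = 0
        while self.queue and processed < max_events:
            event = self.queue.popleft()
            processed += 1
            if event.type == 'terminate':
                break
            for entry in self.listeners.get(event.type, []):
                plugin, condition = entry[0], entry[1]
                entry[2] += 1  # occurrences seen by this subscription
                if condition(entry[2]):
                    plugin(event)
        return processed


class Plugin:
    def __init__(self, sched, fn):
        self.sched = sched
        self.fn = fn
        sched.register(self)  # creating a plugin also registers it

    def act(self, on, when):
        # One edge of the dependency graph: run when `on` fires and `when` agrees.
        self.sched.subscribe(self, on.type, when)

    def fire(self, name, **payload):
        self.sched.fire(Event(name, **payload))

    def __call__(self, event):
        self.fn(self, event)


def FnPlugin(sched):
    # Decorator factory: turns fn(self, event) into a Plugin registered with sched.
    def decorator(fn):
        return Plugin(sched, fn)
    return decorator
```

Used with the decorator syntax from the first example, `@FnPlugin(sch)` on `def consumer(self, event)` followed by `consumer.act(on=Event('stuff'), when=always())` makes the consumer run on every 'stuff' event fired once `sch.run()` has started. In the first example, `producer.act(on=Event('stuff'), when=always())` makes the producer re-trigger itself on its own output. With this sketch, that loop only ends because of `max_events`, so a real scheduler would probably need some similar guard, or such self-triggering edges would have to be avoided.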

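The early_stopper and print_error plugins in the logistic-regression script are the ones that carry state between events. The following is a minimal sketch of that state as plain Python 3 classes, kept free of any scheduler so it can be checked on its own. The class and method names (EarlyStopper, ErrorAggregator, update, add, end_of_dataset) are hypothetical. The patience rule is also an assumption: stop after `initial_patience` consecutive validation checks without improvement, with the default of 10 taken from the older version's `earlyStopperFactory(initial_patience = 10)`. Another assumption is that the aggregator resets after each report, which gives a per-epoch mean. The script's print_error keeps accumulating instead.

```python
import math


class EarlyStopper:
    """Patience-based early stopping, kept independent of any scheduler.

    update() receives one validation error and returns the names of the
    events the plugin would fire in response.
    """

    def __init__(self, initial_patience=10):
        self.patience = initial_patience  # checks allowed without improvement
        self.best_error = math.inf
        self.checks_since_best = 0

    def update(self, valid_error):
        if valid_error < self.best_error:
            self.best_error = valid_error
            self.checks_since_best = 0
            return ['new_best_error']  # e.g. save_model would react to this
        self.checks_since_best += 1
        if self.checks_since_best >= self.patience:
            return ['terminate']  # patience expired: stop the experiment
        return []


class ErrorAggregator:
    """The state print_error keeps: training errors seen since the last report."""

    def __init__(self):
        # Initialised here, so no Event('begin') subscription is required.
        self.values = []

    def add(self, error):
        self.values.append(error)

    def end_of_dataset(self):
        # Mean error over the epoch just finished (NaN if nothing was seen),
        # then reset so the next epoch is reported separately.
        mean = sum(self.values) / len(self.values) if self.values else math.nan
        self.values = []
        return mean
```

In the script's terms, early_stopper would call update() on each validate_model.error() event and fire whatever names it returns. print_error would call add() on train_model.error() and end_of_dataset() on train_data.eod().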