comparison doc/v2_planning/plugin_RP.py @ 1154:f923dddf0bf7

a better version of the script

author   pascanur
date     Thu, 16 Sep 2010 23:42:26 -0400
parents  ae5ba6206fd3
children 6993fef088d1 3c2d7c5f0cf7

comparison of 1153:ae5ba6206fd3 with 1154:f923dddf0bf7, shown below as a
unified diff (`-` = deleted, `+` = inserted, other lines are unchanged
context; `...` marks elided unchanged regions)
...

 .. code-block::
 '''
 sch = Schedular()
-p = ProducerFactory()
-p = sched.schedule_plugin(event = every(p.outputStuffs()), p )
-p = sched.schedule_plugin(event = Event("begin"), p)
-c = sched.schedule_plugin(event = every(p.outputStuffs()), ConsumerFactory )
-pc = sched.schedule_plugin(event = every(p.outputStuffs()), ProducerConsumerFactory )

-sched.run()
+@FnPlugin(sch)
+def producer(self, event):
+    self.fire('stuff', value = 'some text')
+
+@FnPlugin(sch)
+def consumer(self, event):
+    print event.value
+
+@FnPlugin(sch)
+def prod_consumer(self, event):
+    print event.value
+    self.fire('stuff2', value = 'stuff')
+
+producer.act( on = Event('begin'), when = once() )
+producer.act( on = Event('stuff'), when = always() )
+consumer.act( on = Event('stuff'), when = always() )
+prod_consumer.act( on = Event('stuff'), when = always() )
+
+sch.run()

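To make the API above concrete, here is a minimal, runnable guess at the
machinery it assumes: ``Event``, the ``once`` / ``always`` / ``every``
predicates, a ``Plugin`` class with ``act`` and ``fire``, the ``FnPlugin``
decorator, and a toy ``Schedular`` whose ``run()`` drains an event queue.
Only the surface names follow the proposal; every internal detail is an
assumption. (Note that under this toy implementation the wiring above never
terminates, since producer re-fires 'stuff' on its own 'stuff' event; the
short demo at the end uses a terminating wiring instead.)

.. code-block:: python

    class Event(object):
        """A type tag plus arbitrary payload attributes."""
        def __init__(self, type, **kwargs):
            self.type = type
            self.__dict__.update(kwargs)

        def __eq__(self, other):
            # events match on their type tag alone, so that both
            # `trigger == event` and `event.type == Event('begin')` work
            other = other.type if isinstance(other, Event) else other
            return self.type == other


    def always():
        return lambda event: True

    def once():
        done = []
        def when(event):
            if done:
                return False
            done.append(True)
            return True
        return when

    def every(n=1):
        count = [0]
        def when(event):
            count[0] += 1
            return count[0] % n == 0
        return when


    class Plugin(object):
        def __init__(self, sched, fn):
            self.sched = sched
            self.fn = fn
            self.triggers = []          # (event, predicate) pairs from act()

        def act(self, on, when):
            self.triggers.append((on, when))

        def fire(self, type, **kwargs):
            self.sched.queue(Event(type, **kwargs))

        def dispatch(self, event):
            for trigger, when in self.triggers:
                if trigger == event and when(event):
                    self.fn(self, event)


    def FnPlugin(sched):
        # decorator: wrap a function into a Plugin and register it with sched
        def decorate(fn):
            plugin = Plugin(sched, fn)
            sched.register(plugin)
            return plugin
        return decorate


    class Schedular(object):
        def __init__(self):
            self.plugins = []
            self.pending = []

        def register(self, plugin):
            self.plugins.append(plugin)

        def queue(self, event):
            self.pending.append(event)

        def run(self):
            # kick off with Event('begin'), then drain the queue
            self.queue(Event('begin'))
            while self.pending:
                event = self.pending.pop(0)
                for plugin in self.plugins:
                    plugin.dispatch(event)


    # terminating demo: producer fires once on 'begin', consumer prints
    sch = Schedular()

    @FnPlugin(sch)
    def producer(self, event):
        self.fire('stuff', value='some text')

    @FnPlugin(sch)
    def consumer(self, event):
        print(event.value)

    producer.act(on=Event('begin'), when=once())
    consumer.act(on=Event('stuff'), when=always())
    sch.run()                           # prints: some text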
 '''
 Example : Logistic regression

 ...

 Possible script
 ---------------

-Sorry for long variable names, I wanted to make it clear what things are ..
+Note : this would look the same for any other architecture that does not
+involve pre-training ( i.e. anything that is not a deep network ), for
+example an mlp.

 .. code-block::
 '''
-sched = Schedular()
-# This is a shortcut .. I've been to the dataset committee and they have
-# something else in mind, a bit more fancy; I totally agree with their
-# ideas, I just wrote it like this for brevity
-train_data, valid_data, test_data = load_mnist()
-
-# This part was not actually discussed in detail; I have my own
-# opinions of how this part should be done .. but for now I decomposed it
-# in two functions for convenience
-logreg = generate_logreg_model()
-
-
-# Note that this is not meant to replace the string idea of Olivier. I
-# actually think that is a cool idea; when writing things down I realized
-# it might be a bit more intuitive if you would get that object by calling
-# a method of the instance of the plugin with a significant name.
-# I added a wrapping function that sort of tells on which such events
-# you can act, similar to what Olivier wrote { every, at .. }
-doOneTrainingStepPlugin = ModelPluginFactory( model = logreg )
-trainDataPlugin = sched.schedule_plugin(
-        event = every(doOneTrainingStepPlugin.new_train_error()),
-        DatasetsPluginFactory( data = train_data) )
-
-trainDataPlugin = sched.schedule_plugin(
-        event = Event('begin'), trainDataPlugin )
-
-clock = sched.schedule_plugin( event = all_events, ClockFactory())
-
-doOneTrainingStepPlugin = sched.schedule_plugin(
-        event = every(trainDataPlugin.new_batch()),
-        ModelFactory( model = logreg))
-
-# Arguably we wouldn't need such a plugin. I added it just to show how to
-# deal with multiple events from the same plugin; the plugin is supposed to
-# reset the index of the dataset to 0, so that you start a new epoch
-resetDataset = sched.schedule_plugin(
-        event = every(trainDataPlugin.end_of_dataset()),
-        ResetDatasetFactory( data = train_data) )
-
-checkValidationPlugin = sched.schedule_plugin(
-        event = every_nth(doOneTrainingStepPlugin.done(), n = 1000),
-        ValidationFactory( model = logreg, data = valid_data))
-
-# You have the option to also do :
-#
-# checkValidationPlugin = sched.schedule_plugin(
-#         event = every(trainDataPlugin.end_of_dataset()),
-#         ValidationFactory( model = logreg, data = valid_data))
-# checkValidationPlugin = sched.schedule_plugin(
-#         event = every(clock.hour()),
-#         ValidationFactory( model = logreg, data = valid_data))
-
-# This plugin would be responsible for sending the Event("terminate") when
-# the patience expires.
-earlyStopperPlugin = sched.schedule_plugin(
-        event = every(checkValidationPlugin.new_validation_error()),
-        earlyStopperFactory(initial_patience = 10) )
-
-# Printing & saving plugins
-
-printTrainingError = sched.schedule_plugin(
-        event = every(doOneTrainingStepPlugin.new_train_error()),
-        AggregateAndPrintFactory())
-
-printTrainingError = sched.schedule_plugin(
-        event = every(trainDataPlugin.end_of_dataset()),
-        printTrainingError)
-
-saveWeightsPlugin = sched.schedule_plugin(
-        event = every(earlyStopperPlugin.new_best_valid_error()),
-        saveWeightsFactory( model = logreg) )
-
-sched.run()
+
+sched = Schedular()
+
+# Data / Model Building :
+# I skipped over how to design this part,
+# though I have some ideas
+real_train_data, real_valid_data = load_mnist()
+model = logreg()
+
+# Main Plugins ( already provided in the library );
+# these wrappers also register the plugin
+train_data = create_data_plugin( sched, data = real_train_data)
+valid_data = create_data_plugin( sched, data = real_valid_data)
+train_model = create_train_model( sched, model = model)
+validate_model = create_valid_model( sched, model = model, data = valid_data)
+early_stopper = create_early_stopper( sched)
+
+
+# On-the-fly plugins ( print random stuff ); the main difference of my
+# FnPlugin from Olivier's version is that it also registers the plugin in sched
+@FnPlugin(sched)
+def print_error(self, event):
+    if event.type == Event('begin'):
+        self.value = []
+    elif event.type == train_model.error():
+        self.value += [event.value]
+    elif event.type == train_data.eod():
+        print 'Error :', numpy.mean(self.value)
+
+@FnPlugin(sched)
+def save_model(self, event):
+    if event.type == early_stopper.new_best_error():
+        cPickle.dump(model.parameters(), open('best_params.pkl','wb'))
+
+
+# Create the dependency graph describing what does what
+train_model.act( on = train_data.batch(), when = always())
+validate_model.act( on = train_model.done(), when = every(n = 10000))
+early_stopper.act( on = validate_model.error(), when = always())
+print_error.act( on = train_model.error(), when = always())
+print_error.act( on = train_data.eod(), when = always())
+save_model.act( on = early_stopper.new_best_error(), when = always())
+
+# Run the entire thing
+sched.run()

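The ``create_*`` helpers in the new script are only named, never defined. As
one data point, here is a guess at ``create_early_stopper`` built on the toy
classes sketched after the first example: it tracks the best validation
error carried in ``event.value``, fires ``new_best_error`` on improvement,
and fires the ``Event('terminate')`` that the old script mentions once
patience runs out. The patience rule, the event names, and the use of a
global event-type string (a real version would namespace it per plugin) are
all assumptions.

.. code-block:: python

    def create_early_stopper(sched, patience=10):
        @FnPlugin(sched)
        def stopper(self, event):
            # event.value is assumed to carry the current validation error
            if stopper.best is None or event.value < stopper.best:
                stopper.best = event.value
                stopper.countdown = patience
                self.fire('new_best_error', value=event.value)
            else:
                stopper.countdown -= 1
                if stopper.countdown <= 0:
                    # a fuller Schedular would stop its run() loop on this
                    self.fire('terminate', value=stopper.best)
        stopper.best = None
        stopper.countdown = patience
        # named event, so scripts can write early_stopper.new_best_error()
        stopper.new_best_error = lambda: Event('new_best_error')
        return stopper

With this shape the script's wiring reads directly:
``early_stopper.act(on = validate_model.error(), when = always())`` feeds it
errors, and ``save_model`` listens on ``early_stopper.new_best_error()``.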
 '''
 Notes
 =====

-In my code schedule_plugin returns the plugin that it registers. I think
-that writing something like
-    x = f( .. )
-    y = f(x)
+* I think we should have a FnPlugin decorator ( exactly like Olivier's ),
+  just one that also attaches the newly created plugin to the schedule.
+  This way you can create plugins on the fly ( as long as they are simple
+  functions that print stuff or compute simple statistics ).
+* I added a method act to Plugin. You use it to create the dependency
+  graph ( it could also be named listen, for a more plugin-like interface ).
+* Plugins are obtained in 3 ways :
+    - by wrapping a dataset / model or something similar
+    - by a function that constructs it from nothing
+    - by decorating a function
+  In all cases I would suggest that when creating them you should provide
+  the schedular as well, so that the constructor also registers the plugin
+  ( a sketch of the first route follows these notes ).

-is more readable than writing f( .., event_belongs_to = x) or, even worse,
-only seeing text, so that you have to go to the plugins to see what events
-they actually produce.
+* The plugin concept works well as long as the plugins lean towards
+  heavy-duty computation, printing plugins and such aside. If you have
+  many small plugins this system might only introduce overhead. I would
+  also argue that the use of theano should be restricted to the inside of
+  each plugin. Therefore I would strongly suggest that the architecture
+  be built outside the schedular, with a different approach.

-At this point I am more concerned with how the scripts will look ( the
-cognitive load needed to understand them ) and how easy it is to hack into
-them. From this point of view I would make the following suggestions :
- * dataset and model creation should happen outside the schedular, with
-   possibly other mechanisms
- * there are two types of plugins : those that do not affect the
-   experiment ( they just compute statistics and print them, or save
-   different data ) and those that change the state of the model, like
-   train, or influence the life of the experiment. There should be a
-   minimum of plugins of the second category, to keep the code readable
-   ( when understanding a script, you only need to understand that part;
-   the rest you can assume is just printing stuff ). The different
-   categories should also be grouped.
+* I would suggest that the framework be used only for the training loop
+  ( after you get the adapt function and the compute-error function ), so
+  it is more about the meta-learner / hyper-learner level.
+
+* A general remark that I guess everyone will agree on : we should make
+  sure that implementing a new plugin is as easy / simple as possible. We
+  have to hide all the complexity in the schedular ( it is the part of the
+  code we will not, or only rarely, need to work on ).
+
+* I have not gone into how to implement the different components, but
+  following Olivier's code I think that part would be more or less
+  straightforward.
+
+'''
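The sketch promised in the notes above: the first of the three routes,
wrapping a model. The model interface (an ``update(batch)`` method that does
one training step and returns the current error) and the event names are
invented for illustration; only the ``create_train_model`` name and the
``error()`` / ``done()`` accessors come from the script.

.. code-block:: python

    def create_train_model(sched, model):
        @FnPlugin(sched)
        def trainer(self, event):
            # event.value is assumed to hold the minibatch fired by the
            # data plugin
            err = model.update(event.value)       # one step (assumed API)
            self.fire('train_error', value=err)   # for printing plugins
            self.fire('train_done', value=None)   # drives validation
        # named events, so scripts can write train_model.error() / .done()
        trainer.error = lambda: Event('train_error')
        trainer.done = lambda: Event('train_done')
        return trainer

The other two routes look the same from the outside: a free-standing
constructor builds its state from arguments instead of wrapping an object,
and ``@FnPlugin`` covers the decorated-function case; in all three the
constructor receives ``sched`` and registers the plugin itself.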


 '''