comparison doc/v2_planning/plugin_RP.py @ 1153:ae5ba6206fd3

a first draft of pseudo-code for logreg .. using version B (?) approach
author Razvan Pascanu <r.pascanu@gmail.com>
date Thu, 16 Sep 2010 17:34:30 -0400
parents
children f923dddf0bf7
1152:0904dd74894d 1153:ae5ba6206fd3
'''
=================================================
Plugin system for iterative algorithms, Version B
=================================================

After the meeting (September 16) we stumbled on
two possible versions of the plug-in system. This document describes
the second version. It underwent a few changes after I saw
Olivier's code and talked to him.

Concept
=======

The basic idea behind this version is not to keep a list of all
possible events, but rather to have plugins register to events. By
specifying which plugin listens to which event produced by which
plugin, you define a sort of dependency graph. Structuring things
as such a graph might make the script more intuitive to read.
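
A minimal sketch of this registration idea, assuming a toy scheduler that keys
its listeners by (producing plugin, event name). The names ``ToyEvent``,
``ToyScheduler`` and ``dependency_graph`` are hypothetical and exist only for
this illustration; plugins are plain strings to keep the sketch short.

```python
from collections import defaultdict


class ToyEvent:
    # An event is identified by its name and by the plugin that produces it.
    def __init__(self, name, source=None):
        self.name = name
        self.source = source


class ToyScheduler:
    def __init__(self):
        # (producing plugin, event name) -> plugins listening to that event
        self.listeners = defaultdict(list)

    def schedule_plugin(self, event, plugin):
        # Register the plugin as a listener and hand it back to the caller.
        self.listeners[(event.source, event.name)].append(plugin)
        return plugin

    def dependency_graph(self):
        # One edge (producer, event name, listener) per registration.
        return [(source, name, listener)
                for (source, name), plugins in self.listeners.items()
                for listener in plugins]


sched = ToyScheduler()
producer = sched.schedule_plugin(ToyEvent("begin"), "producer")
consumer = sched.schedule_plugin(ToyEvent("output", source=producer), "consumer")
graph = sched.dependency_graph()
```

Each edge in ``graph`` reads "listener depends on this event of this producer",
which is the dependency graph the registrations define implicitly.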

I will first go through pseudo-code for two examples and then list
my insights and thoughts on the matter.


Example : Producer - Consumer that Guillaume described
======================================================


.. code-block::
'''
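sched = Schedular()
p = ProducerFactory()
# NOTE: Python does not allow a positional argument after a keyword argument,
# so the plugin is passed as ``plugin=`` (the keyword name is a placeholder).
# The producer listens to its own output and to the initial "begin" event.
p = sched.schedule_plugin(event=every(p.outputStuffs()), plugin=p)
p = sched.schedule_plugin(event=Event("begin"), plugin=p)
# Both consumers are triggered every time the producer outputs something.
c = sched.schedule_plugin(event=every(p.outputStuffs()),
                          plugin=ConsumerFactory())
pc = sched.schedule_plugin(event=every(p.outputStuffs()),
                           plugin=ProducerConsumerFactory())

sched.run()



'''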
Example : Logistic regression
=============================

Task description
----------------

Apply a logistic regression network to some dataset. Use early stopping.
Save the weights every time a new best score is obtained. Print the training
score after each epoch.
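
The early-stopping part of this task can be sketched as follows. This is only
an illustration under stated assumptions: patience is taken to be the number
of consecutive validations without improvement that are tolerated, and the
class ``ToyEarlyStopper`` and its ``update`` method are hypothetical names.
The returned strings mirror the events used in the script of this example.

```python
class ToyEarlyStopper:
    def __init__(self, initial_patience=10):
        self.initial_patience = initial_patience
        self.patience = initial_patience
        self.best_error = float("inf")

    def update(self, valid_error):
        # Returns the name of the event to emit, or None if nothing happens.
        if valid_error < self.best_error:
            # New best score: remember it and restore the full patience.
            self.best_error = valid_error
            self.patience = self.initial_patience
            return "new_best_valid_error"
        # No improvement: use up one unit of patience.
        self.patience -= 1
        if self.patience <= 0:
            return "terminate"
        return None


stopper = ToyEarlyStopper(initial_patience=2)
emitted = [stopper.update(err) for err in (0.5, 0.6, 0.4, 0.45, 0.5)]
```

A weight-saving plugin would react to "new_best_valid_error", while the
scheduler would stop the run on "terminate".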


Possible script
---------------

Sorry for the long variable names; I wanted to make it clear what things are.

.. code-block::
'''
sched = Schedular()
# This is a shortcut. I've been to the dataset committee and they have
# something else in mind, a bit fancier; I totally agree with their
# ideas, I just wrote it like this for brevity.
train_data, valid_data, test_data = load_mnist()

# This part was not actually discussed in detail; I have my own
# opinions on how it should be done, but for now I decomposed it
# into two functions for convenience.
logreg = generate_logreg_model()


# Note that this is not meant to replace Olivier's string idea. I
# actually think that is a cool idea; when writing things down I realized
# it might be a bit more intuitive to get the event object by calling
# a meaningfully named method on the plugin instance.
# I added wrapping functions that specify when to react to such events,
# similar to what Olivier wrote { every, at .. }.
doOneTrainingStepPlugin = ModelPluginFactory(model=logreg)
trainDataPlugin = sched.schedule_plugin(
    event=every(doOneTrainingStepPlugin.new_train_error()),
    plugin=DatasetsPluginFactory(data=train_data))

trainDataPlugin = sched.schedule_plugin(
    event=Event('begin'), plugin=trainDataPlugin)

clock = sched.schedule_plugin(event=all_events, plugin=ClockFactory())

# Schedule the model plugin created above (rather than a new instance), so
# that the events trainDataPlugin listens to are actually produced.
doOneTrainingStepPlugin = sched.schedule_plugin(
    event=every(trainDataPlugin.new_batch()),
    plugin=doOneTrainingStepPlugin)



# Arguably we wouldn't need such a plugin. I added it just to show how to
# deal with multiple events from the same plugin; the plugin is supposed to
# reset the index of the dataset to 0, so that you start a new epoch.
resetDataset = sched.schedule_plugin(
    event=every(trainDataPlugin.end_of_dataset()),
    plugin=ResetDatasetFactory(data=train_data))


checkValidationPlugin = sched.schedule_plugin(
    event=every_nth(doOneTrainingStepPlugin.done(), n=1000),
    plugin=ValidationFactory(model=logreg, data=valid_data))

# You also have the option to do:
#
# checkValidationPlugin = sched.schedule_plugin(
#     event=every(trainDataPlugin.end_of_dataset()),
#     plugin=ValidationFactory(model=logreg, data=valid_data))
# checkValidationPlugin = sched.schedule_plugin(
#     event=every(clock.hour()),
#     plugin=ValidationFactory(model=logreg, data=valid_data))

# This plugin would be responsible for sending Event("terminate") when the
# patience has run out.
earlyStopperPlugin = sched.schedule_plugin(
    event=every(checkValidationPlugin.new_validation_error()),
    plugin=earlyStopperFactory(initial_patience=10))

# Printing & Saving plugins

printTrainingError = sched.schedule_plugin(
    event=every(doOneTrainingStepPlugin.new_train_error()),
    plugin=AggregateAndPrintFactory())

printTrainingError = sched.schedule_plugin(
    event=every(trainDataPlugin.end_of_dataset()),
    plugin=printTrainingError)
saveWeightsPlugin = sched.schedule_plugin(
    event=every(earlyStopperPlugin.new_best_valid_error()),
    plugin=saveWeightsFactory(model=logreg))

sched.run()

'''
Notes
=====

In my code schedule_plugin returns the plugin that it registers. I think that
writing something like
    x = f( .. )
    y = f(x)

is more readable than writing f( .., event_belongs_to = x), or, even worse,
passing only text, in which case you would have to go to the plugins to see
what events they actually produce.
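
The ``x = f( .. )``, ``y = f(x)`` style can be sketched as below. Everything
here is a hypothetical stand-in (``ToyPlugin``, ``ToyBatches``,
``MiniScheduler``); the only point is that the value returned by
``schedule_plugin`` is the plugin itself, so its event methods can be used
directly in the next registration.

```python
class ToyPlugin:
    def __init__(self, name):
        self.name = name

    def event(self, event_name):
        # An event is described by (producing plugin name, event name).
        return (self.name, event_name)


class ToyBatches(ToyPlugin):
    # The events a plugin produces are visible as methods on the plugin.
    def new_batch(self):
        return self.event("new_batch")


class MiniScheduler:
    def __init__(self):
        self.registrations = []

    def schedule_plugin(self, event, plugin):
        self.registrations.append((event, plugin.name))
        return plugin  # returning the plugin allows y = f(x) chaining


sched = MiniScheduler()
data = sched.schedule_plugin(("scheduler", "begin"), ToyBatches("data"))
trainer = sched.schedule_plugin(data.new_batch(), ToyPlugin("trainer"))
```

Reading the two registrations top to bottom already shows that ``trainer``
depends on ``data``, without looking inside either plugin.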

At this point I am more concerned with how the scripts will look (the cognitive
load to understand them) and how easy it is to hack on them. From this point
of view I would make the following suggestions:

* dataset and model creation should happen outside the scheduler, possibly
  with other mechanisms
* there are two types of plugins: those that do not affect the experiment
  (they just compute statistics and print them, or save various data), and
  those that change the state of the model (like training) or influence the
  life of the experiment. There should be a minimum of plugins of the second
  category, to keep the code readable. (When trying to understand a script,
  you only need to understand that part; the rest you can assume is just
  printing stuff.) The different categories should also be grouped.
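
One possible way to make the two categories explicit, sketched under the
assumption that each plugin carries a boolean flag. The flag name
``changes_experiment`` and the helper ``group_plugins`` are hypothetical.

```python
class FlaggedPlugin:
    # changes_experiment is True for plugins that modify the model or control
    # the life of the run, and False for pure observers (printing, saving).
    def __init__(self, name, changes_experiment):
        self.name = name
        self.changes_experiment = changes_experiment


def group_plugins(plugins):
    # Split plugin names into the two categories, preserving their order.
    core = [p.name for p in plugins if p.changes_experiment]
    observers = [p.name for p in plugins if not p.changes_experiment]
    return core, observers


plugins = [
    FlaggedPlugin("train_step", True),
    FlaggedPlugin("print_error", False),
    FlaggedPlugin("early_stopper", True),
    FlaggedPlugin("save_weights", False),
]
core, observers = group_plugins(plugins)
```

A reader of the script would then only need to study the plugins listed in
``core`` to understand what the experiment does.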


'''