comparison doc/v2_planning/plugin_RP.py @ 1153:ae5ba6206fd3
a first draft of pseudo-code for logreg .. using version B (?) approach
author | Razvan Pascanu <r.pascanu@gmail.com> |
---|---|
date | Thu, 16 Sep 2010 17:34:30 -0400 |
parents | |
children | f923dddf0bf7 |
revisions compared: 1152:0904dd74894d | 1153:ae5ba6206fd3

'''
=================================================
Plugin system for iterative algorithms, Version B
=================================================

After the meeting (September 16) we sort of stumbled on
two possible versions of the plug-in system. This document describes
the second version. It changed in a few places after I saw
Olivier's code and talked to him.

Concept
=======

The basic idea behind this version is not to have a list of all
possible events, but rather to have plugins register to events. By
specifying which plugin listens to which event produced by which
plugin, you define a sort of dependency graph. Structuring things
in such a graph might make the script more intuitive to read.
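
The registration idea can be made concrete with a small sketch. Everything
in it (``MiniScheduler``, ``listen``, ``emit``, ``dependency_graph``) is a
hypothetical name chosen for illustration and is not the Schedular used in
the scripts of this file. It only shows how "who listens to which event of
whom" implicitly defines a graph.

```python
from collections import defaultdict, deque


class MiniScheduler:
    # Hypothetical stand-in for illustration only.

    def __init__(self):
        # (producer name, event name) -> list of (listener name, callback)
        self.listeners = defaultdict(list)
        self.queue = deque()

    def listen(self, producer, event, listener, callback):
        # Register `listener` to react to `event` produced by `producer`.
        self.listeners[(producer, event)].append((listener, callback))

    def emit(self, producer, event, payload=None):
        # Queue an event; it is dispatched later by run().
        self.queue.append((producer, event, payload))

    def dependency_graph(self):
        # Edges "producer -> listener" implied by the registrations.
        return sorted({(producer, listener)
                       for (producer, _), entries in self.listeners.items()
                       for listener, _ in entries})

    def run(self):
        # Dispatch queued events in FIFO order until none are left.
        while self.queue:
            producer, event, payload = self.queue.popleft()
            for _, callback in self.listeners.get((producer, event), []):
                callback(self, payload)


log = []
sched = MiniScheduler()


def produce(s, _):
    # The producer reacts to "begin" by emitting one item.
    s.emit("producer", "new_item", 42)


sched.listen("scheduler", "begin", "producer", produce)
sched.listen("producer", "new_item", "consumer",
             lambda s, item: log.append(item))
sched.emit("scheduler", "begin")
sched.run()
print(log)
print(sched.dependency_graph())
```

No global list of events is needed here: an event exists as soon as some
plugin registers for it.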

I will first go through pseudo-code for two examples and then enumerate
my insights and concepts on the matter.


Example : Producer - Consumer that Guillaume described
======================================================


.. code-block::
'''
sched = Schedular()
p = ProducerFactory()
# `plugin=` is an assumed keyword name: Python does not allow a positional
# argument after a keyword argument.
p = sched.schedule_plugin(event=every(p.outputStuffs()), plugin=p)
p = sched.schedule_plugin(event=Event("begin"), plugin=p)
c = sched.schedule_plugin(event=every(p.outputStuffs()),
                          plugin=ConsumerFactory())
pc = sched.schedule_plugin(event=every(p.outputStuffs()),
                           plugin=ProducerConsumerFactory())

sched.run()



'''
Example : Logistic regression
=============================

Task description
----------------

Apply a logistic regression network to some dataset. Use early stopping.
Save the weights every time a new best score is obtained. Print the training
score after each epoch.
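
The early-stopping part of the task can be sketched on its own as follows.
``EarlyStopper`` and its ``update`` method are hypothetical names, and the
patience rule used here (reset the patience on every new best validation
error, decrease it by one otherwise) is an assumption, since the task
description does not fix one.

```python
class EarlyStopper:
    # Hypothetical helper; the patience rule is an assumption.

    def __init__(self, initial_patience):
        self.initial_patience = initial_patience
        self.patience = initial_patience
        self.best_error = float("inf")

    def update(self, valid_error):
        # Returns "new_best", "continue" or "terminate".
        if valid_error < self.best_error:
            self.best_error = valid_error
            self.patience = self.initial_patience
            return "new_best"
        self.patience -= 1
        return "terminate" if self.patience <= 0 else "continue"


stopper = EarlyStopper(initial_patience=2)
signals = [stopper.update(e) for e in [0.5, 0.4, 0.45, 0.41]]
print(signals)
```

In the plugin setting, "new_best" would be the moment to save the weights
and "terminate" the moment to end the experiment.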


Possible script
---------------

Sorry for the long variable names, I wanted to make it clear what things are.

.. code-block::
'''
sched = Schedular()
# This is a shortcut. I've been to the dataset committee and they have
# something else in mind, a bit more fancy; I totally agree with their
# ideas, I just wrote it like this for brevity.
train_data, valid_data, test_data = load_mnist()

# This part was not actually discussed in detail; I have my own
# opinions on how this part should be done, but for now I decomposed it
# into two functions for convenience.
logreg = generate_logreg_model()



# Note that this is not meant to replace Olivier's string idea. I
# actually think that is a cool idea; while writing things down I realized
# it might be a bit more intuitive to get the event object by calling
# a meaningfully named method of the plugin instance.
# I added wrapping functions that say when to react to such events,
# similar to what Olivier wrote { every, at .. }.
# (`plugin=` is an assumed keyword name: Python does not allow a
# positional argument after a keyword argument.)
doOneTrainingStepPlugin = ModelPluginFactory(model=logreg)
trainDataPlugin = sched.schedule_plugin(
    event=every(doOneTrainingStepPlugin.new_train_error()),
    plugin=DatasetsPluginFactory(data=train_data))

trainDataPlugin = sched.schedule_plugin(
    event=Event('begin'), plugin=trainDataPlugin)

clock = sched.schedule_plugin(event=all_events, plugin=ClockFactory())

# Schedule the plugin created above, so that trainDataPlugin listens to
# the same instance that reacts to new batches.
doOneTrainingStepPlugin = sched.schedule_plugin(
    event=every(trainDataPlugin.new_batch()),
    plugin=doOneTrainingStepPlugin)




# Arguably we wouldn't need such a plugin. I added it just to show how to
# deal with multiple events from the same plugin; the plugin is supposed to
# reset the index of the dataset to 0, so that you start a new epoch.
resetDataset = sched.schedule_plugin(
    event=every(trainDataPlugin.end_of_dataset()),
    plugin=ResetDatasetFactory(data=train_data))


checkValidationPlugin = sched.schedule_plugin(
    event=every_nth(doOneTrainingStepPlugin.done(), n=1000),
    plugin=ValidationFactory(model=logreg, data=valid_data))

# You also have the option to do:
#
# checkValidationPlugin = sched.schedule_plugin(
#     event=every(trainDataPlugin.end_of_dataset()),
#     plugin=ValidationFactory(model=logreg, data=valid_data))
# checkValidationPlugin = sched.schedule_plugin(
#     event=every(clock.hour()),
#     plugin=ValidationFactory(model=logreg, data=valid_data))

# This plugin would be responsible for sending Event("terminate") when the
# patience has expired.
earlyStopperPlugin = sched.schedule_plugin(
    event=every(checkValidationPlugin.new_validation_error()),
    plugin=earlyStopperFactory(initial_patience=10))

# Printing & saving plugins

printTrainingError = sched.schedule_plugin(
    event=every(doOneTrainingStepPlugin.new_train_error()),
    plugin=AggregateAndPrintFactory())

printTrainingError = sched.schedule_plugin(
    event=every(trainDataPlugin.end_of_dataset()),
    plugin=printTrainingError)
saveWeightsPlugin = sched.schedule_plugin(
    event=every(earlyStopperPlugin.new_best_valid_error()),
    plugin=saveWeightsFactory(model=logreg))

sched.run()

'''
Notes
=====

In my code schedule_plugin returns the plugin that it registers. I think that
writing something like

    x = f( .. )
    y = f(x)

is more readable than writing f( .., event_belongs_to = x), or even worse,
a version where you only see text and would have to go to the plugins to see
what events they actually produce.
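
A minimal sketch of this "return the plugin you registered" style is given
here. ``Plugin``, ``TinyScheduler`` and the tuple used as an event
descriptor are hypothetical and chosen only to keep the example
self-contained.

```python
class Plugin:
    # Hypothetical plugin: event-named methods return event descriptors.

    def __init__(self, name):
        self.name = name

    def done(self):
        return (self.name, "done")


class TinyScheduler:
    # Hypothetical scheduler that only records registrations.

    def __init__(self):
        self.registrations = []

    def schedule_plugin(self, event, plugin):
        # Record the registration and hand the plugin back to the caller.
        self.registrations.append((event, plugin.name))
        return plugin


sched = TinyScheduler()
x = sched.schedule_plugin(event=("scheduler", "begin"),
                          plugin=Plugin("train"))
# The dependency on x is visible right in the script.
y = sched.schedule_plugin(event=x.done(), plugin=Plugin("validate"))
print(sched.registrations)
```

Because ``x`` is an object, a misspelled event name fails as an attribute
error at the line that uses it, instead of silently never firing.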

At this point I am more concerned with how the scripts will look (the
cognitive load needed to understand them) and how easy it is to hack on them.
From this point of view I would have the following suggestions:

* dataset and model creation should happen outside the scheduler, possibly
  with other mechanisms
* there are two types of plugins: those that do not affect the experiment
  (they just compute statistics and print them, or save various data), and
  those that change the state of the model, like train, or influence the life
  of the experiment. There should be a minimum of plugins of the second
  category, to keep the code readable. (When understanding a script, you only
  need to understand that part; the rest you can assume is just printing
  stuff.) The different categories should also be grouped.
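
One possible way to make the two categories explicit in a script is
sketched here. ``PluginSpec`` and the ``affects_experiment`` flag are
hypothetical, and the plugin names are taken loosely from the logistic
regression example.

```python
from dataclasses import dataclass


@dataclass
class PluginSpec:
    name: str
    affects_experiment: bool  # True if it changes the model or the run


specs = [
    PluginSpec("train_step", True),
    PluginSpec("early_stopper", True),
    PluginSpec("print_training_error", False),
    PluginSpec("save_weights", False),
]

# Group the categories so a reader knows which part to study closely.
active = [s.name for s in specs if s.affects_experiment]
passive = [s.name for s in specs if not s.affects_experiment]
print("read these to understand the experiment:", active)
print("reporting only:", passive)
```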


'''