comparison doc/v2_planning/plugin_RP.py @ 1202:7fff3d5c7694

ARCHITECTURE/LAYER: a incomplete story about the plug-ins and way of constructing models
author pascanur
date Mon, 20 Sep 2010 20:35:03 -0400
parents fe6c25eb1e37
children 681b5e7e3b81
1201:46527ae6db53 1202:7fff3d5c7694
'''

!!! Incomplete file .. many of the things I've set up to do are not done
yet !!!

============
Introduction
============

What this file talks about
==========================
* Proposal for the layer committee
* Proposal of how to deal with plug-ins (Step 2)
* Description of how to glue the two parts
* Some personal beliefs and argumentation

The file will point out:
* how to use the APIs the other committees proposed, or why and how they
  should change
* how it satisfies the listed requirements (or why it doesn't)
* why this approach might be better than others (or worse), to the best of
  my knowledge


Motivation for writing this file
================================

I wrote this file because:
* it will probably answer most of the questions regarding my view,
  minimizing the time wasted on talks
* presenting the entire interface helps to see holes in the approach
* it is here for everybody to read (easier dissemination of information)


=======
Concept
=======

I think any experiment that we (or anybody else) would want to run with
our library will be composed of two steps:

* Step 1. Constructing (or choosing, or initializing) the model, the
  datasets, error measures, optimizers and so on (everything up to the
  iterative loop). I think this step has been covered by the different
  committees, and is possibly glued together by the layer committee.

* Step 2. Composing the iterative loops and performing them (this is what
  the architecture committee dealt with).

I believe there is a natural way of going from *Step 1* to *Step 2*,
which I present below as Step 1.5.

Step 2
======

I will start with Step 2 (because I think that is more of a hot subject
right now). I will assume you have the right plugins at hand.
This is a DBN with early stopping and ..

.. code-block:: python
'''
data = load_mnist()
train_xy, valid_xy, test_xy = split(data, split = [( 0, 40000),
                                                   (40000, 50000),
                                                   (50000, 60000)])
train_x, train_y = train_xy
valid_x, valid_y = valid_xy
test_x, test_y = test_xy

################# CONSTRUCTING THE MODEL ###################################
############################################################################

x0 = pca(train_x)

## Layer 1:
h1 = sigmoid(dotW_b(x0, units = 200), constraint = L1(coeff = 0.1))
x1 = recurrent_layer()
x1.t0 = x0
x1.value = binomial_sample(sigmoid(reconstruct(binomial_sample(h1), x0)))
cost = free_energy(train_x) - free_energy(x1.tp(5))
grads = [(g.var, T.grad(cost.var, g.var)) for g in cost.params]
pseudo_cost = sum([pl.sum(pl.abs(g)) for g in cost.params])
rbm1 = SGD(cost = pseudo_cost, grads = grads)

# Layer 2:
rbm2, h2 = rbm(h1, units = 200, k = 5, use = 'CD')
# Logreg
logreg, out = logreg(h2, units = 10)
train_err = mean_over(misclassification(argmax(out), train_y))
valid_err = train_err.replace({train_x: valid_x, train_y: valid_y})
test_err = train_err.replace({train_x: test_x, train_y: test_y})

##########################################################################
############### Constructing the training loop ###########################

sched = Schedular()


### Constructing Modes ###
pretrain_layer1 = sched.mode('pretrain0')
pretrain_layer2 = sched.mode('pretrain1')
early_stopper = sched.mode('early')
valid0 = sched.mode('valid0')
kfolds = sched.mode('kfolds')

# Construct the modes dependency graph
valid0.include([pretrain_layer1, pretrain_layer2, early_stopper])
kfolds.include(valid0)

pretrain_layer1.act(on = valid0.begin(), when = always())
pretrain_layer2.act(on = pretrain_layer1.end(), when = always())
early_stopper.act(on = pretrain_layer2.end(), when = always())


# Construct a counter plugin that keeps track of the number of epochs
@FnPlugin
def counter(self, msg):
    # a bit of a hack .. it would look more classic if you would
    # start with a class instead
    if not hasattr(self, 'val'):
        self.val = 0

    if msg == Message('eod'):
        self.val += 1
        if self.val < 10:
            self.fire(Message('continue'))
        else:
            self.fire(Message('terminate'))

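
# The comment above suggests a class-based plugin would read more naturally.
# A purely illustrative sketch of what that could look like; it assumes the
# same hypothetical Plugin / Message API used throughout this example and is
# not part of any agreed interface:
class EpochCounter(Plugin):
    def __init__(self, max_epochs = 10):
        self.max_epochs = max_epochs
        self.val = 0

    def __call__(self, msg):
        if msg == Message('eod'):
            self.val += 1
            if self.val < self.max_epochs:
                self.fire(Message('continue'))
            else:
                self.fire(Message('terminate'))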

# Construct pre-training plugins
rbm1_plugin = plugin_wrapper(rbm1, sched = pretrain_layer1)
rbm1_plugin.listen(Message('init'), update_hyperparameters)
rbm2_plugin = plugin_wrapper(rbm2, sched = pretrain_layer2)
rbm2_plugin.listen(Message('init'), update_hyperparameters)
rbm1_counter = pretrain_layer1.register(counter)
rbm2_counter = pretrain_layer2.register(counter)


# Dependency graph for pre-training layer 0
rbm1_plugin.act(on = [pretrain_layer1.begin(),
                      Message('continue')],
                when = always())
rbm1_counter.act(on = rbm1_plugin.eod(), when = always())


# Dependency graph for pre-training layer 1
rbm2_plugin.act(on = pretrain_layer2.begin(), when = always())
pretrain_layer2.stop(on = rbm2_plugin.eod(), when = always())


# Constructing fine-tuning plugins
learner = early_stopper.register(plugin_wrapper(logreg))
learner.listen(Message('init'), update_hyperparameters)
validation = early_stopper.register(plugin_wrapper(valid_err))
validation.listen(Message('init'), update_hyperparameters)
clock = early_stopper.register(sched.generate_clock())
stopper = early_stopper.register(early_stopper_plugin)

@FnPlugin
def save_weights(self, message):
    cPickle.dump(logreg, open('model.pkl', 'wb'))


learner.act(on = early_stopper.begin(), when = always())
learner.act(on = learner.value(), when = always())
validation.act(on = clock.hour(), when = every(n = 1))
stopper.act(on = validation.value(), when = always())
save_weights.act(on = stopper.new_best_error(), when = always())

@FnPlugin
def kfolds_plugin(self, event):
    if not hasattr(self, 'n'):
        self.n = -1
        self.splits = [[( 0, 40000), (40000, 50000), (50000, 60000)],
                       [(10000, 50000), (50000, 60000), ( 0, 10000)],
                       [(20000, 60000), ( 0, 10000), (10000, 20000)]]
    if self.n < len(self.splits) - 1:
        self.n += 1
        msg = Message('new split')
        msg.data = (data.get_hyperparam('split'), self.splits[self.n])
        self.fire(msg)
    else:
        self.fire(Message('terminate'))


kfolds.register(kfolds_plugin)
kfolds_plugin.act(on = kfolds.begin(), when = always())
kfolds_plugin.act(on = valid0.end(), when = always())
valid0.act(on = Message('new split'), when = always())

sched.include(kfolds)

sched.run()

'''

Notes:
    When a mode is registered to begin with a certain message, it will
    rebroadcast that message when it starts, only switching its type from
    whatever it was to 'init'. It will also send all 'init' messages of the
    mode in which it is included (or of the schedular).

    One might be able to shorten all of this by having Macros that create
    modes and automatically register certain plugins with them; you can
    always add more plugins to any mode afterwards.


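To make the message/firing mechanics used above a bit more concrete, here is
a minimal, self-contained sketch of the kind of loop a Schedular could run.
Every name and signature in it (Scheduler, Message, act, fire) is an
illustrative assumption, not the proposed API:

.. code-block:: python

    # Minimal stand-ins; the real library would provide richer versions.
    class Message(object):
        def __init__(self, type):
            self.type = type

    class Scheduler(object):
        def __init__(self):
            self.listeners = {}   # message type -> list of callables
            self.queue = []

        def act(self, plugin, on):
            # register a plugin to be called whenever a message of this
            # type is fired
            self.listeners.setdefault(on, []).append(plugin)

        def fire(self, msg):
            self.queue.append(msg)

        def run(self):
            self.fire(Message('init'))
            while self.queue:
                msg = self.queue.pop(0)
                if msg.type == 'terminate':
                    break
                for plugin in self.listeners.get(msg.type, []):
                    plugin(self, msg)

    class Counter(object):
        # fires 'continue' until max_epochs calls have been seen,
        # then fires 'terminate'
        def __init__(self, max_epochs = 10):
            self.max_epochs = max_epochs
            self.seen = 0

        def __call__(self, sched, msg):
            self.seen += 1
            if self.seen < self.max_epochs:
                sched.fire(Message('continue'))
            else:
                sched.fire(Message('terminate'))

    sched = Scheduler()
    counter = Counter()
    sched.act(counter, on = 'init')
    sched.act(counter, on = 'continue')
    sched.run()   # the counter runs 10 times, then the loop stops

Modes would sit on top of something like this: a mode is registered much like
a plugin, owns its own set of listeners, and rebroadcasts the message that
started it as an 'init' message, as described in the notes above.
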
136 - by decorating a function 209 Step 1
137 In all cases I would suggest then when creating them you should provide 210 ======
138 the schedular as well, and the constructor also registers the plugin 211
139 212
140 * The plugin concept works well as long as the plugins are a bit towards 213 You start with the dataset that you construct as the dataset committee
141 heavy duty computation, disregarding printing plugins and such. If you have 214 proposed to. You continue constructing your model by applying
142 many small plugins this system might only introduce an overhead. I would 215 transformation, more or less like you would in Theano. When constructing
143 argue that using theano is restricted to each plugin. Therefore I would 216 your model you also get a graph "behind the scene". Note though that
144 strongly suggest that the architecture to be done outside the schedular 217 this graph is totally different then the one Theano would create!
145 with a different approach. 218 Let start with an example:
146 219
147 * I would suggest that the framework to be used only for the training loop 220 .. code-block:: python
148 (after you get the adapt function, compute error function) so is more about 221
149 the meta-learner, hyper-learner learner level. 222 '''
data_x, data_y = GPU_transform(load_mnist())
output = sigmoid(dotW_b(data_x, 10))
err = cross_entropy(output, data_y)
learner = SGD(err)
'''

This shows how to create the learner behind the logistic regression,
but not the function that will compute the validation error or the test
error (or any other statistics). Before going into the details of what
all those transforms (or the results of applying one) mean, here is
another partial example, for an SdA:

.. code-block:: python

'''
## Layer 1:

data_x, data_y = GPU_transform(load_mnist())
noisy_data_x = gaussian_noise(data_x, amount = 0.1)
hidden1 = tanh(dotW_b(data_x, n_units = 200))
reconstruct1 = reconstruct(hidden1.replace({data_x: noisy_data_x}),
                           noisy_data_x)
err1 = cross_entropy(reconstruct1, data_x)
learner1 = SGD(err1)

# Layer 2 :
noisy_hidden1 = gaussian_noise(hidden1, amount = 0.1)
hidden2 = tanh(dotW_b(hidden1, n_units = 200))
reconstruct2 = reconstruct(hidden2.replace({hidden1: noisy_hidden1}),
                           noisy_hidden1)
err2 = cross_entropy(reconstruct2, hidden1)
learner2 = SGD(err2)

# Top layer:

output = sigmoid(dotW_b(hidden2, n_units = 10))
err = cross_entropy(output, data_y)
learner = SGD(err)

'''

What's going on here?
---------------------

By calling different "transforms" (we could call them ops or functions)
you decide what the architecture does. What you get back from applying
any of these transforms are nodes. You have different types of nodes
(which I will enumerate a bit later) but they all offer a basic interface.
That interface is the dataset API plus a few more methods and/or attributes.
There are also a few transforms that work on the graph which I think will
be pretty useful:

* .replace(dict)    -> method; replaces the subgraphs given as keys with
                       the ones given as values; throws an exception if it
                       is impossible

* reconstruct(dict) -> transform; tries to reconstruct the nodes given as
                       keys starting from the nodes given as values by
                       going through the inverse of all transforms that
                       are in between

* .tm, .tp          -> methods; return nodes that correspond to the value
                       at t-k or t+k
* recurrent_layer   -> function; creates a special type of node that is
                       recurrent; the node has two important attributes that
                       need to be specified before calling the node iterator;
                       those attributes are .t0, which represents the initial
                       value, and .value, which should describe the recurrent
                       relation
* add_constraints   -> transform; adds a constraint to a given node
* data_listener     -> function; creates a special node that listens for
                       messages to get data; it should be used to decompose
                       the architecture into modules that can run on
                       different machines

* switch(hyperparam, dict) -> transform; a lazy switch that allows you to
                       construct the graph based on hyper-parameters

* get_hyperparameter -> method; given a name it will return the first node,
                       starting from the top, that is a hyper-parameter and
                       has that name
* get_parameter     -> method; given a name it will return the first node,
                       starting from the top, that is a parameter and has
                       that name



Because every node provides the dataset API, you can iterate over any of
the nodes. They will produce the original dataset transformed up
to that point.
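
To make this last point (and the .replace method above) more concrete, here
is a toy sketch of how such nodes could be represented. It is only an
assumption about one possible implementation, not the proposed one:

.. code-block:: python

    class Node(object):
        # a node remembers which transform produced it and from which inputs
        def __init__(self, transform=None, inputs=(), name=None):
            self.transform = transform
            self.inputs = tuple(inputs)
            self.name = name

        def replace(self, mapping):
            # rebuild the subgraph, swapping the nodes given as keys for
            # the nodes given as values
            if self in mapping:
                return mapping[self]
            if not self.inputs:
                return self
            return Node(self.transform,
                        [i.replace(mapping) for i in self.inputs],
                        self.name)

    def sigmoid(x):
        return Node('sigmoid', [x])

    def cross_entropy(output, target):
        return Node('cross_entropy', [output, target])

    train_x, train_y = Node(name='train_x'), Node(name='train_y')
    valid_x, valid_y = Node(name='valid_x'), Node(name='valid_y')

    train_err = cross_entropy(sigmoid(train_x), train_y)
    valid_err = train_err.replace({train_x: valid_x, train_y: valid_y})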

** NOTES **
1. This is not like a symbolic graph. When adding a transform you can get
   a warning right away. This is because you start from the dataset, so you
   always have access to some data. Sometimes, though, you would want the
   nodes to be lazy, i.e. not try to compute everything until the graph is
   done.

2. You can still have complex Theano expressions. Each node has a Theano
   variable describing the graph up to that point, plus optionally a
   compiled function over which you can iterate. We can use some on_demand
   mechanism to compile only when needed.
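
A small sketch of the on_demand idea from note 2: the compilation is paid
only the first time a node is iterated over and cached afterwards. In the
real thing _compile would call theano.function; the plain Python stand-in
below is only there to keep the sketch runnable:

.. code-block:: python

    class LazyCompiledNode(object):
        def __init__(self, expression, source):
            self.expression = expression   # stands in for a Theano expression
            self.source = source           # the node we iterate over
            self._fn = None                # compiled lazily

        def _compile(self):
            # the real thing would call theano.function here
            return self.expression

        def __iter__(self):
            if self._fn is None:
                self._fn = self._compile()
            for value in self.source:
                yield self._fn(value)

    node = LazyCompiledNode(lambda v: 2 * v, source=[1, 2, 3])
    print(list(node))   # compiled on first use -> [2, 4, 6]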

What types of nodes do you have?
--------------------------------

Note that this differentiation is more or less semantic, not a mandatory
syntactic one. It is just to help in understanding the graph.


* Data Nodes            -- datasets are such nodes; the result of any
                           simple transform is also a data node (like the
                           result of a sigmoid, or dotW_b)
* Learner Nodes         -- they are the same as data nodes, with the
                           difference that they have side effects on the
                           model; they update the weights
* Apply Nodes           -- they are used to connect input variables to
                           the transformation/op node and output nodes
* Dependency Nodes      -- very similar to apply nodes, except that they
                           connect constraint subgraphs to a model graph
* Parameter Nodes       -- when iterating over them they will only output
                           the values of the parameters
* Hyper-parameter Nodes -- very similar to parameter nodes; this is a
                           semantic difference (they are not updated by
                           any learner node)
* Transform Nodes       -- these nodes describe the mathematical function
                           and, if there is one, the inverse of that
                           transform; there would usually be two kinds of
                           transforms, ones that use Theano and ones that
                           do not -- the distinction matters because the
                           Theano-based ones can be composed

Each node is lazy, in the sense that unless you try to iterate on it, it
will not try to compute the next value.


Isn't this too low level?
-------------------------

I think this way of writing and decomposing your neural network is
efficient and useful when writing such networks. Of course, when you
just want to run a classical SdA you shouldn't need to go through the
trouble of writing all that. I think we should have Macros for this.

* Macro -- syntactically it looks just like a transform (i.e. a python
  function), only that it actually applies multiple transforms to the input
  and might return several nodes (not just one); a sketch of what such a
  macro could expand to is given after the example below.
  Example:


      learner, prediction, pretraining_learners = SdA(
                              input   = data_x,
                              target  = data_y,
                              hiddens = [200, 200],
                              noises  = [0.1, 0.1])


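As an illustration only, this is what the SdA macro above could expand to,
written in terms of the hypothetical transforms already used in this file
(gaussian_noise, dotW_b, tanh, reconstruct, cross_entropy, SGD). It is a
sketch, not a proposed implementation:

.. code-block:: python

    def SdA(input, target, hiddens, noises):
        # one denoising autoencoder (and pre-training learner) per layer
        pretraining_learners = []
        layer_input = input
        for n_units, noise in zip(hiddens, noises):
            noisy = gaussian_noise(layer_input, amount = noise)
            hidden = tanh(dotW_b(layer_input, n_units = n_units))
            rec = reconstruct(hidden.replace({layer_input: noisy}), noisy)
            pretraining_learners.append(SGD(cross_entropy(rec, layer_input)))
            layer_input = hidden
        # supervised top layer
        prediction = sigmoid(dotW_b(layer_input, n_units = 10))
        learner = SGD(cross_entropy(prediction, target))
        return learner, prediction, pretraining_learners
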
How do you deal with loops?
---------------------------

When implementing architectures you sometimes need to loop, as for an RNN
or for CD, PCD, etc. Adding loops in such a scheme is always hard.
I borrowed the idea in the code below from PyBrain. You first construct
a shell layer that you call a recurrent layer. Then you define its
functionality by giving the initial value and the recurrent step.
For example:

.. code-block:: python

'''
# sketch of writing an RNN
x = load_mnist()
y = recurrent_layer()
y.value = tanh(dotW(x, n = 50) + dotW(y.tm(1), 50))
y.t0 = zeros((50,))
out = dotW(y, 10)


# sketch of writing CD-k starting from x
x = recurrent_layer()
x.t0 = input_values
h = binomial_sample(sigmoid(dotW_b(x.tm(1))))
x.value = binomial_sample(sigmoid(reconstruct(h, x.tm(1))))
## the assumption is that the inverse of sigmoid is the identity fn
pseudo_cost = free_energy(x.tp(k)) - free_energy(x.t0)


'''

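To pin down the intended meaning of .t0, .value and .tp, here is a toy,
purely numerical stand-in for recurrent_layer. The real node would build a
graph rather than compute values directly; this only shows how tp(k)
unrolls the recurrence:

.. code-block:: python

    class ToyRecurrentLayer(object):
        def __init__(self):
            self.t0 = None      # initial value
            self.value = None   # recurrent step: previous value -> next value

        def tp(self, k):
            # unroll the recurrence k steps forward from t0
            state = self.t0
            for _ in range(k):
                state = self.value(state)
            return state

    x = ToyRecurrentLayer()
    x.t0 = 1.0
    x.value = lambda prev: 0.5 * prev   # stands in for the Gibbs step above
    print(x.tp(5))                      # state after 5 unrolled steps
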
How do I deal with constraints?
-------------------------------

Use the add_constraints transform. You are required to pass the constraint
transform together with the initial values of its hyper-parameters?

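A hypothetical usage example, reusing the L1 constraint from the Step 2
script above (all names are illustrative):

.. code-block:: python

    h1 = tanh(dotW_b(data_x, n_units = 200))
    h1 = add_constraints(h1, L1(coeff = 0.1))   # attach an L1 penalty to h1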

How do I deal with other types of networks?
-------------------------------------------

(opaque transforms)

    new_data = PCA(data_x)


    svm_predictions = SVM(data_x)
    svm_learner = SVM_learner(svm_predictions)
    # Note that for the SVM this might be just syntactic sugar; we have the
    # two steps because we expect different interfaces for these nodes



Step 1.5
========

There is a wrapper function called plugin. Once you call plugin on any of
the previous nodes you will get a plugin that follows a certain set of
conventions.
'''