comparison doc/v2_planning/plugin.txt @ 1118:8cc324f388ba

proposal for a plugin system
author Olivier Breuleux <breuleuo@iro.umontreal.ca>
date Tue, 14 Sep 2010 16:01:32 -0400
parents
children 81ea57c6716d

======================================
Plugin system for iterative algorithms
======================================

I would like to propose a plugin system for iterative algorithms in
Pylearn. The idea is to make it possible to insert arbitrary behavior
between two training iterations of an algorithm (whenever
applicable). I believe many mechanisms are best implemented this way:
early stopping, saving checkpoints, tracking statistics, real-time
visualization, remote control of the process, or even interleaving the
training of several models and making them interact with each other.

So here is the proposal: essentially, a plugin would be a (schedule,
timeline, function) tuple.

Schedule
========

The schedule is a function that takes two "times", t1 and t2, and
returns True if the plugin should be run between these times. The
reason we check a time range [t1, t2] rather than a discrete time t
is that we do not necessarily want to schedule plugins on iteration
numbers. For instance, we might want to run a plugin every second or
every minute. In that case [t1, t2] would be the start and end times
of the last iteration, and the plugin would run whenever a new second
started in that range (but still on a training iteration boundary).
Alternatively, we might want to run a plugin every n examples seen,
but with mini-batches the nth example might fall squarely in the
middle of a batch.

I've implemented a somewhat elaborate schedule system. `each(10)`
produces a schedule that returns True whenever a multiple of 10 is in
the time range. `at(17, 153)` produces one that returns True when 17
or 153 is in the time range. Schedules can be combined and negated,
e.g. `each(10) & ~at(20, 30)` (execute at each multiple of 10, except
at 20 and 30). This gives a lot of flexibility as to when things
happen.
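
To make the combinators concrete, here is a minimal sketch of how such
schedules could be written. It is not the code in plugin.py: the
`Schedule` wrapper class is hypothetical, and I am assuming the range
is treated as half-open, (t1, t2], so that a given time belongs to
exactly one iteration.

```python
import math


class Schedule:
    """Wraps a predicate f(t1, t2) -> bool (hypothetical helper class)."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, t1, t2):
        return self.fn(t1, t2)

    def __and__(self, other):
        # Both schedules must fire on the same range.
        return Schedule(lambda t1, t2: self(t1, t2) and other(t1, t2))

    def __or__(self, other):
        # Either schedule may fire.
        return Schedule(lambda t1, t2: self(t1, t2) or other(t1, t2))

    def __invert__(self):
        # Fires exactly when the wrapped schedule does not.
        return Schedule(lambda t1, t2: not self(t1, t2))


def each(n):
    """True when some multiple of n lies in (t1, t2] (assumed convention)."""
    return Schedule(lambda t1, t2: math.floor(t2 / n) > math.floor(t1 / n))


def at(*times):
    """True when any of the given times lies in (t1, t2]."""
    return Schedule(lambda t1, t2: any(t1 < t <= t2 for t in times))


schedule = each(10) & ~at(20, 30)
fired = [t for t in range(1, 51) if schedule(t - 1, t)]
```

With unit-length ranges this fires at 10, 40 and 50 only. Note that in
this sketch negation applies to the whole range, so a range wide
enough to contain both 20 and 40 would skip both.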

Timeline
========

This would be a string indicating which "timeline" the schedule
operates on. For instance, there could be a "real time" timeline, an
"algorithm time" timeline, an "iterations" timeline, a "number of
examples" timeline, and so on. This means you can schedule an action
to be executed every actual second, or every second of training time
(ignoring time spent executing plugins), or every discrete iteration,
or every n examples processed. This might be feature bloat (it was an
afterthought to my original design, anyway), but I think there are
circumstances where each of these options is the best one.
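
As an illustration of how the timelines differ, the sketch below keeps
the range covered by the last iteration on each of them. The
`Timelines` class, its method names and the timeline keys are all
hypothetical and chosen for this example only; the clock is injectable
so the behavior can be checked without waiting on a real one.

```python
import time


class Timelines:
    """Tracks the (t1, t2) range of the last iteration on several timelines."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.now = {"real_time": 0.0, "algorithm_time": 0.0,
                    "iterations": 0, "examples": 0}
        self.prev = dict(self.now)
        self._origin = clock()

    def step(self, train_fn, batch_size):
        """Run one training iteration and advance every timeline."""
        self.prev = dict(self.now)
        start = self.clock()
        train_fn()
        end = self.clock()
        # Real time runs continuously, so it includes time spent in plugins.
        self.now["real_time"] = end - self._origin
        # Algorithm time only accumulates while train_fn is running.
        self.now["algorithm_time"] += end - start
        self.now["iterations"] += 1
        self.now["examples"] += batch_size

    def last_range(self, timeline):
        """Return (t1, t2) for the most recent iteration on one timeline."""
        return self.prev[timeline], self.now[timeline]


# Fake clock: two iterations of 0.5s each, with 2.5s of plugin work between.
ticks = iter([0.0, 1.0, 1.5, 4.0, 4.5])
timelines = Timelines(clock=lambda: next(ticks))
timelines.step(lambda: None, batch_size=32)
timelines.step(lambda: None, batch_size=32)
```

After the second step, the "real_time" range spans three seconds while
the "algorithm_time" range spans only half a second, which is why a
schedule such as "every second" can behave quite differently depending
on the timeline it is attached to.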

Function
========

The plugin function would receive an object containing the time
range, a flag indicating whether training has started, a flag
indicating whether training is done (which the plugin can set in
order to stop training), as well as anything pertinent about the
model.
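
Putting the three pieces together, a training loop might look roughly
like the sketch below, with early stopping as the example plugin. The
`run` driver, the attribute names on the state object (`started`,
`done`, `time_range`, `model`) and the toy `train_step` are
assumptions made for illustration; plugin.py may organize this
differently. Only an "iterations" timeline is tracked here.

```python
from types import SimpleNamespace


def run(train_step, plugins, max_iterations=1000):
    """Call train_step repeatedly, running due plugins between iterations.

    plugins is a list of (schedule, timeline, function) tuples.
    """
    state = SimpleNamespace(started=False, done=False,
                            time_range=None, model=None)
    iterations = 0
    while not state.done and iterations < max_iterations:
        state.model = train_step()
        state.started = True
        ranges = {"iterations": (iterations, iterations + 1)}
        iterations += 1
        for schedule, timeline, function in plugins:
            t1, t2 = ranges[timeline]
            if schedule(t1, t2):
                state.time_range = (t1, t2)
                function(state)
    return state


def early_stop(state):
    """Plugin: request a stop once the reported error is low enough."""
    if state.model["error"] < 0.1:
        state.done = True


# Toy "model": each training step just reports the next error value.
errors = iter([0.8, 0.4, 0.2, 0.05, 0.01])


def train_step():
    return {"error": next(errors)}


def every_other(t1, t2):
    """Schedule: fire when an even iteration number lies in (t1, t2]."""
    return t2 // 2 > t1 // 2


final = run(train_step, [(every_other, "iterations", early_stop)])
```

Here the plugin is consulted after iterations 2 and 4 and stops
training at the second check, so the fifth error value is never
consumed.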

Implementation
==============

I have implemented the feature in plugin.py, in this directory. Simply
run `python plugin.py` to test it.
