pylearn: doc/v2_planning/plugin.txt @ 1118:8cc324f388ba

proposal for a plugin system

author: Olivier Breuleux <breuleuo@iro.umontreal.ca>
date:   Tue, 14 Sep 2010 16:01:32 -0400

======================================
Plugin system for iterative algorithms
======================================

I would like to propose a plugin system for iterative algorithms in
Pylearn. Basically, it would be useful to be able to sandwich
arbitrary behavior in between two training iterations of an algorithm
(whenever applicable). I believe many mechanisms are best implemented
this way: early stopping, saving checkpoints, tracking statistics,
real-time visualization, remote control of the process, or even
interlacing the training of several models and making them interact
with each other.

So here is the proposal: essentially, a plugin would be a (schedule,
timeline, function) tuple.

Schedule
========

The schedule is some function that takes two "times", t1 and t2, and
returns True if the plugin should be run in between these times. The
reason we check a time range [t1, t2] rather than some discrete time
t is that we do not necessarily want to schedule plugins on iteration
numbers. For instance, we might want to run a plugin every second, or
every minute; then [t1, t2] would be the start time and end time of
the last iteration, and we would run the plugin whenever a new second
started in that range (but still on training iteration boundaries).
Alternatively, we might want to run a plugin every n examples seen -
but if we use mini-batches, the nth example might fall squarely in
the middle of a batch.

I've implemented a somewhat elaborate schedule system. `each(10)`
produces a schedule that returns true whenever a multiple of 10 is in
the time range. `at(17, 153)` produces one that returns true when 17
or 153 is in the time range. Schedules can be combined and negated,
e.g. `each(10) & ~at(20, 30)` (execute at each multiple of 10, except
at 20 and 30). This gives a lot of flexibility as to when you want to
do things.
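
The combinators above could be sketched like this (a hypothetical
implementation assuming a half-open (t1, t2] convention; the actual
plugin.py in this directory may define them differently):

```python
import math

class Schedule:
    """A predicate over a time range (t1, t2]."""
    def __init__(self, fn):
        self.fn = fn
    def __call__(self, t1, t2):
        return self.fn(t1, t2)
    def __and__(self, other):
        return Schedule(lambda t1, t2: self(t1, t2) and other(t1, t2))
    def __or__(self, other):
        return Schedule(lambda t1, t2: self(t1, t2) or other(t1, t2))
    def __invert__(self):
        return Schedule(lambda t1, t2: not self(t1, t2))

def each(n):
    # True whenever some multiple of n lies in (t1, t2]; the floor
    # trick also works for fractional times such as seconds.
    return Schedule(lambda t1, t2: math.floor(t2 / n) > math.floor(t1 / n))

def at(*times):
    # True whenever one of the given times lies in (t1, t2].
    return Schedule(lambda t1, t2: any(t1 < t <= t2 for t in times))

# execute at each multiple of 10, except at 20 and 30
sched = each(10) & ~at(20, 30)
```

Overloading `&`, `|` and `~` keeps the combination syntax close to the
`each(10) & ~at(20, 30)` notation used above.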

Timeline
========

This would be a string indicating on what "timeline" the schedule is
supposed to operate. For instance, there could be a "real time"
timeline, an "algorithm time" timeline, an "iterations" timeline, a
"number of examples" timeline, and so on. This means you can schedule
some action to be executed every actual second, or every second of
training time (ignoring time spent executing plugins), or every
discrete iteration, or every n examples processed. This might be a
bloat feature (it was an afterthought to my original design, anyway),
but I think that there are circumstances where each of these options
is the best one.
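
One lightweight way to realize this (a sketch; the class and clock
names here are illustrative, not taken from plugin.py) is for the
trainer to keep one named clock per timeline and remember each clock's
previous value, so a plugin's schedule can be checked against the
(previous, current) pair on its own timeline:

```python
import time

class Timelines:
    """Named clocks, one per timeline, advanced once per iteration."""
    def __init__(self):
        self.clocks = {"iterations": 0, "examples": 0,
                       "real time": time.time()}
        self.previous = dict(self.clocks)

    def advance(self, batch_size):
        # Called once at the end of each training iteration.
        self.previous = dict(self.clocks)
        self.clocks["iterations"] += 1
        self.clocks["examples"] += batch_size
        self.clocks["real time"] = time.time()

    def range(self, name):
        # The [t1, t2] pair that a schedule on this timeline checks.
        return self.previous[name], self.clocks[name]
```

A plugin registered on the "examples" timeline with `each(1000)` would
then fire on whichever iteration crosses a multiple of 1000 examples,
even when that multiple falls in the middle of a mini-batch.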

Function
========

The plugin function would receive some object containing the time
range, a flag indicating whether training has started, and a flag
indicating whether training is done (which it can set in order to
stop training), as well as anything pertinent about the model.

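Putting the three parts together, the trainer's dispatch might look
like the following sketch (`TrainingContext`, `run_plugins`, and the
early-stopping example are hypothetical names, not the actual
plugin.py API; per-timeline clocks are omitted for brevity):

```python
class TrainingContext:
    """The object handed to every plugin function."""
    def __init__(self, model=None):
        self.t1 = self.t2 = 0   # time range of the last iteration
        self.started = False    # has training started?
        self.done = False       # a plugin sets this to stop training
        self.model = model      # anything pertinent about the model

def run_plugins(plugins, context):
    # plugins: list of (schedule, timeline, function) tuples; this
    # sketch checks every schedule against the same (t1, t2) range
    # rather than looking up each plugin's own timeline.
    for schedule, timeline, function in plugins:
        if schedule(context.t1, context.t2):
            function(context)

def stop_after(limit):
    # Example plugin function: request a stop once time reaches `limit`.
    def plugin(context):
        if context.t2 >= limit:
            context.done = True
    return plugin
```

A training loop would set context.t1 and context.t2 around each
iteration, call run_plugins, and exit as soon as context.done is set.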
Implementation
==============

I have implemented the feature in plugin.py, in this directory. Simply
run `python plugin.py` to test it.