diff doc/v2_planning/plugin.txt @ 1118:8cc324f388ba
proposal for a plugin system
author: Olivier Breuleux <breuleuo@iro.umontreal.ca>
date: Tue, 14 Sep 2010 16:01:32 -0400
children: 81ea57c6716d
======================================
Plugin system for iterative algorithms
======================================

I would like to propose a plugin system for iterative algorithms in
Pylearn. The idea is to be able to sandwich arbitrary behavior
between two training iterations of an algorithm (whenever
applicable). I believe many mechanisms are best implemented this way:
early stopping, saving checkpoints, tracking statistics, real-time
visualization, remote control of the process, or even interlacing the
training of several models and making them interact with each other.

So here is the proposal: essentially, a plugin would be a (schedule,
timeline, function) tuple.

Schedule
========

The schedule is a function that takes two "times", t1 and t2, and
returns True if the plugin should be run between these times. The
reason we check a time range [t1, t2] rather than some discrete time t
is that we do not necessarily want to schedule plugins on iteration
numbers. For instance, we might want to run a plugin every second, or
every minute. Then [t1, t2] would be the start and end time of the
last iteration, and we would run the plugin whenever a new second
starts within that range (but still only on training iteration
boundaries). Alternatively, we might want to run a plugin every n
examples seen, but if we use mini-batches, the nth example might fall
squarely in the middle of a batch.

I've implemented a somewhat elaborate schedule system. `each(10)`
produces a schedule that returns True whenever a multiple of 10 is in
the time range. `at(17, 153)` produces one that returns True when 17
or 153 is in the time range. Schedules can be combined and negated,
e.g. `each(10) & ~at(20, 30)` (execute at every multiple of 10, except
at 20 and 30).
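To make the schedule semantics concrete, here is a minimal sketch of how `each`, `at` and the `&`, `|`, `~` combinators could work. It is not the code in plugin.py; the `Schedule` class and its internals are hypothetical. One assumption worth flagging: the sketch treats the range as half-open, (t1, t2], so that a point sitting exactly on the boundary between two consecutive ranges fires only once.

```python
import math


class Schedule:
    """Hypothetical schedule: a predicate on a half-open time range (t1, t2]."""

    def __init__(self, predicate):
        self.predicate = predicate

    def __call__(self, t1, t2):
        return self.predicate(t1, t2)

    # Combinators evaluate both schedules on the same range.
    def __and__(self, other):
        return Schedule(lambda t1, t2: self(t1, t2) and other(t1, t2))

    def __or__(self, other):
        return Schedule(lambda t1, t2: self(t1, t2) or other(t1, t2))

    def __invert__(self):
        return Schedule(lambda t1, t2: not self(t1, t2))


def each(n):
    """True when some multiple of n lies in (t1, t2]."""
    # floor(t2 / n) - floor(t1 / n) counts the multiples of n in (t1, t2].
    return Schedule(lambda t1, t2: math.floor(t2 / n) > math.floor(t1 / n))


def at(*points):
    """True when any of the given points lies in (t1, t2]."""
    return Schedule(lambda t1, t2: any(t1 < p <= t2 for p in points))


# Iterations timeline: each call covers one iteration, (t - 1, t].
sched = each(10) & ~at(20, 30)
fired = [t for t in range(1, 41) if sched(t - 1, t)]
print(fired)  # [10, 40]

# Examples timeline with mini-batches of 64: example 100 falls in (64, 128].
print(each(100)(64, 128))  # True
```

Because `&` and `~` combine verdicts on a whole range, a range as coarse as (5, 25] would be suppressed by `each(10) & ~at(20, 30)` even though it contains 10; the combination behaves as intended when ranges are narrower than the spacing between the points involved.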
Combining schedules this way gives a lot of flexibility in deciding
when to do things.

Timeline
========

This would be a string indicating on which "timeline" the schedule is
supposed to operate. For instance, there could be a "real time"
timeline, an "algorithm time" timeline, an "iterations" timeline, a
"number of examples" timeline, and so on. This means you can schedule
an action to be executed every actual second, every second of
training time (ignoring time spent executing plugins), every discrete
iteration, or every n examples processed. This might be feature bloat
(it was an afterthought to my original design, anyway), but I think
there are circumstances where each of these options is the best one.

Function
========

The plugin function would receive an object containing the time range,
a flag indicating whether training has started, a flag indicating
whether training is done (which the plugin can set in order to stop
training), and anything pertinent about the model.

Implementation
==============

I have implemented the feature in plugin.py, in this directory. Simply
run `python plugin.py` to test it.
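To show how schedule, timeline and function could fit together, here is a minimal sketch of a driver loop. It is an illustration under stated assumptions, not the code in plugin.py: the names `Plugin`, `State` and `train`, the field names, and the timeline keys "iterations", "examples", "real" and "algorithm" are all hypothetical. A schedule here is any callable taking (t1, t2). Each timeline keeps its own clock, and the range handed to a schedule runs from that clock's value at the previous iteration boundary to its current value, so consecutive ranges tile the timeline without gaps.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Hypothetical: a schedule is any callable (t1, t2) -> bool.
Schedule = Callable[[float, float], bool]


@dataclass
class State:
    """What a plugin function receives (field names are illustrative)."""
    t1: float = 0.0
    t2: float = 0.0
    started: bool = False
    done: bool = False  # a plugin may set this to stop training
    model: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Plugin:
    schedule: Schedule
    timeline: str  # "iterations", "examples", "real" or "algorithm"
    function: Callable[[State], None]


def train(step, batch_size, plugins, max_iterations, model=None):
    """Call step(model) repeatedly, running plugins between iterations."""
    state = State(model=model if model is not None else {})
    clocks = {"iterations": 0, "examples": 0, "real": 0.0, "algorithm": 0.0}
    t0 = time.perf_counter()
    state.started = True
    for _ in range(max_iterations):
        before = dict(clocks)
        start = time.perf_counter()
        step(state.model)
        # "algorithm" only accumulates time spent in step(), not in plugins.
        clocks["algorithm"] += time.perf_counter() - start
        clocks["iterations"] += 1
        clocks["examples"] += batch_size
        clocks["real"] = time.perf_counter() - t0
        for plugin in plugins:
            t1, t2 = before[plugin.timeline], clocks[plugin.timeline]
            if plugin.schedule(t1, t2):
                state.t1, state.t2 = t1, t2
                plugin.function(state)
        if state.done:
            break
    return state


def step(model):
    # Stand-in for a real training update.
    model["loss"] = model.get("loss", 1.0) * 0.9


log = []
plugins = [
    # Record the loss every 5 iterations.
    Plugin(lambda t1, t2: t2 // 5 > t1 // 5, "iterations",
           lambda s: log.append((s.t2, round(s.model["loss"], 4)))),
    # Early stopping stand-in: stop once 1000 examples have been seen.
    Plugin(lambda t1, t2: t1 < 1000 <= t2, "examples",
           lambda s: setattr(s, "done", True)),
]
final = train(step, batch_size=64, plugins=plugins, max_iterations=100)
print(log)  # [(5, 0.5905), (10, 0.3487), (15, 0.2059)]
```

With batches of 64, example 1000 falls in the range (960, 1024] of the 16th iteration, so the stopping plugin sets `done` there and the loop exits after 16 iterations rather than 100.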