diff doc/v2_planning/plugin.txt @ 1118:8cc324f388ba

proposal for a plugin system
author Olivier Breuleux <breuleuo@iro.umontreal.ca>
date Tue, 14 Sep 2010 16:01:32 -0400
parents
children 81ea57c6716d
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/v2_planning/plugin.txt	Tue Sep 14 16:01:32 2010 -0400
@@ -0,0 +1,68 @@
+
+======================================
+Plugin system for iterative algorithms
+======================================
+
+I would like to propose a plugin system for iterative algorithms in
+Pylearn. Basically, it would be useful to be able to sandwich
+arbitrary behavior in-between two training iterations of an algorithm
+(whenever applicable). I believe many mechanisms are best implemented
+this way: early stopping, saving checkpoints, tracking statistics,
+real time visualization, remote control of the process, or even
+interlacing the training of several models and making them interact
+with each other.
+
+So here is the proposal: essentially, a plugin would be a (schedule,
+timeline, function) tuple.
+
+Schedule
+========
+
+The schedule is a function that takes two "times", t1 and t2, and
+returns True if the plugin should run between them. We check a time
+range [t1, t2] rather than a single discrete time t because we do not
+necessarily want to schedule plugins on iteration numbers. For
+instance, we might want to run a plugin every second or every minute.
+In that case [t1, t2] would be the start and end time of the last
+iteration, and the plugin would run whenever a new second (or minute)
+began within that range, though still only on training iteration
+boundaries. Alternatively, we might want to run a plugin every n
+examples seen, but with mini-batches the nth example might fall
+squarely in the middle of a batch.
+
+I've implemented a somewhat elaborate schedule system. `each(10)`
+produces a schedule that returns True whenever a multiple of 10 is in
+the time range. `at(17, 153)` produces one that returns True when 17
+or 153 is in the time range. Schedules can be combined and negated,
+e.g. `each(10) & ~at(20, 30)` (execute at each multiple of 10, except
+at 20 and 30). This gives a lot of flexibility as to when plugins
+run.
+
+Timeline
+========
+
+The timeline would be a string naming the "timeline" on which the
+schedule operates. For instance, there could be a "real time"
+timeline, an "algorithm time" timeline, an "iterations" timeline, a
+"number of examples" timeline, and so on. This means you can schedule
+some action to be executed every actual second, or every second of
+training time (ignoring time spent executing plugins), or every
+discrete iteration, or every n examples processed. This might be a
+bloat feature (it was an afterthought to my original design, anyway),
+but I think that there are circumstances where each of these options
+is the best one.
+
+Function
+========
+
+The plugin function would receive an object containing the time
+range, a flag indicating whether training has started, a flag
+indicating whether training is done (which the plugin can set in
+order to stop training), as well as anything pertinent about the model.
+
+Implementation
+==============
+
+I have implemented the feature in plugin.py, in this directory. Simply
+run `python plugin.py` to test it.
+
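
The (schedule, timeline, function) idea from the proposal above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in plugin.py (which is not shown in this diff): the `Schedule` class, the `run` loop, the `ranges` dictionary, the timeline names "iterations" and "examples", and the attribute names on the state object are all hypothetical. The sketch also assumes closed ranges, so the "examples" range of a batch runs from its first example to its last.

```python
from types import SimpleNamespace


class Schedule:
    """Predicate over a closed time range [t1, t2] (hypothetical name)."""

    def __init__(self, check):
        self.check = check  # check(t1, t2) -> bool

    def __call__(self, t1, t2):
        return self.check(t1, t2)

    def __and__(self, other):
        return Schedule(lambda t1, t2: self(t1, t2) and other(t1, t2))

    def __or__(self, other):
        return Schedule(lambda t1, t2: self(t1, t2) or other(t1, t2))

    def __invert__(self):
        return Schedule(lambda t1, t2: not self(t1, t2))


def each(n):
    """True when some multiple of n lies in [t1, t2]."""
    # (t2 // n) * n is the largest multiple of n that is <= t2.
    return Schedule(lambda t1, t2: (t2 // n) * n >= t1)


def at(*times):
    """True when any of the listed times lies in [t1, t2]."""
    return Schedule(lambda t1, t2: any(t1 <= t <= t2 for t in times))


def run(plugins, n_batches, batch_size):
    """Toy training loop; plugins are (schedule, timeline, function) tuples."""
    state = SimpleNamespace(ranges={}, started=False, done=False, model=None)
    seen = 0
    for i in range(1, n_batches + 1):
        # A real training iteration on one mini-batch would happen here.
        state.started = True
        state.ranges = {
            "iterations": (i, i),
            "examples": (seen + 1, seen + batch_size),
        }
        seen += batch_size
        # Plugins only ever run on iteration boundaries.
        for schedule, timeline, function in plugins:
            if schedule(*state.ranges[timeline]):
                function(state)
        if state.done:  # a plugin asked for training to stop
            break
    return state


log = []


def record(state):
    log.append(state.ranges["examples"])


def stop(state):
    state.done = True


plugins = [
    (each(10) & ~at(20, 30), "examples", record),
    (at(11), "iterations", stop),
]
final = run(plugins, n_batches=20, batch_size=4)
```

With mini-batches of 4, the 10th example falls inside the batch covering examples 9 to 12, so `record` runs at the end of that batch rather than exactly at example 10. A "real time" timeline could be added the same way by storing the start and end timestamps of the iteration in `ranges`.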