view doc/v2_planning/plugin.txt @ 1120:27d0ef195e1d
v2planning - added comment to dataset re: visualization
author: James Bergstra <bergstrj@iro.umontreal.ca>
date: Tue, 14 Sep 2010 18:43:42 -0400
parents: 81ea57c6716d
children: a1957faecc9b
======================================
Plugin system for iterative algorithms
======================================

I would like to propose a plugin system for iterative algorithms in
Pylearn. Basically, it would be useful to be able to sandwich arbitrary
behavior between two training iterations of an algorithm (whenever
applicable). I believe many mechanisms are best implemented this way:
early stopping, saving checkpoints, tracking statistics, real-time
visualization, remote control of the process, or even interlacing the
training of several models and making them interact with each other.

So here is the proposal: essentially, a plugin would be a (schedule,
timeline, function) tuple.

Schedule
========

The schedule is a function that takes two "times", t1 and t2, and
returns True if the plugin should be run between these times. The times
are expressed in a "timeline" unit described below (e.g. "real time" or
"iterations").

The reason why we check a time range [t1, t2] rather than some discrete
time t is that we do not necessarily want to schedule plugins on
iteration numbers. For instance, we could want to run a plugin every
second, or every minute. In that case [t1, t2] would be the start time
and end time of the last iteration, and we would run the plugin whenever
a new second started in that range (but still on training iteration
boundaries). Alternatively, we could want to run a plugin every n
examples seen, but if we use mini-batches, the nth example might be
right in the middle of a batch.

I've implemented a somewhat elaborate schedule system. `each(10)`
produces a schedule that returns True whenever a multiple of 10 is in
the time range. `at(17, 153)` produces one that returns True when 17 or
153 is in the time range. Schedules can be combined and negated, e.g.
`each(10) & ~at(20, 30)` (execute at each multiple of 10, except at 20
and 30). That gives a lot of flexibility as to when you want to do
things.
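
For illustration, the schedule semantics described above could be
sketched roughly as follows. This is not the code in plugin.py: the
`Schedule` wrapper class is hypothetical, and the sketch assumes the
range is treated as half-open, (t1, t2], so that a time falling exactly
on an iteration boundary fires only once.

```python
class Schedule:
    """Hypothetical wrapper around a predicate (t1, t2) -> bool."""

    def __init__(self, predicate):
        self.predicate = predicate

    def __call__(self, t1, t2):
        return self.predicate(t1, t2)

    def __and__(self, other):
        return Schedule(lambda t1, t2: self(t1, t2) and other(t1, t2))

    def __or__(self, other):
        return Schedule(lambda t1, t2: self(t1, t2) or other(t1, t2))

    def __invert__(self):
        return Schedule(lambda t1, t2: not self(t1, t2))


def each(n):
    # True when a multiple of n lies in (t1, t2] (assumed half-open range).
    return Schedule(lambda t1, t2: t2 // n > t1 // n)


def at(*times):
    # True when any of the given times lies in (t1, t2].
    return Schedule(lambda t1, t2: any(t1 < t <= t2 for t in times))


# Iteration timeline: each iteration t covers the range (t - 1, t].
schedule = each(10) & ~at(20, 30)
fired = [t for t in range(1, 41) if schedule(t - 1, t)]
print(fired)  # [10, 40]

# Examples timeline with mini-batches of 64: example 100 is inside the
# second batch, so the schedule fires on the boundary after that batch.
print(each(100)(64, 128))  # True
```

The same `each` works unchanged on a real-time timeline, since floor
division is also defined for floats: `each(1.0)(0.4, 1.2)` is True
because a new second started within that range.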

Timeline
========

This would be a string indicating on what "timeline" the schedule is
supposed to operate. For instance, there could be a "real time"
timeline, an "algorithm time" timeline, an "iterations" timeline, a
"number of examples" timeline, and so on. This means you can schedule
some action to be executed every actual second, or every second of
training time (ignoring time spent executing plugins), or every discrete
iteration, or every n examples processed. This might be feature bloat
(it was an afterthought to my original design, anyway), but I think
that there are circumstances where each of these options is the best
one.

Function
========

The plugin function would receive an object containing the time range,
a flag indicating whether training has started, a flag indicating
whether training is done (which the plugin can set in order to stop
training), as well as anything pertinent about the model.

Implementation
==============

I have implemented the feature in plugin.py, in this directory. Simply
run `python plugin.py` to test it.
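
To make the (schedule, timeline, function) tuple concrete, here is a
minimal sketch of a training loop that dispatches plugins between
iterations. It is not the code in plugin.py. The names `run`, `every`,
`train_step`, `batch_size` and `max_iterations`, as well as the fields
of the state object, are assumptions made for this example, and the
"algorithm time" timeline is left out for brevity.

```python
import time
from types import SimpleNamespace


def every(n):
    # Hypothetical schedule: True when a multiple of n lies in (t1, t2].
    return lambda t1, t2: t2 // n > t1 // n


def run(train_step, plugins, batch_size, max_iterations=1000):
    """Call train_step repeatedly, running due plugins between iterations.

    plugins is a list of (schedule, timeline, function) tuples.
    max_iterations is only a safety cap for this sketch.
    """
    state = SimpleNamespace(started=False, done=False, model=None,
                            t1=None, t2=None)
    now = {"iterations": 0, "examples": 0, "real time": 0.0}
    start = time.time()
    while not state.done and now["iterations"] < max_iterations:
        before = dict(now)  # timeline values at the start of the iteration
        state.model = train_step(state.model)
        state.started = True
        now["iterations"] += 1
        now["examples"] += batch_size
        now["real time"] = time.time() - start
        for schedule, timeline, function in plugins:
            t1, t2 = before[timeline], now[timeline]
            if schedule(t1, t2):
                state.t1, state.t2 = t1, t2  # time range seen by the plugin
                function(state)
    return state


log = []


def record(state):
    log.append((state.t1, state.t2))


def stop_early(state):
    state.done = True  # setting the flag ends training


final = run(
    train_step=lambda model: (model or 0) + 1,  # toy "model": a counter
    plugins=[
        (every(100), "examples", record),
        (every(5), "iterations", stop_early),
    ],
    batch_size=64,
)
print(log)          # [(64, 128), (192, 256), (256, 320)]
print(final.model)  # 5
```

Here `record` fires on the first iteration boundary after each multiple
of 100 examples, even though those examples fall inside a mini-batch,
and `stop_early` ends training after five iterations by setting the
done flag.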