# HG changeset patch
# User James Bergstra
# Date 1284760415 14400
# Node ID 7d34edde029d975fc60ca5e29d9c2c2c48f757f4
# Parent f111f8c2a280ef00937950b8c6d81fdf2ce49ed2
added serializability requirement

diff -r f111f8c2a280 -r 7d34edde029d doc/v2_planning/requirements.txt
--- a/doc/v2_planning/requirements.txt	Fri Sep 17 17:07:52 2010 -0400
+++ b/doc/v2_planning/requirements.txt	Fri Sep 17 17:53:35 2010 -0400
@@ -86,3 +86,27 @@
 not be obvious anymore how to remove the k-fold split in the saved model you
 want to use in production.
 
+
+Requirements for component architecture
+=======================================
+
+
+R14. Serializability of experiments. (essentially in pursuit of R6)
+
+Jobs that are running a learning algorithm with our components (datasets,
+models, algorithms) must be able to serialize the experiment's state to a string
+(typically written to disk) and be able to restart it from such a string. There
+must be a mechanism to tell a job to serialize the experiment as soon as
+possible, and a latency of up to 10 seconds should be acceptable. It must also
+be possible to deserialize the experiment for introspection (inspecting the
+state of individual components), not just for continuing the experiment. The
+experiment can assume that resources on disk that were present when the
+experiment started will be present when the experiment resumes. The experiment
+cannot assume that resources written by the experiment will still be there (e.g.
+in /tmp or cwd). Implementations should keep the serialized representation
+compact by omitting anything that can be recomputed or reloaded from disk at
+deserialization time.
+
+This requirement is aimed at enabling process migration and job control as well
+as post-hoc analysis of experiment results.
+
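A minimal Python sketch of one way a job could satisfy R14. It is not the project's design, and every name in it (SyntheticDataset, RunningMean, Experiment, install_checkpoint_signal) is hypothetical. It assumes pickle as the serialization format, SIGUSR1 as the "serialize as soon as possible" request (POSIX only), and training steps much shorter than 10 seconds, so a flag checked between steps is honoured within the allowed latency. The dataset leaves its samples out of the pickle because they can be regenerated from a seed, which illustrates the compactness clause.

```python
import pickle
import random
import signal


class SyntheticDataset:
    """Hypothetical dataset component whose samples are recomputable from a seed."""

    def __init__(self, n, seed):
        self.n = n
        self.seed = seed
        self._data = None  # generated lazily, never stored in the pickle

    @property
    def data(self):
        if self._data is None:
            rng = random.Random(self.seed)
            self._data = [rng.random() for _ in range(self.n)]
        return self._data

    def __getstate__(self):
        # Compactness: drop the samples; they are regenerated on first access.
        state = self.__dict__.copy()
        state["_data"] = None
        return state


class RunningMean:
    """Hypothetical 'model': estimates the mean of the data one sample at a time."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, x):
        self.total += x
        self.count += 1

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0


class Experiment:
    """Hypothetical experiment: dataset + model + position in the training loop."""

    def __init__(self, dataset, model):
        self.dataset = dataset
        self.model = model
        self.step = 0
        self.checkpoint_requested = False

    def request_checkpoint(self, *_signal_args):
        # Only sets a flag, so it is safe to use as a signal handler;
        # the training loop honours it at the next step boundary.
        self.checkpoint_requested = True

    def to_string(self):
        return pickle.dumps(self)  # a byte string, to be written to disk

    @staticmethod
    def from_string(blob):
        # Usable both to resume (call run()) and to inspect the components.
        return pickle.loads(blob)

    def run(self, max_steps=None):
        """Train; return serialized state if a checkpoint was requested, else None."""
        data = self.dataset.data
        done = 0
        while self.step < len(data) and (max_steps is None or done < max_steps):
            if self.checkpoint_requested:
                self.checkpoint_requested = False  # do not persist the request
                return self.to_string()
            self.model.update(data[self.step])
            self.step += 1
            done += 1
        return None


def install_checkpoint_signal(experiment, signum=None):
    # Assumption: `kill -USR1 <pid>` is how job control asks for a checkpoint.
    signum = signal.SIGUSR1 if signum is None else signum
    signal.signal(signum, experiment.request_checkpoint)
```

In a real job, install_checkpoint_signal(exp) would be called before exp.run(); when run() returns bytes, the job writes them to disk and exits, and a later process resumes with Experiment.from_string(...).run(). Pickle ties the saved state to the class definitions, so the same code must be importable wherever the state is loaded.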