changeset 1190:9ff2242a817b

fix rst syntax errors/warnings
author Frederic Bastien <nouiz@nouiz.org>
date Fri, 17 Sep 2010 21:14:41 -0400
parents 0e12ea6ba661
children 9046d230772b
files doc/v2_planning/architecture.txt doc/v2_planning/dataset.txt doc/v2_planning/plugin_architecture_GD.txt
diffstat 3 files changed, 32 insertions(+), 11 deletions(-)
--- a/doc/v2_planning/architecture.txt	Fri Sep 17 20:55:18 2010 -0400
+++ b/doc/v2_planning/architecture.txt	Fri Sep 17 21:14:41 2010 -0400
@@ -76,6 +76,9 @@
 clarification of what the h*** am I talking about) in the following example:
 
    * Linear version:
+
+.. code-block:: python
+
     my_experiment = pipeline([
         data,
         filter_samples,
@@ -86,6 +89,9 @@
     ])
 
    * Encapsulated version:
+
+.. code-block:: python
+
     my_experiment = evaluation(
         data=PCA(filter_samples(data)),
         split=k_fold_split,
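The "linear version" pipeline restored above can be made concrete with a small composition helper. This is a hypothetical sketch only: `pipeline` and the toy stage functions below are illustrative stand-ins, not actual Pylearn API.

```python
# Hypothetical sketch of the "linear version" pipeline idea from
# architecture.txt; `pipeline` and the stages are illustrative, not Pylearn API.

def pipeline(stages):
    """Compose stages left-to-right; the first item is the initial data."""
    data, *funcs = stages
    result = data
    for f in funcs:
        result = f(result)
    return result

# Toy stages operating on a plain list of numbers.
data = [3, -1, 4, -1, 5]
filter_samples = lambda xs: [x for x in xs if x >= 0]   # drop negative samples
normalize = lambda xs: [x / max(xs) for x in xs]        # scale into [0, 1]

my_experiment = pipeline([data, filter_samples, normalize])
```

The encapsulated version in the same hunk expresses the identical computation as nested calls (`normalize(filter_samples(data))`); the list form just makes the stage order explicit.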
--- a/doc/v2_planning/dataset.txt	Fri Sep 17 20:55:18 2010 -0400
+++ b/doc/v2_planning/dataset.txt	Fri Sep 17 21:14:41 2010 -0400
@@ -4,8 +4,8 @@
 Some talking points from the September 2 meeting:
 
  * Datasets as views/tasks (Pascal Vincent's idea): our dataset specification
- needs to be flexible enough to accommodate different (sub)tasks and views of
- the same underlying data.
+   needs to be flexible enough to accommodate different (sub)tasks and views of
+   the same underlying data.
  * Datasets as probability distributions from which one can sample.
     * That's not something I would consider to be a dataset-related problem to
         tackle now: a probability distribution in Pylearn would probably be a
@@ -13,7 +13,7 @@
         DatasetToDistribution class for instance, that would take care of viewing a
         dataset as a probability distribution. -- OD
  * Our specification should allow transparent handling of infinite datasets (or
- simply datasets which cannot fit in memory)
+   simply datasets which cannot fit in memory)
  * GPU/buffering issues.
 
 Commiteee: DE, OB, OD, AB, PV
@@ -117,7 +117,9 @@
 dataset that we use, and the class declaration contains essentially everything
 there is to know about the dataset):
 
-class MNIST(Dataset):
+.. code-block:: python
+
+  class MNIST(Dataset):
     def  __init__(self,inputs=['train_x.npy'],outputs=['train_y.npy']):
         self.type='standard_xy'
         self.in_memory = True
@@ -259,8 +261,12 @@
 the writer of a new dataset subclass). Anyway, maybe a first thing we could
 think about is what we want a mini-batch to be. I think we can agree that we
 would like to be able to do something like:
+
+.. code-block:: python
+
     for mb in dataset.mini_batches(size=10):
         learner.update(mb.input, mb.target)
+
 so that it should be ok for a mini-batch to be an object whose fields
 (that should have the same name as those of the dataset) are numpy arrays.
 More generally, we would like to be able to iterate on samples in a
@@ -285,6 +291,7 @@
 OD: (this is hopefully a clearer re-write of the original version from
 r7e6e77d50eeb, which I was not happy with).
 There are typically three kinds of objects that spit out data:
+
 1. Datasets that are loaded from disk or are able to generate data all by
    themselves (i.e. without any other dataset as input)
 2. Datasets that transform their input dataset in a way that only depends on
@@ -293,6 +300,7 @@
    potentially different dataset (e.g. PCA when you want to learn the projection
    space on the training set in order to transform both the training and test
    sets).
+
 My impression currently is that we would use dataset subclasses to handle 1
 and 2. However, 3 requires a learner framework, so you would need to have
 something like a LearnerOutputDataset(trained_learner, dataset).
@@ -304,6 +312,7 @@
 
 The main advantages I find in this approach (that I have been using at
 Ubisoft) are:
+
 - You only need to learn how to subclass the learner class. The only dataset
   class is LearnerOutputDataset, which you could just name Dataset.
 - You do not have different ways to achieve the same result (having to figure
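The mini-batch loop quoted in the dataset.txt hunk above can be exercised end to end with stand-in classes. Everything here is a hypothetical sketch: `ToyDataset`, `CountingLearner`, and the `SimpleNamespace` mini-batch are illustrative, not the Pylearn classes under discussion.

```python
# Hypothetical sketch of the mini-batch protocol discussed in dataset.txt;
# ToyDataset and CountingLearner are illustrative stand-ins, not Pylearn code.
from types import SimpleNamespace

class ToyDataset:
    """Holds parallel `input`/`target` sequences (plain lists for simplicity)."""
    def __init__(self, inputs, targets):
        self.input, self.target = inputs, targets

    def mini_batches(self, size):
        # Yield objects whose field names mirror the dataset's field names,
        # as suggested in the notes above.
        for i in range(0, len(self.input), size):
            yield SimpleNamespace(input=self.input[i:i + size],
                                  target=self.target[i:i + size])

class CountingLearner:
    """Performs one update per mini-batch and counts the examples it has seen."""
    def __init__(self):
        self.seen = 0

    def update(self, inputs, targets):
        self.seen += len(inputs)

dataset = ToyDataset(list(range(25)), list(range(25)))
learner = CountingLearner()
for mb in dataset.mini_batches(size=10):
    learner.update(mb.input, mb.target)
```

Because the dataset here fits in memory, the generator is trivial; the same interface would let an out-of-core or infinite dataset stream batches without the learner changing.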
--- a/doc/v2_planning/plugin_architecture_GD.txt	Fri Sep 17 20:55:18 2010 -0400
+++ b/doc/v2_planning/plugin_architecture_GD.txt	Fri Sep 17 21:14:41 2010 -0400
@@ -3,11 +3,13 @@
 
 The "central authority" (CA) is the glue which takes care of interfacing plugins
 with one another. It has 3 basic roles:
+
 * it maintains a list of "registered" or "active" plugins
 * it receives and queues the various messages sent by the plugins
 * dispatches the messages to the recipient, based on various "events"
 
 Events can take different forms:
+
 * the CA can trigger various events based on running time
 * can be linked to messages emitted by the various plugins. Events can be
   triggered based on the frequency of such messages.
@@ -26,13 +28,15 @@
 James and OB to python-ize this :)
 
 
-class MessageX(Message):
+.. code-block:: python
+
+  class MessageX(Message):
     """
     A message is basically a data container. This could very well be replaced by
     a generic Python object.
     """
 
-class Plugin(object):
+  class Plugin(object):
     """
     The base plugin object doesn't do much. It contains a reference to the CA
     (upon plugin being registered with the CA), provides boilerplate code
@@ -92,7 +96,7 @@
                 callback(message)
 
 
-class ProducerPlugin(Plugin):
+  class ProducerPlugin(Plugin):
 
     def dostuff():
         """
@@ -108,7 +112,7 @@
             ca.send(msga)  # ask CA to forward to other plugins
 
 
-class ConsumerPlugin(Plugin):
+  class ConsumerPlugin(Plugin):
 
     @handler(MessageA)
     def func(msga):
@@ -119,7 +123,7 @@
         # do something with message A
 
 
-class ConsumerProducerPlugin(Plugin):
+  class ConsumerProducerPlugin(Plugin):
 
     @handler(MessageA)
     def func(msga):
@@ -138,7 +142,7 @@
 
 
 
-class CentralAuthority(object):
+  class CentralAuthority(object):
 
     active_plugins = []  # contains a list of registered plugins
 
@@ -211,7 +215,9 @@
 =======================
 
 
-def main():
+.. code-block:: python
+
+  def main():
 
     ca = CentralAuthority()
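The registration-and-dispatch cycle that the plugin_architecture_GD.txt hunks sketch in pseudo-code can be run as a minimal Python prototype. The class names mirror the pseudo-code, but the dispatch mechanics (`handler` marking methods with the message class they accept, `CentralAuthority.send` scanning registered plugins) are my assumptions, not a committed design.

```python
# Hypothetical runnable sketch of the central-authority dispatch described in
# plugin_architecture_GD.txt; names follow the pseudo-code, details are assumed.

class Message:
    """A message is basically a data container."""
    def __init__(self, payload=None):
        self.payload = payload

class MessageA(Message):
    pass

def handler(message_class):
    """Mark a method as the handler for a given message class."""
    def decorate(func):
        func._handles = message_class
        return func
    return decorate

class CentralAuthority:
    def __init__(self):
        self.active_plugins = []  # list of registered plugins

    def register(self, plugin):
        plugin.ca = self
        self.active_plugins.append(plugin)

    def send(self, message):
        # Dispatch to every registered handler declared for this message type.
        for plugin in self.active_plugins:
            for attr in vars(type(plugin)).values():
                if getattr(attr, "_handles", None) is type(message):
                    attr(plugin, message)

class ConsumerPlugin:
    def __init__(self):
        self.received = []

    @handler(MessageA)
    def func(self, msga):
        self.received.append(msga.payload)  # do something with message A

ca = CentralAuthority()
consumer = ConsumerPlugin()
ca.register(consumer)
ca.send(MessageA("hello"))
```

Time-based events and message queuing (the CA's other two roles) are omitted here; this only demonstrates the register/send/handler round trip.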