comparison doc/v2_planning/plugin_RP.py @ 1202:7fff3d5c7694

ARCHITECTURE/LAYER: a incomplete story about the plug-ins and way of constructing models
author pascanur
date Mon, 20 Sep 2010 20:35:03 -0400
parents fe6c25eb1e37
children 681b5e7e3b81
1201:46527ae6db53 1202:7fff3d5c7694
'''

!!! Incomplete file .. many of the things I've set up to do are not done
yet !!!

============
Introduction
============

What this file talks about
==========================
* Proposal for the layer committee
* Proposal of how to deal with plug-ins (Step 2)
* Description of how to glue the two parts
* Some personal beliefs and argumentation

The file will point out:
* how to use the APIs the other committees proposed, or why and how they
  should change
* how it satisfies the listed requirements (or why it doesn't)
* why this approach might be better than others (or worse), to the best of
  my knowledge


Motivation for writing this file
================================

I wrote this file because:
* it will probably answer most of the questions regarding my view,
  minimizing the time wasted on talks
* presenting the entire interface helps to see holes in the approach
* it is here for everybody to read (easier dissemination of information)


=======
Concept
=======

I think any experiment that we (or anybody else) would want to run with
our library will be composed of two steps:

* Step 1. Constructing (or choosing, or initializing) the model, the
  datasets, error measures, optimizers and so on (everything up to the
  iterative loop). I think this step has been covered by the different
  committees, and is possibly glued together by the layer committee.

* Step 2. Composing the iterative loops and performing them (this is what
  the architecture committee dealt with).

I believe there is a natural way of going from *Step 1* to *Step 2*,
which I present below as Step 1.5.

Step 2
======

I will start with Step 2 (because I think that is more of a hot subject
right now). I will assume you have the right plugins at hand.
This is a DBN with early stopping and ..

.. code-block:: python
'''
data = load_mnist()
train_xy, valid_xy, test_xy = split(data, split = [( 0, 40000),
                                                   (40000, 50000),
                                                   (50000, 60000)])
train_x, train_y = train_xy
valid_x, valid_y = valid_xy
test_x, test_y = test_xy

################# CONSTRUCTING THE MODEL ###################################
############################################################################

x0 = pca(train_x)

## Layer 1:
h1 = sigmoid(dotW_b(x0, units = 200), constraint = L1(coeff = 0.1))
x1 = recurrent_layer()
x1.t0 = x0
x1.value = binomial_sample(sigmoid(reconstruct(binomial_sample(h1), x0)))
cost = free_energy(train_x) - free_energy(x1.tp(5))
grads = [(g.var, T.grad(cost.var, g.var)) for g in cost.params]
pseudo_cost = sum([pl.sum(pl.abs(g)) for g in cost.params])
rbm1 = SGD(cost = pseudo_cost, grads = grads)

# Layer 2:
rbm2, h2 = rbm(h1, units = 200, k = 5, use = 'CD')
# Logreg
logreg, out = logreg(h2, units = 10)
train_err = mean_over(misclassification(argmax(out), train_y))
valid_err = train_err.replace({train_x: valid_x, train_y: valid_y})
test_err = train_err.replace({train_x: test_x, train_y: test_y})

##########################################################################
############### Constructing the training loop ###########################

sched = Schedular()


### Constructing Modes ###
pretrain_layer1 = sched.mode('pretrain0')
pretrain_layer2 = sched.mode('pretrain1')
early_stopper = sched.mode('early')
valid0 = sched.mode('valid0')
kfolds = sched.mode('kfolds')

# Construct the modes dependency graph
valid0.include([pretrain_layer1, pretrain_layer2, early_stopper])
kfolds.include(valid0)

pretrain_layer1.act(on = valid0.begin(), when = always())
pretrain_layer2.act(on = pretrain_layer1.end(), when = always())
early_stopper.act(on = pretrain_layer2.end(), when = always())


# Construct a counter plugin that keeps track of the number of epochs
@FnPlugin
def counter(self, msg):
    # a bit of a hack .. it would look more classic if you would
    # start with a class instead
    if not hasattr(self, 'val'):
        self.val = 0

    if msg == Message('eod'):
        self.val += 1
        if self.val < 10:
            self.fire(Message('continue'))
        else:
            self.fire(Message('terminate'))

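
# The comment above suggests a class-based plugin would read more naturally.
# A purely illustrative sketch of what that could look like; it assumes the
# same hypothetical Plugin / Message API used throughout this example and is
# not part of any agreed interface:
class EpochCounter(Plugin):
    def __init__(self, max_epochs = 10):
        self.max_epochs = max_epochs
        self.val = 0

    def __call__(self, msg):
        if msg == Message('eod'):
            self.val += 1
            if self.val < self.max_epochs:
                self.fire(Message('continue'))
            else:
                self.fire(Message('terminate'))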

# Construct pre-training plugins
rbm1_plugin = plugin_wrapper(rbm1, sched = pretrain_layer1)
rbm1_plugin.listen(Message('init'), update_hyperparameters)
rbm2_plugin = plugin_wrapper(rbm2, sched = pretrain_layer2)
rbm2_plugin.listen(Message('init'), update_hyperparameters)
rbm1_counter = pretrain_layer1.register(counter)
rbm2_counter = pretrain_layer2.register(counter)


# Dependency graph for pre-training layer 0
rbm1_plugin.act(on = [pretrain_layer1.begin(),
                      Message('continue')],
                when = always())
rbm1_counter.act(on = rbm1_plugin.eod(), when = always())


# Dependency graph for pre-training layer 1
rbm2_plugin.act(on = pretrain_layer2.begin(), when = always())
pretrain_layer2.stop(on = rbm2_plugin.eod(), when = always())


# Constructing fine-tuning plugins
learner = early_stopper.register(plugin_wrapper(logreg))
learner.listen(Message('init'), update_hyperparameters)
validation = early_stopper.register(plugin_wrapper(valid_err))
validation.listen(Message('init'), update_hyperparameters)
clock = early_stopper.register(sched.generate_clock())
stopper = early_stopper.register(early_stopper_plugin)

@FnPlugin
def save_weights(self, message):
    cPickle.dump(logreg, open('model.pkl', 'wb'))


learner.act(on = early_stopper.begin(), when = always())
learner.act(on = learner.value(), when = always())
validation.act(on = clock.hour(), when = every(n = 1))
stopper.act(on = validation.value(), when = always())
save_weights.act(on = stopper.new_best_error(), when = always())

@FnPlugin
def kfolds_plugin(self, event):
    if not hasattr(self, 'n'):
        self.n = -1
        self.splits = [[( 0, 40000), (40000, 50000), (50000, 60000)],
                       [(10000, 50000), (50000, 60000), ( 0, 10000)],
                       [(20000, 60000), ( 0, 10000), (10000, 20000)]]
    if self.n < len(self.splits) - 1:
        self.n += 1
        msg = Message('new split')
        msg.data = (data.get_hyperparam('split'), self.splits[self.n])
        self.fire(msg)
    else:
        self.fire(Message('terminate'))


kfolds.register(kfolds_plugin)
kfolds_plugin.act(on = kfolds.begin(), when = always())
kfolds_plugin.act(on = valid0.end(), when = always())
valid0.act(on = Message('new split'), when = always())

sched.include(kfolds)

sched.run()

'''

Notes:
    When a mode is registered to begin with a certain message, it will
    rebroadcast that message when it starts, only switching its type from
    whatever it was to 'init'. It will also send all 'init' messages of the
    mode in which it is included (or of the schedular).

    One might be able to shorten all of this by having Macros that create
    modes and automatically register certain plugins with them; you can
    always add more plugins to any mode afterwards.


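To make the message/firing mechanics used above a bit more concrete, here is
a minimal, self-contained sketch of the kind of loop a Schedular could run.
Every name and signature in it (Scheduler, Message, act, fire) is an
illustrative assumption, not the proposed API:

.. code-block:: python

    # Minimal stand-ins; the real library would provide richer versions.
    class Message(object):
        def __init__(self, type):
            self.type = type

    class Scheduler(object):
        def __init__(self):
            self.listeners = {}   # message type -> list of callables
            self.queue = []

        def act(self, plugin, on):
            # register a plugin to be called whenever a message of this
            # type is fired
            self.listeners.setdefault(on, []).append(plugin)

        def fire(self, msg):
            self.queue.append(msg)

        def run(self):
            self.fire(Message('init'))
            while self.queue:
                msg = self.queue.pop(0)
                if msg.type == 'terminate':
                    break
                for plugin in self.listeners.get(msg.type, []):
                    plugin(self, msg)

    class Counter(object):
        # fires 'continue' until max_epochs calls have been seen,
        # then fires 'terminate'
        def __init__(self, max_epochs = 10):
            self.max_epochs = max_epochs
            self.seen = 0

        def __call__(self, sched, msg):
            self.seen += 1
            if self.seen < self.max_epochs:
                sched.fire(Message('continue'))
            else:
                sched.fire(Message('terminate'))

    sched = Scheduler()
    counter = Counter()
    sched.act(counter, on = 'init')
    sched.act(counter, on = 'continue')
    sched.run()   # the counter runs 10 times, then the loop stops

Modes would sit on top of something like this: a mode is registered much like
a plugin, owns its own set of listeners, and rebroadcasts the message that
started it as an 'init' message, as described in the notes above.
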
136 - by decorating a function 209 Step 1
137 In all cases I would suggest then when creating them you should provide 210 ======
138 the schedular as well, and the constructor also registers the plugin 211
139 212
140 * The plugin concept works well as long as the plugins are a bit towards 213 You start with the dataset that you construct as the dataset committee
141 heavy duty computation, disregarding printing plugins and such. If you have 214 proposed to. You continue constructing your model by applying
142 many small plugins this system might only introduce an overhead. I would 215 transformation, more or less like you would in Theano. When constructing
143 argue that using theano is restricted to each plugin. Therefore I would 216 your model you also get a graph "behind the scene". Note though that
144 strongly suggest that the architecture to be done outside the schedular 217 this graph is totally different then the one Theano would create!
145 with a different approach. 218 Let start with an example:
146 219
147 * I would suggest that the framework to be used only for the training loop 220 .. code-block:: python
148 (after you get the adapt function, compute error function) so is more about 221
149 the meta-learner, hyper-learner learner level. 222 '''
data_x, data_y = GPU_transform(load_mnist())
output = sigmoid(dotW_b(data_x, 10))
err = cross_entropy(output, data_y)
learner = SGD(err)
'''

This shows how to create the learner behind the logistic regression,
but not the function that will compute the validation error or the test
error (or any other statistics). Before going into the details of what
all those transforms (or the results of applying one) mean, here is
another partial example, for an SdA:

.. code-block:: python

'''
## Layer 1:

data_x, data_y = GPU_transform(load_mnist())
noisy_data_x = gaussian_noise(data_x, amount = 0.1)
hidden1 = tanh(dotW_b(data_x, n_units = 200))
reconstruct1 = reconstruct(hidden1.replace({data_x: noisy_data_x}),
                           noisy_data_x)
err1 = cross_entropy(reconstruct1, data_x)
learner1 = SGD(err1)

# Layer 2 :
noisy_hidden1 = gaussian_noise(hidden1, amount = 0.1)
hidden2 = tanh(dotW_b(hidden1, n_units = 200))
reconstruct2 = reconstruct(hidden2.replace({hidden1: noisy_hidden1}),
                           noisy_hidden1)
err2 = cross_entropy(reconstruct2, hidden1)
learner2 = SGD(err2)

# Top layer:

output = sigmoid(dotW_b(hidden2, n_units = 10))
err = cross_entropy(output, data_y)
learner = SGD(err)

'''

What's going on here?
---------------------

By calling different "transforms" (we could call them ops or functions)
you decide what the architecture does. What you get back from applying
any of these transforms are nodes. You have different types of nodes
(which I will enumerate a bit later) but they all offer a basic interface.
That interface is the dataset API plus a few more methods and/or attributes.
There are also a few transforms that work on the graph which I think will
be pretty useful:

* .replace(dict)    -> method; replaces the subgraphs given as keys with
                       the ones given as values; throws an exception if it
                       is impossible

* reconstruct(dict) -> transform; tries to reconstruct the nodes given as
                       keys starting from the nodes given as values by
                       going through the inverse of all transforms that
                       are in between

* .tm, .tp          -> methods; return nodes that correspond to the value
                       at t-k or t+k
* recurrent_layer   -> function; creates a special type of node that is
                       recurrent; the node has two important attributes that
                       need to be specified before calling the node iterator;
                       those attributes are .t0, which represents the initial
                       value, and .value, which should describe the recurrent
                       relation
* add_constraints   -> transform; adds a constraint to a given node
* data_listener     -> function; creates a special node that listens for
                       messages to get data; it should be used to decompose
                       the architecture into modules that can run on
                       different machines

* switch(hyperparam, dict) -> transform; a lazy switch that allows you to
                       construct the graph based on hyper-parameters

* get_hyperparameter -> method; given a name it will return the first node,
                       starting from the top, that is a hyper-parameter and
                       has that name
* get_parameter     -> method; given a name it will return the first node,
                       starting from the top, that is a parameter and has
                       that name



Because every node provides the dataset API, you can iterate over any of
the nodes. They will produce the original dataset transformed up
to that point.
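
To make this last point (and the .replace method above) more concrete, here
is a toy sketch of how such nodes could be represented. It is only an
assumption about one possible implementation, not the proposed one:

.. code-block:: python

    class Node(object):
        # a node remembers which transform produced it and from which inputs
        def __init__(self, transform=None, inputs=(), name=None):
            self.transform = transform
            self.inputs = tuple(inputs)
            self.name = name

        def replace(self, mapping):
            # rebuild the subgraph, swapping the nodes given as keys for
            # the nodes given as values
            if self in mapping:
                return mapping[self]
            if not self.inputs:
                return self
            return Node(self.transform,
                        [i.replace(mapping) for i in self.inputs],
                        self.name)

    def sigmoid(x):
        return Node('sigmoid', [x])

    def cross_entropy(output, target):
        return Node('cross_entropy', [output, target])

    train_x, train_y = Node(name='train_x'), Node(name='train_y')
    valid_x, valid_y = Node(name='valid_x'), Node(name='valid_y')

    train_err = cross_entropy(sigmoid(train_x), train_y)
    valid_err = train_err.replace({train_x: valid_x, train_y: valid_y})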

** NOTES **
1. This is not like a symbolic graph. When adding a transform you can get
   a warning right away. This is because you start from the dataset, so you
   always have access to some data. Sometimes, though, you would want the
   nodes to be lazy, i.e. not try to compute everything until the graph is
   done.

2. You can still have complex Theano expressions. Each node has a Theano
   variable describing the graph up to that point, plus optionally a
   compiled function over which you can iterate. We can use some on_demand
   mechanism to compile only when needed.
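
A small sketch of the on_demand idea from note 2: the compilation is paid
only the first time a node is iterated over and cached afterwards. In the
real thing _compile would call theano.function; the plain Python stand-in
below is only there to keep the sketch runnable:

.. code-block:: python

    class LazyCompiledNode(object):
        def __init__(self, expression, source):
            self.expression = expression   # stands in for a Theano expression
            self.source = source           # the node we iterate over
            self._fn = None                # compiled lazily

        def _compile(self):
            # the real thing would call theano.function here
            return self.expression

        def __iter__(self):
            if self._fn is None:
                self._fn = self._compile()
            for value in self.source:
                yield self._fn(value)

    node = LazyCompiledNode(lambda v: 2 * v, source=[1, 2, 3])
    print(list(node))   # compiled on first use -> [2, 4, 6]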

What types of nodes do you have?
--------------------------------

Note that this differentiation is more or less semantic, not a mandatory
syntactic one. It is just to help in understanding the graph.


* Data Nodes            -- datasets are such nodes; the result of any
                           simple transform is also a data node (like the
                           result of a sigmoid, or dotW_b)
* Learner Nodes         -- they are the same as data nodes, with the
                           difference that they have side effects on the
                           model; they update the weights
* Apply Nodes           -- they are used to connect input variables to
                           the transformation/op node and output nodes
* Dependency Nodes      -- very similar to apply nodes, except that they
                           connect constraint subgraphs to a model graph
* Parameter Nodes       -- when iterating over them they will only output
                           the values of the parameters
* Hyper-parameter Nodes -- very similar to parameter nodes; this is a
                           semantic difference (they are not updated by
                           any learner node)
* Transform Nodes       -- these nodes describe the mathematical function
                           and, if there is one, the inverse of that
                           transform; there would usually be two kinds of
                           transforms, ones that use Theano and ones that
                           do not -- the distinction matters because the
                           Theano-based ones can be composed

Each node is lazy, in the sense that unless you try to iterate on it, it
will not try to compute the next value.


Isn't this too low level?
-------------------------

I think this way of writing and decomposing your neural network is
efficient and useful when writing such networks. Of course, when you
just want to run a classical SdA you shouldn't need to go through the
trouble of writing all that. I think we should have Macros for this.

* Macro -- syntactically it looks just like a transform (i.e. a python
  function), only that it actually applies multiple transforms to the input
  and might return several nodes (not just one); a sketch of what such a
  macro could expand to is given after the example below.
  Example:


      learner, prediction, pretraining_learners = SdA(
                              input   = data_x,
                              target  = data_y,
                              hiddens = [200, 200],
                              noises  = [0.1, 0.1])


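As an illustration only, this is what the SdA macro above could expand to,
written in terms of the hypothetical transforms already used in this file
(gaussian_noise, dotW_b, tanh, reconstruct, cross_entropy, SGD). It is a
sketch, not a proposed implementation:

.. code-block:: python

    def SdA(input, target, hiddens, noises):
        # one denoising autoencoder (and pre-training learner) per layer
        pretraining_learners = []
        layer_input = input
        for n_units, noise in zip(hiddens, noises):
            noisy = gaussian_noise(layer_input, amount = noise)
            hidden = tanh(dotW_b(layer_input, n_units = n_units))
            rec = reconstruct(hidden.replace({layer_input: noisy}), noisy)
            pretraining_learners.append(SGD(cross_entropy(rec, layer_input)))
            layer_input = hidden
        # supervised top layer
        prediction = sigmoid(dotW_b(layer_input, n_units = 10))
        learner = SGD(cross_entropy(prediction, target))
        return learner, prediction, pretraining_learners
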
How do you deal with loops?
---------------------------

When implementing architectures you sometimes need to loop, as for an RNN
or for CD, PCD, etc. Adding loops in such a scheme is always hard.
I borrowed the idea in the code below from PyBrain. You first construct
a shell layer that you call a recurrent layer. Then you define its
functionality by giving the initial value and the recurrent step.
For example:

.. code-block:: python

'''
# sketch of writing an RNN
x = load_mnist()
y = recurrent_layer()
y.value = tanh(dotW(x, n = 50) + dotW(y.tm(1), 50))
y.t0 = zeros((50,))
out = dotW(y, 10)


# sketch of writing CD-k starting from x
x = recurrent_layer()
x.t0 = input_values
h = binomial_sample(sigmoid(dotW_b(x.tm(1))))
x.value = binomial_sample(sigmoid(reconstruct(h, x.tm(1))))
## the assumption is that the inverse of sigmoid is the identity fn
pseudo_cost = free_energy(x.tp(k)) - free_energy(x.t0)


'''

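To pin down the intended meaning of .t0, .value and .tp, here is a toy,
purely numerical stand-in for recurrent_layer. The real node would build a
graph rather than compute values directly; this only shows how tp(k)
unrolls the recurrence:

.. code-block:: python

    class ToyRecurrentLayer(object):
        def __init__(self):
            self.t0 = None      # initial value
            self.value = None   # recurrent step: previous value -> next value

        def tp(self, k):
            # unroll the recurrence k steps forward from t0
            state = self.t0
            for _ in range(k):
                state = self.value(state)
            return state

    x = ToyRecurrentLayer()
    x.t0 = 1.0
    x.value = lambda prev: 0.5 * prev   # stands in for the Gibbs step above
    print(x.tp(5))                      # state after 5 unrolled steps
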
How do I deal with constraints?
-------------------------------

Use the add_constraints transform. You are required to pass the constraint
transform together with the initial values of its hyper-parameters?

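A hypothetical usage example, reusing the L1 constraint from the Step 2
script above (all names are illustrative):

.. code-block:: python

    h1 = tanh(dotW_b(data_x, n_units = 200))
    h1 = add_constraints(h1, L1(coeff = 0.1))   # attach an L1 penalty to h1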

How do I deal with other types of networks?
-------------------------------------------

(opaque transforms)

    new_data = PCA(data_x)


    svm_predictions = SVM(data_x)
    svm_learner = SVM_learner(svm_predictions)
    # Note that for the SVM this might be just syntactic sugar; we have the
    # two steps because we expect different interfaces for these nodes



Step 1.5
========

There is a wrapper function called plugin. Once you call plugin on any of
the previous nodes you will get a plugin that follows a certain set of
conventions.
'''