pylearn: pylearn/algorithms/mcRBM.py comparison

comparison pylearn/algorithms/mcRBM.py @ 1500:517f4c02dde9

Auto white space fix.

author	Frederic Bastien <nouiz@nouiz.org>
date	Fri, 09 Sep 2011 10:50:32 -0400
parents	f82b80c841b2
children	2a6a6f16416c


-:f82b80c841b2
+:517f4c02dde9
 """
 This file implements the Mean & Covariance RBM discussed in
 Ranzato, M. and Hinton, G. E. (2010)
 Modeling pixel means and covariances using factored third-order Boltzmann machines.
 IEEE Conference on Computer Vision and Pattern Recognition.
 Version in paper
 ----------------
 Full Energy of the Mean and Covariance RBM, with
 :math:`h_k = h_k^{(c)}`,
 :math:`g_j = h_j^{(m)}`,
 :math:`b_k = b_k^{(c)}`,
 :math:`c_j = b_j^{(m)}`,
 :math:`U_{if} = C_{if}`,
 E (v, h, g) =
 - 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i (U_{if} v_i) / (|U_{.f}|*|v|) )^2
 - \sum_k b_k h_k
 + 0.5 \sum_i v_i^2
 - \sum_j \sum_i W_{ij} g_j v_i
 - \sum_j c_j g_j
 -------------------------------------
 The train_mcRBM file implements learning in a similar but technically different Energy function:
 E (v, h, g) =
 - 0.5 \sum_f \sum_k P_{fk} h_k (\sum_i U_{if} v_i / sqrt(\sum_i v_i^2/I + 0.5))^2
 - \sum_k b_k h_k
 + 0.5 \sum_i v_i^2
 - \sum_j \sum_i W_{ij} g_j v_i
 - \sum_j c_j g_j
 This file implements the same algorithm as the train_mcRBM code, except that the P matrix is
 omitted for clarity, and replaced analytically with a negative identity matrix.
 E (v, h, g) =
 + 0.5 \sum_k h_k (\sum_i U_{ik} v_i / sqrt(\sum_i v_i^2/I + 0.5))^2
 - \sum_k b_k h_k
 + 0.5 \sum_i v_i^2
 - \sum_j \sum_i W_{ij} g_j v_i
 - \sum_j c_j g_j
 E (v, h, g) =
 - 0.5 \sum_f \sum_k P_{fk} h_k (\sum_i U_{if} v_i / sqrt(\sum_i v_i^2/I + 0.5))^2
 - \sum_k b_k h_k
 + 0.5 \sum_i v_i^2
 - \sum_j \sum_i W_{ij} g_j v_i
 - \sum_j c_j g_j
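A minimal NumPy sketch of the train_mcRBM-style energy above, for a single configuration. The function name `mcrbm_energy`, the argument order, and the array shapes (`U` is I x F, `P` is F x K, `W` is I x J) are assumptions made for illustration; they are not taken from this file.

```python
import numpy as np

def mcrbm_energy(v, h, g, U, P, W, b, c):
    """Energy E(v, h, g) with the sqrt(sum_i v_i^2 / I + 0.5) normalization.

    Assumed shapes: v (I,), h (K,), g (J,), U (I, F), P (F, K), W (I, J),
    b (K,), c (J,).
    """
    n_inputs = v.shape[0]
    # Scale applied to v inside the squared filter responses.
    v_scale = np.sqrt(np.sum(v ** 2) / n_inputs + 0.5)
    filter_responses = U.T.dot(v) / v_scale              # shape (F,)
    # -0.5 * sum_f sum_k P_fk h_k (filter response f)^2
    cov_term = -0.5 * np.dot(filter_responses ** 2, P.dot(h))
    return (cov_term
            - np.dot(b, h)
            + 0.5 * np.sum(v ** 2)
            - np.dot(g, W.T.dot(v))
            - np.dot(c, g))
```

Passing `P = -np.eye(K)` (with F equal to K) reproduces the form in which P is replaced by a negative identity matrix.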
 Conventions in this file
 ========================
 This file contains some global functions, as well as a class (MeanCovRBM) that makes using them a little
 more convenient.
 Global functions like `free_energy` work on an mcRBM as parametrized in a particular way.
 Suppose we have
 - I input dimensions,
 - F squared filters,
 - J mean variables, and
 - K covariance variables.
 The mcRBM is parametrized by 6 variables:
 # WORKING NOTES
 # THIS DERIVATION IS BASED ON THE ** PAPER ** ENERGY FUNCTION
 # NOT THE ENERGY FUNCTION IN THE CODE!!!
 #
 # Free energy is the marginal energy of visible units
 # Recall:
 #   Q(x) = exp(-E(x))/Z ==> -log(Q(x)) - log(Z) = E(x)
 #
 #
 #   E (v, h, g) =
 #       - 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / (|U_{*f}|^2 |v|^2)
 #  = -\log( \sum_{h,g} exp(-(
 #       - 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / (|U_{*f}|^2 |v|^2)
 #       - \sum_k b_k h_k
 #       + 0.5 \sum_i v_i^2
 #       - \sum_j \sum_i W_{ij} g_j v_i
 #       - \sum_j c_j g_j
 #       - \sum_i a_i v_i )))
 #
 # Get rid of double negs in exp
 #  = -\log(  \sum_{h} exp(
 #       + 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / (|U_{*f}|^2 |v|^2)
 #       + \sum_k b_k h_k
 #       - 0.5 \sum_i v_i^2
 #       ) * \sum_{g} exp(
 #       + \sum_j \sum_i W_{ij} g_j v_i
 #       + \sum_j c_j g_j))
 #    - \sum_i a_i v_i
 #
 # Break up log
 #  = -\log(  \sum_{h} exp(
 #       + 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / (|U_{*f}|^2 |v|^2)
 #       + \sum_k b_k h_k
 #       ))
 #    -\log( \sum_{g} exp(
 #       + \sum_j \sum_i W_{ij} g_j v_i
 #       + \sum_j c_j g_j ))
 #    + 0.5 \sum_i v_i^2
 #    - \sum_i a_i v_i
 #
 # Use the fact that h and g are binary to turn log(sum(exp(sum...))) into sum(log(...))
 #  = -\log(\sum_{h} exp(
 #       + 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / (|U_{*f}|^2 |v|^2)
 #       + \sum_k b_k h_k
 #       ))
 #    - \sum_{j} \log(1 + exp(\sum_i W_{ij} v_i + c_j ))
 #    + 0.5 \sum_i v_i^2
 #    - \sum_i a_i v_i
 #
 #  = - \sum_{k} \log(1 + exp(b_k + 0.5 \sum_f P_{fk}( \sum_i U_{if} v_i )^2 / (|U_{*f}|^2 |v|^2)))
 #    - \sum_{j} \log(1 + exp(\sum_i W_{ij} v_i + c_j ))
 #    + 0.5 \sum_i v_i^2
 #    - \sum_i a_i v_i
 #
 # For negative-one-diagonal P this gives:
 #
 #  = - \sum_{k} \log(1 + exp(b_k - 0.5 ( \sum_i U_{ik} v_i )^2 / (|U_{*k}|^2 |v|^2)))
 #    - \sum_{j} \log(1 + exp(\sum_i W_{ij} v_i + c_j ))
 #    + 0.5 \sum_i v_i^2
 #    - \sum_i a_i v_i
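The closed form at the end of the working notes can be checked numerically against a brute-force sum over all binary `h` and `g`. The sketch below assumes the squared-cosine reading of the paper energy, that is, `(\sum_i U_{ik} v_i)^2 / (|U_{*k}|^2 |v|^2)`, and a negative-identity P. The function names are hypothetical and do not appear in this file.

```python
import itertools
import numpy as np

def softplus(x):
    # log(1 + exp(x)), computed stably.
    return np.logaddexp(0.0, x)

def squared_cosines(v, U):
    # Squared cosine between v and each column of U, shape (K,).
    return U.T.dot(v) ** 2 / (np.sum(U ** 2, axis=0) * np.sum(v ** 2))

def free_energy(v, U, W, a, b, c):
    """Closed-form free energy for P = -identity (assumed paper-style energy)."""
    return (-np.sum(softplus(b - 0.5 * squared_cosines(v, U)))
            - np.sum(softplus(W.T.dot(v) + c))
            + 0.5 * np.sum(v ** 2)
            - np.dot(a, v))

def free_energy_brute_force(v, U, W, a, b, c):
    """-log sum_{h,g} exp(-E(v, h, g)), enumerating every binary h and g."""
    cos_sq = squared_cosines(v, U)
    neg_energies = []
    for h in itertools.product([0.0, 1.0], repeat=b.shape[0]):
        for g in itertools.product([0.0, 1.0], repeat=c.shape[0]):
            h_arr, g_arr = np.array(h), np.array(g)
            energy = (0.5 * np.dot(h_arr, cos_sq)
                      - np.dot(b, h_arr)
                      + 0.5 * np.sum(v ** 2)
                      - np.dot(g_arr, W.T.dot(v))
                      - np.dot(c, g_arr)
                      - np.dot(a, v))
            neg_energies.append(-energy)
    return -np.logaddexp.reduce(np.array(neg_energies))
```

The brute-force version is exponential in the number of hidden units, so it is only useful as a check on very small models.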
 import sys, os, logging
 import numpy as np
 import numpy
     def expected_h_g_given_v(self, v):
         """Return a tuple (`h`, `g`) of Theano expressions for the conditional expectations in an mcRBM.
         `h` is the conditional expectation of the covariance units.
         `g` is the conditional expectation of the mean units.
         """
         h = TT.nnet.sigmoid(self.hidden_cov_units_preactivation_given_v(v))
         g = TT.nnet.sigmoid(self.c + dot(v,self.W))
         return (h, g)
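For readers without Theano, the same conditional expectations can be sketched in NumPy. The covariance pre-activation used here, `b - 0.5 * (U^T v / scale)^2`, is an assumption derived from the negative-identity-P energy in the module docstring; the body of `hidden_cov_units_preactivation_given_v` is not shown in this comparison and may differ (for example by normalizing the columns of U).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def expected_h_g_given_v(v, U, W, b, c):
    """Conditional expectations for a batch v of shape (N, I).

    Assumed shapes: U (I, K), W (I, J), b (K,), c (J,).
    Returns h of shape (N, K) and g of shape (N, J).
    """
    n_inputs = v.shape[1]
    # Per-example scale sqrt(sum_i v_i^2 / I + 0.5), shape (N, 1).
    v_scale = np.sqrt(np.sum(v ** 2, axis=1, keepdims=True) / n_inputs + 0.5)
    # Assumed covariance pre-activation for P = -identity.
    cov_preactivation = b - 0.5 * np.dot(v / v_scale, U) ** 2
    h = sigmoid(cov_preactivation)
    g = sigmoid(c + np.dot(v, W))
    return h, g
```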
     def n_visible_units(self):
         """Return the number of visible units of this RBM
         For an RBM made from shared variables, this will return an integer,
         for a purely symbolic RBM this will return a theano expression.
         """
         try:
             return self.W.get_value(borrow=True, return_internal_type=True).shape[0]
         except AttributeError:
             return self.W.shape[0]
     def n_hidden_cov_units(self):
         """Return the number of hidden units for the covariance in this RBM
         For an RBM made from shared variables, this will return an integer,
         for a purely symbolic RBM this will return a theano expression.
         """
         try:
             return self.U.get_value(borrow=True, return_internal_type=True).shape[1]
         except AttributeError:
             return self.U.shape[1]
     def n_hidden_mean_units(self):
         """Return the number of hidden units for the mean in this RBM
         For an RBM made from shared variables, this will return an integer,
         for a purely symbolic RBM this will return a theano expression.
         """
         try:
             return self.W.get_value(borrow=True, return_internal_type=True).shape[1]
         except AttributeError:
             return self.W.shape[1]
     def params(self):
         """Return the elements of [U,W,a,b,c] that are shared variables
         WRITEME : a *prescriptive* definition of this method suitable for mention in the API
         doc.
         """
         return list(self._params)
 @classmethod
 def alloc(cls, n_I, n_K, n_J, rng = 8923402190,
 :param n_I: input dimensionality
 :param n_K: number of covariance hidden units
 :param n_J: number of mean filters (linear)
 :param rng: seed or numpy RandomState object to initialize parameters
 :note:
 Constants for initial ranges and values taken from train_mcRBM.py.
 """
 if not hasattr(rng, 'randn'):
 rng = np.random.RandomState(rng)
     def n_hidden_cov_units(self):
         """Return the number of hidden units for the covariance in this RBM
         For an RBM made from shared variables, this will return an integer,
         for a purely symbolic RBM this will return a theano expression.
         """
         try:
             return self.P.get_value(borrow=True, return_internal_type=True).shape[1]
         except AttributeError:
             return self.P.shape[1]
 :param n_I: input dimensionality
 :param n_K: number of covariance hidden units
 :param n_J: number of mean filters (linear)
 :param rng: seed or numpy RandomState object to initialize parameters
 :note:
 Constants for initial ranges and values taken from train_mcRBM.py.
 """
 return cls.alloc_with_P(
 -numpy.eye(n_K).astype(theano.config.floatX),
 norm_doctoring=norm_doctoring)
 rval._params = [rval.U, rval.W, rval.a, rval.b, rval.c, rval.P]
 return rval
 class mcRBMTrainer(object):
 """Light-weight class encapsulating math for mcRBM training
 Attributes:
 - rbm  - an mcRBM instance
 - sampler - an HMC_sampler instance
 - normVF - geometrically updated norm of U matrix columns (shared var)
     def normalize_U(self, new_U):
         """
         :param new_U: a proposed new value for rbm.U
         :returns: a pair of TensorType variables:
             a corrected new value for U, and a new value for self.normVF
         This is a weird normalization procedure, but the sample code for the paper has it, and
         it seems to be important.
         """
 """Return the contrastive divergence gradients on the parameters of self.rbm """
 if neg_v is None:
 neg_v = self.sampler.positions
 return contrastive_grad(
 free_energy_fn=self.rbm.free_energy_given_v,
 pos_v=self.visible_batch,
 neg_v=neg_v,
 wrt = self.rbm.params(),
 other_cost=(l1(self.rbm.U)+l1(self.rbm.W)) * self.effective_l1_penalty)
     def cd_updates(self):
         # TODO: when sgd has an annealing schedule, this should
         #       go through that mechanism.
         lr = TT.clip(
                 self.learn_rate * TT.cast(self.lr_anneal_start / (self.iter+1), floatX),
                 0.0, #min
                 self.learn_rate) #max
 ups.update(dict(sgd_updates(
 self.rbm.params(),
 new_P = ups[self.rbm.P] * self.p_mask
 no_pos_P = TT.switch(new_P<0, new_P, 0)
 ups[self.rbm.P] = - no_pos_P / no_pos_P.sum(axis=0) # normalize so that each column sums to -1
 return ups
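The two numeric steps visible in `cd_updates`, the annealed learning rate and the renormalization of P, can be sketched in plain NumPy as follows. The helper names are hypothetical, and the sketch assumes every column of the masked P keeps at least one negative entry; otherwise the column sum is zero and the division is undefined.

```python
import numpy as np

def annealed_learning_rate(learn_rate, lr_anneal_start, iteration):
    """1/t annealing, clipped to the range [0, learn_rate]."""
    return float(np.clip(learn_rate * lr_anneal_start / (iteration + 1.0),
                         0.0, learn_rate))

def renormalize_P(new_P, p_mask):
    """Mask P, zero out its positive entries, and rescale each column.

    Every entry of the result is <= 0 and every column sums to -1.
    """
    masked = new_P * p_mask
    no_pos = np.where(masked < 0, masked, 0.0)
    return -no_pos / no_pos.sum(axis=0)
```

With these definitions the learning rate stays at `learn_rate` until `iteration + 1` exceeds `lr_anneal_start`, after which it decays in proportion to `1 / (iteration + 1)`.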
