comparison pylearn/algorithms/mcRBM.py @ 984:5badf36a6daf

mcRBM - added notes to leading comment
author James Bergstra <bergstrj@iro.umontreal.ca>
date Tue, 24 Aug 2010 13:50:26 -0400
parents 2a53384d9742
children 78b5bdf967f6

Ranzato, M. and Hinton, G. E. (2010)
Modeling pixel means and covariances using factored third-order Boltzmann machines.
IEEE Conference on Computer Vision and Pattern Recognition.

and performs one of the experiments on CIFAR-10 discussed in that paper. There are some minor
discrepancies between the paper and the accompanying code (train_mcRBM.py); in those cases the
accompanying code has been taken to be correct, because I could not get things to work
otherwise.


Math
====

E = \sum_f h_f ( \sum_i C_{if} v_i )^2


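As a quick illustration (the array names follow the equation, but the sizes and random values here are assumptions, not from the source), this covariance energy can be evaluated with numpy:

```python
import numpy as np

rng = np.random.RandomState(0)
I, F = 16, 8              # visible units, covariance filters (assumed sizes)
C = rng.randn(I, F)       # filter matrix C_{if}
v = rng.randn(I)          # visible vector
h = rng.rand(F)           # covariance hidden units h_f, in [0, 1]

# E = \sum_f h_f ( \sum_i C_{if} v_i )^2, vectorized
E = np.dot(h, np.dot(C.T, v) ** 2)

# the same quantity with explicit sums, for checking
E_loop = sum(h[f] * np.dot(C[:, f], v) ** 2 for f in range(F))
assert np.allclose(E, E_loop)
```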
Version in paper
----------------

Full Energy of the Mean and Covariance RBM, with
:math:`h_k = h_k^{(c)}`,
:math:`g_j = h_j^{(m)}`,
:math:`b_k = b_k^{(c)}`,
:math:`c_j = b_j^{(m)}`,
:math:`U_{if} = C_{if}`,

E (v, h, g) =
    - 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i / (|U_{.f}| |v|) )^2
    - \sum_k b_k h_k
    + 0.5 \sum_i v_i^2
    - \sum_j \sum_i W_{ij} g_j v_i
    - \sum_j c_j g_j

For the energy function to correspond to a probability distribution, P must be non-positive. P
is initialized to be diagonal, and in our experience it can be left as such, because even in
the paper it has a very low learning rate and is only allowed to be updated after the filters
in U have been learned (in effect).

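A minimal numpy sketch of this energy, assuming illustrative sizes, random parameters, and a diagonal non-positive P as described above (none of these values come from the source):

```python
import numpy as np

rng = np.random.RandomState(0)
I, F, K, J = 16, 8, 8, 4            # assumed sizes: visibles, filters, cov-hiddens, mean-hiddens
U = rng.randn(I, F)                 # covariance filters U_{if}
P = -np.eye(F, K)                   # diagonal and non-positive, as the text requires
W = rng.randn(I, J)                 # mean filters W_{ij}
b, c = rng.randn(K), rng.randn(J)   # hidden biases
v = rng.randn(I)                    # visible vector
h, g = rng.rand(K), rng.rand(J)     # hidden activations in [0, 1]

# cosine-like response of filter f: \sum_i U_{if} v_i / (|U_{.f}| |v|)
cos_fv = np.dot(U.T, v) / (np.sqrt((U ** 2).sum(axis=0)) * np.sqrt((v ** 2).sum()))

E = (- 0.5 * np.dot(cos_fv ** 2, np.dot(P, h))   # covariance term
     - np.dot(b, h)
     + 0.5 * np.dot(v, v)
     - np.dot(np.dot(W, g), v)                   # mean term
     - np.dot(c, g))
```

By Cauchy-Schwarz the normalized responses lie in [-1, 1], which is what keeps the covariance term bounded for fixed h.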
Version in published train_mcRBM code
-------------------------------------

The train_mcRBM file implements learning with a similar but technically different energy function:

E (v, h, g) =
    - 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i / sqrt(\sum_i v_i^2 / I + 0.5) )^2
    - \sum_k b_k h_k
    + 0.5 \sum_i v_i^2
    - \sum_j \sum_i W_{ij} g_j v_i
    - \sum_j c_j g_j

There are two differences with respect to the paper:

- 'v' is not normalized by its length; rather, it is normalized to have length close to
  the square root of the number of its components. The variable called 'small' that
  "avoids division by zero" is orders of magnitude larger than machine precision, and is
  on the order of the normalized sum-of-squares, so I have included it in the energy
  function.

- 'U' is also not normalized by its length. U is initialized to have columns that are
  shorter than unit length (approximately 0.2 with the 105 principal components in the
  train_mcRBM data). During training, the columns of U are manually constrained to have
  equal lengths (see the use of normVF), but that common Euclidean norm is allowed to
  change: during learning it quickly converges towards 1 and then exceeds 1. This
  column-wise normalization of U does not seem to be justified by maximum likelihood,
  and I have no intuition for why it is used.

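The first difference can be illustrated by contrasting the two normalizations of 'v'. The value small = 0.5 follows the energy written above; the vector itself and its size are assumptions for illustration:

```python
import numpy as np

rng = np.random.RandomState(0)
I = 16
v = rng.randn(I)

# Paper-style normalization: v scaled to unit length.
v_paper = v / np.sqrt((v ** 2).sum())

# train_mcRBM-style normalization: v scaled so its length approaches sqrt(I);
# 'small' = 0.5 is far above machine precision, hence part of the energy itself.
small = 0.5
v_code = v / np.sqrt((v ** 2).mean() + small)

assert np.allclose(np.dot(v_paper, v_paper), 1.0)
# |v_code|^2 = |v|^2 / (|v|^2 / I + 0.5), which is always below I
# and approaches I as |v| grows
assert np.dot(v_code, v_code) < I
```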

Version in this code
--------------------

This file implements the same algorithm as the train_mcRBM code, except that the P matrix is
omitted for clarity and replaced analytically with a negative identity matrix.

E (v, h, g) =
    + 0.5 \sum_k h_k ( \sum_i U_{ik} v_i / sqrt(\sum_i v_i^2 / I + 0.5) )^2
    - \sum_k b_k h_k
    + 0.5 \sum_i v_i^2
    - \sum_j \sum_i W_{ij} g_j v_i
    - \sum_j c_j g_j


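A minimal numpy sketch of this simplified energy (sizes and values are illustrative assumptions; with P replaced by a negative identity, each covariance hidden unit pairs with exactly one column of U, so the f and k indices coincide):

```python
import numpy as np

rng = np.random.RandomState(0)
I, K, J = 16, 8, 4                  # assumed sizes: visibles, cov-hiddens, mean-hiddens
U = rng.randn(I, K)                 # one filter per covariance hidden unit
W = rng.randn(I, J)
b, c = rng.randn(K), rng.randn(J)
v = rng.randn(I)
h, g = rng.rand(K), rng.rand(J)

# normalized filter responses, as in the train_mcRBM-style energy
u = np.dot(U.T, v) / np.sqrt((v ** 2).mean() + 0.5)

E = (+ 0.5 * np.dot(h, u ** 2)      # sign flipped by P = -identity
     - np.dot(b, h)
     + 0.5 * np.dot(v, v)
     - np.dot(np.dot(W, g), v)
     - np.dot(c, g))
```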
Conventions in this file
========================

This file contains some global functions, as well as a class (MeanCovRBM) that makes using them a little

- `U`, a matrix whose rows are visible covariance directions (I x F)
- `W`, a matrix whose rows are visible mean directions (I x J)
- `b`, a vector of hidden covariance biases (K)
- `c`, a vector of hidden mean biases (J)

Matrices are generally laid out and accessed according to a C-order convention.

"""

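A sketch instantiating parameters with the shapes listed above. The sizes and initial scales are illustrative assumptions only, loosely echoing the ~0.2 column lengths and the 105 principal components mentioned earlier:

```python
import numpy as np

rng = np.random.RandomState(0)
I, F, K, J = 105, 256, 256, 100          # illustrative sizes only

# Parameters named as in the docstring, as C-ordered (row-major) numpy arrays:
U = rng.randn(I, F) * 0.2 / np.sqrt(I)   # visible covariance directions (I x F),
                                         # columns of length roughly 0.2
W = rng.randn(I, J) * 0.01               # visible mean directions (I x J)
b = np.zeros(K)                          # hidden covariance biases (K)
c = np.zeros(J)                          # hidden mean biases (J)

for arr in (U, W):
    assert arr.flags['C_CONTIGUOUS']     # C-order layout convention
```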
#
# WORKING NOTES
# THIS DERIVATION IS BASED ON THE ** PAPER ** ENERGY FUNCTION,
# NOT THE ENERGY FUNCTION IN THE CODE!!!
#
# Free energy is the marginal energy of the visible units
# Recall:
#   Q(x) = exp(-E(x)) / Z  ==>  -log(Q(x)) - log(Z) = E(x)
#
#