# HG changeset patch
# User James Bergstra
# Date 1282672226 14400
# Node ID 5badf36a6daf8f033e95f86cb59faa9779674688
# Parent 15371ff780a0f77fefb6257e5908545014ab6d0c
mcRBM - added notes to leading comment

diff -r 15371ff780a0 -r 5badf36a6daf pylearn/algorithms/mcRBM.py
--- a/pylearn/algorithms/mcRBM.py	Mon Aug 23 16:06:23 2010 -0400
+++ b/pylearn/algorithms/mcRBM.py	Tue Aug 24 13:50:26 2010 -0400
@@ -5,7 +5,10 @@
 Modeling pixel means and covariances using factored third-order Boltzmann
 machines. IEEE Conference on Computer Vision and Pattern Recognition.
 
-and performs one of the experiments on CIFAR-10 discussed in that paper.
+and performs one of the experiments on CIFAR-10 discussed in that paper.  There are some minor
+discrepancies between the paper and the accompanying code (train_mcRBM.py); in those cases the
+accompanying code has been taken to be correct, because I couldn't get things to work
+otherwise.
 
 
 Math
@@ -24,25 +27,71 @@
 
 
-Full Energy of mean and Covariance RBM, with
+Version in paper
+----------------
+
+Full Energy of the Mean and Covariance RBM, with
 :math:`h_k = h_k^{(c)}`,
 :math:`g_j = h_j^{(m)}`,
 :math:`b_k = b_k^{(c)}`,
 :math:`c_j = b_j^{(m)}`,
 :math:`U_{if} = C_{if}`,
-:
+
+    E (v, h, g) =
+        - 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i / (|U_{.f}| |v|) )^2
+        - \sum_k b_k h_k
+        + 0.5 \sum_i v_i^2
+        - \sum_j \sum_i W_{ij} g_j v_i
+        - \sum_j c_j g_j
+
+For the energy function to correspond to a probability distribution, P must be non-positive.
+P is initialized to be diagonal, and in our experience it can be left as such, because even in
+the paper it has a very low learning rate and is (in effect) only allowed to be updated after
+the filters in U are learned.
+
+Version in published train_mcRBM code
+-------------------------------------
+
+The train_mcRBM file implements learning for a similar, but technically different, energy
+function:
 
 E (v, h, g) =
-    - 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / |U_{*f}|^2 |v|^2
+    - 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i / sqrt(\sum_i v_i^2/I + 0.5) )^2
     - \sum_k b_k h_k
     + 0.5 \sum_i v_i^2
     - \sum_j \sum_i W_{ij} g_j v_i
     - \sum_j c_j g_j
 
-For the energy function to correspond to a probability distribution, P must be non-positive.
+There are two differences with respect to the paper:
+
+  - 'v' is not normalized by its length; rather, it is normalized to have length close to the
+    square root of the number of its components.  The variable called 'small' that "avoids
+    division by zero" is orders of magnitude larger than machine precision, and is on the
+    order of the normalized sum-of-squares, so I've included it in the energy function.
+
+  - 'U' is also not normalized by its length.  U is initialized to have columns that are
+    shorter than unit length (approximately 0.2 with the 105 principal components in the
+    train_mcRBM data).  During training, the columns of U are manually constrained to have
+    equal lengths (see the use of normVF), but that shared Euclidean norm is allowed to
+    change: during learning it quickly converges towards 1 and then exceeds 1.  This
+    column-wise normalization of U does not seem to be justified by maximum likelihood, and I
+    have no intuition for why it is used.
 
+Version in this code
+--------------------
+
+This file implements the same algorithm as the train_mcRBM code, except that the P matrix is
+omitted for clarity and replaced analytically with a negative identity matrix.
+
+    E (v, h, g) =
+        + 0.5 \sum_k h_k ( \sum_i U_{ik} v_i / sqrt(\sum_i v_i^2/I + 0.5) )^2
+        - \sum_k b_k h_k
+        + 0.5 \sum_i v_i^2
+        - \sum_j \sum_i W_{ij} g_j v_i
+        - \sum_j c_j g_j
+
+
 Conventions in this file
 ========================
@@ -64,10 +113,15 @@
 - `b`, a vector of hidden covariance biases (K)
 - `c`, a vector of hidden mean biases (J)
 
-Matrices are generally layed out according to a C-order convention.
+Matrices are generally laid out and accessed according to a C-order convention.
 
 """
 
+#
+# WORKING NOTES
+# THIS DERIVATION IS BASED ON THE ** PAPER ** ENERGY FUNCTION,
+# NOT THE ENERGY FUNCTION IN THE CODE!!!
+#
 # Free energy is the marginal energy of visible units
 # Recall:
 #   Q(x) = exp(-E(x))/Z  ==>  -log(Q(x)) - log(Z) = E(x)
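
The "Version in this code" energy function (P replaced analytically by a negative identity) is straightforward to evaluate numerically. The sketch below is illustrative only: the function name, argument shapes, and array layout are my own choices and do not come from the pylearn source.

```python
import numpy as np

def mcrbm_energy(v, h, g, U, W, b, c):
    """Energy of the mcRBM variant with P replaced by a negative identity.

    Shapes (hypothetical, chosen for illustration):
      v: (I,) visible vector            U: (I, K) covariance filters
      h: (K,) hidden covariance units   W: (I, J) mean weights
      g: (J,) hidden mean units         b: (K,) and c: (J,) biases
    """
    I = v.shape[0]
    # train_mcRBM-style normalization: divide v by sqrt(\sum_i v_i^2 / I + 0.5),
    # so the normalized v has length close to sqrt(I) rather than unit length
    vn = v / np.sqrt((v ** 2).sum() / I + 0.5)
    return (0.5 * np.sum(h * (U.T @ vn) ** 2)  # + 0.5 \sum_k h_k (...)^2
            - b @ h                            # - \sum_k b_k h_k
            + 0.5 * (v @ v)                    # + 0.5 \sum_i v_i^2
            - g @ (W.T @ v)                    # - \sum_j \sum_i W_{ij} g_j v_i
            - c @ g)                           # - \sum_j c_j g_j
```

Each return term maps one-to-one onto a line of the plain-text energy above, which makes it easy to check the sign conventions term by term.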