comparison pylearn/algorithms/mcRBM.py @ 984:5badf36a6daf
mcRBM - added notes to leading comment
author:   James Bergstra <bergstrj@iro.umontreal.ca>
date:     Tue, 24 Aug 2010 13:50:26 -0400
parents:  2a53384d9742
children: 78b5bdf967f6
comparing 983:15371ff780a0 with 984:5badf36a6daf

...

Ranzato, M. and Hinton, G. E. (2010)
Modeling pixel means and covariances using factored third-order Boltzmann machines.
IEEE Conference on Computer Vision and Pattern Recognition.

and performs one of the experiments on CIFAR-10 discussed in that paper. There are some minor
discrepancies between the paper and the accompanying code (train_mcRBM.py); in those cases the
code has been taken to be correct, because I couldn't get things to work otherwise.


Math
====

...

E = \sum_f h_f ( \sum_i C_{if} v_i )^2

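For concreteness, here is a small NumPy sketch (mine, not part of the original file; the sizes
and random values are only illustrative) that evaluates this covariance term for one
configuration::

    import numpy as np

    rng = np.random.RandomState(0)
    I, F = 16, 8                      # illustrative numbers of visible units and factors
    v = rng.randn(I)                  # visible vector
    h = rng.randint(0, 2, size=F)     # binary covariance hidden units, one per factor here
    C = rng.randn(I, F)               # covariance filters, one column per factor

    # E = \sum_f h_f ( \sum_i C_{if} v_i )^2
    E = np.sum(h * np.dot(C.T, v) ** 2)
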
Version in paper
----------------

Full Energy of the Mean and Covariance RBM, with
:math:`h_k = h_k^{(c)}`,
:math:`g_j = h_j^{(m)}`,
:math:`b_k = b_k^{(c)}`,
:math:`c_j = b_j^{(m)}`,
:math:`U_{if} = C_{if}`:

E (v, h, g) =
    - 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i / (|U_{.f}| |v|) )^2
    - \sum_k b_k h_k
    + 0.5 \sum_i v_i^2
    - \sum_j \sum_i W_{ij} g_j v_i
    - \sum_j c_j g_j

For the energy function to correspond to a probability distribution, P must be non-positive.
P is initialized to be diagonal, and in our experience it can be left as such, because even in
the paper it has a very low learning rate and is in effect only allowed to be updated after the
filters in U have been learned.

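To make the notation concrete, here is a NumPy sketch of the paper-version energy (mine, not
from train_mcRBM.py; the shapes, the random values, and the diagonal choice for P are
assumptions for illustration only)::

    import numpy as np

    rng = np.random.RandomState(0)
    I, F, K, J = 16, 8, 8, 6            # visibles, factors, covariance hiddens, mean hiddens
    v = rng.randn(I)
    h = rng.randint(0, 2, size=K)       # binary covariance hidden units
    g = rng.randint(0, 2, size=J)       # binary mean hidden units
    U = rng.randn(I, F)                 # covariance filters, one per column
    W = rng.randn(I, J)                 # mean filters
    b = rng.randn(K)                    # hidden covariance biases
    c = rng.randn(J)                    # hidden mean biases
    P = -np.eye(F, K)                   # pooling matrix, kept non-positive as required

    # \sum_i U_{if} v_i / (|U_{.f}| |v|): project v onto U after normalizing both
    proj = np.dot(U.T, v) / (np.sqrt((U ** 2).sum(axis=0)) * np.linalg.norm(v))

    E = (- 0.5 * np.dot(np.dot(P, h), proj ** 2)
         - np.dot(b, h)
         + 0.5 * np.dot(v, v)
         - np.dot(g, np.dot(W.T, v))
         - np.dot(c, g))
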
Version in published train_mcRBM code
-------------------------------------

The train_mcRBM code implements learning with a similar but technically different energy function:

E (v, h, g) =
    - 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i / sqrt(\sum_i v_i^2 / I + 0.5) )^2
    - \sum_k b_k h_k
    + 0.5 \sum_i v_i^2
    - \sum_j \sum_i W_{ij} g_j v_i
    - \sum_j c_j g_j

There are two differences with respect to the paper (a short numerical sketch follows this list):

- 'v' is not normalized by its length, but rather it is normalized to have length close to
  the square root of the number of its components. The variable called 'small' that
  "avoids division by zero" is orders of magnitude larger than machine precision, and is on
  the order of the normalized sum of squares, so I've included it in the energy function.

- 'U' is also not normalized by its length. U is initialized to have columns that are
  shorter than unit length (approximately 0.2 with the 105 principal components in the
  train_mcRBM data). During training, the columns of U are constrained manually to have
  equal lengths (see the use of normVF), but that common Euclidean norm is allowed to change.
  During learning it quickly converges towards 1 and then exceeds 1. This column-wise
  normalization of U does not seem to be justified by maximum likelihood, and I have no
  intuition for why it is used.

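As promised above, a short numerical sketch of the two normalizations (again illustrative only;
the sizes and initial scale are made up, and 0.5 is the 'small' constant that appears in the
energy above)::

    import numpy as np

    rng = np.random.RandomState(0)
    I = 105                             # e.g. the number of retained principal components
    v = rng.randn(I)

    # paper: v is scaled to unit length
    v_paper = v / np.linalg.norm(v)

    # train_mcRBM: v is scaled so its length is close to sqrt(I); the 'small' constant (0.5)
    # is comparable to the normalized sum of squares, not a tiny epsilon
    v_code = v / np.sqrt((v ** 2).mean() + 0.5)

    # normVF-style constraint on U: force all columns to share one length (about 0.2 at
    # initialization), while that common length itself is free to drift during learning
    U = 0.2 * rng.randn(I, 8) / np.sqrt(I)
    col_norms = np.sqrt((U ** 2).sum(axis=0))
    U *= col_norms.mean() / col_norms
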
Version in this code
--------------------

This file implements the same algorithm as the train_mcRBM code, except that the P matrix is
omitted for clarity, and replaced analytically with a negative identity matrix.

E (v, h, g) =
    + 0.5 \sum_k h_k ( \sum_i U_{ik} v_i / sqrt(\sum_i v_i^2 / I + 0.5) )^2
    - \sum_k b_k h_k
    + 0.5 \sum_i v_i^2
    - \sum_j \sum_i W_{ij} g_j v_i
    - \sum_j c_j g_j

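For illustration, here is the same kind of NumPy sketch for this version of the energy (again
an assumption-laden toy, not the code of this file); with P replaced by a negative identity
there is exactly one filter per covariance hidden unit, and the pooled term changes sign::

    import numpy as np

    rng = np.random.RandomState(0)
    I, K, J = 16, 8, 6                  # with P = -identity there is one filter per h_k
    v = rng.randn(I)
    h = rng.randint(0, 2, size=K)
    g = rng.randint(0, 2, size=J)
    U = rng.randn(I, K)
    W = rng.randn(I, J)
    b = rng.randn(K)
    c = rng.randn(J)

    # sqrt(\sum_i v_i^2 / I + 0.5)
    scale = np.sqrt((v ** 2).mean() + 0.5)

    E = (+ 0.5 * np.dot(h, (np.dot(U.T, v) / scale) ** 2)   # sign flipped by P = -I
         - np.dot(b, h)
         + 0.5 * np.dot(v, v)
         - np.dot(g, np.dot(W.T, v))
         - np.dot(c, g))
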

Conventions in this file
========================

This file contains some global functions, as well as a class (MeanCovRBM) that makes using them a little
...

- `U`, a matrix whose rows are visible covariance directions (I x F)
- `W`, a matrix whose rows are visible mean directions (I x J)
- `b`, a vector of hidden covariance biases (K)
- `c`, a vector of hidden mean biases (J)

Matrices are generally laid out and accessed according to a C-order convention.

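For example, with made-up sizes (the real shapes depend on the experiment, and the class may
store its parameters differently), the conventions above translate into C-contiguous arrays
like::

    import numpy as np

    rng = np.random.RandomState(0)
    I, F, K, J = 256, 400, 400, 100              # illustrative sizes only

    U = np.asarray(rng.randn(I, F), order='C')   # visible covariance directions (I x F)
    W = np.asarray(rng.randn(I, J), order='C')   # visible mean directions (I x J)
    b = np.zeros(K)                              # hidden covariance biases (K)
    c = np.zeros(J)                              # hidden mean biases (J)

    assert U.flags['C_CONTIGUOUS'] and W.flags['C_CONTIGUOUS']
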
69 """ | 118 """ |
70 | 119 |
120 # | |
121 # WORKING NOTES | |
122 # THIS DERIVATION IS BASED ON THE ** PAPER ** ENERGY FUNCTION | |
123 # NOT THE ENERGY FUNCTION IN THE CODE!!! | |
124 # | |
71 # Free energy is the marginal energy of visible units | 125 # Free energy is the marginal energy of visible units |
72 # Recall: | 126 # Recall: |
73 # Q(x) = exp(-E(x))/Z ==> -log(Q(x)) - log(Z) = E(x) | 127 # Q(x) = exp(-E(x))/Z ==> -log(Q(x)) - log(Z) = E(x) |
74 # | 128 # |
75 # | 129 # |