comparison pylearn/algorithms/mcRBM.py @ 1500:517f4c02dde9
Auto white space fix.
author | Frederic Bastien <nouiz@nouiz.org> |
---|---|
date | Fri, 09 Sep 2011 10:50:32 -0400 |
parents | f82b80c841b2 |
children | 2a6a6f16416c |
1499:f82b80c841b2 | 1500:517f4c02dde9 |
---|---|
1 """ | 1 """ |
2 This file implements the Mean & Covariance RBM discussed in | 2 This file implements the Mean & Covariance RBM discussed in |
3 | 3 |
4 Ranzato, M. and Hinton, G. E. (2010) | 4 Ranzato, M. and Hinton, G. E. (2010) |
5 Modeling pixel means and covariances using factored third-order Boltzmann machines. | 5 Modeling pixel means and covariances using factored third-order Boltzmann machines. |
6 IEEE Conference on Computer Vision and Pattern Recognition. | 6 IEEE Conference on Computer Vision and Pattern Recognition. |
7 | 7 |
28 | 28 |
29 | 29 |
30 Version in paper | 30 Version in paper |
31 ---------------- | 31 ---------------- |
32 | 32 |
33 Full Energy of the Mean and Covariance RBM, with | 33 Full Energy of the Mean and Covariance RBM, with |
34 :math:`h_k = h_k^{(c)}`, | 34 :math:`h_k = h_k^{(c)}`, |
35 :math:`g_j = h_j^{(m)}`, | 35 :math:`g_j = h_j^{(m)}`, |
36 :math:`b_k = b_k^{(c)}`, | 36 :math:`b_k = b_k^{(c)}`, |
37 :math:`c_j = b_j^{(m)}`, | 37 :math:`c_j = b_j^{(m)}`, |
38 :math:`U_{if} = C_{if}`, | 38 :math:`U_{if} = C_{if}`, |
39 | 39 |
40 E (v, h, g) = | 40 E (v, h, g) = |
41 - 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i / (|U_{.f}|*|v|) )^2 | 41 - 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i / (|U_{.f}|*|v|) )^2 |
42 - \sum_k b_k h_k | 42 - \sum_k b_k h_k |
43 + 0.5 \sum_i v_i^2 | 43 + 0.5 \sum_i v_i^2 |
44 - \sum_j \sum_i W_{ij} g_j v_i | 44 - \sum_j \sum_i W_{ij} g_j v_i |
45 - \sum_j c_j g_j | 45 - \sum_j c_j g_j |
46 | 46 |
53 ------------------------------------- | 53 ------------------------------------- |
54 | 54 |
55 The train_mcRBM file implements learning in a similar but technically different Energy function: | 55 The train_mcRBM file implements learning in a similar but technically different Energy function: |
56 | 56 |
57 E (v, h, g) = | 57 E (v, h, g) = |
58 - 0.5 \sum_f \sum_k P_{fk} h_k (\sum_i U_{if} v_i / sqrt(\sum_i v_i^2/I + 0.5))^2 | 58 - 0.5 \sum_f \sum_k P_{fk} h_k (\sum_i U_{if} v_i / sqrt(\sum_i v_i^2/I + 0.5))^2 |
59 - \sum_k b_k h_k | 59 - \sum_k b_k h_k |
60 + 0.5 \sum_i v_i^2 | 60 + 0.5 \sum_i v_i^2 |
61 - \sum_j \sum_i W_{ij} g_j v_i | 61 - \sum_j \sum_i W_{ij} g_j v_i |
62 - \sum_j c_j g_j | 62 - \sum_j c_j g_j |
63 | 63 |
82 | 82 |
83 This file implements the same algorithm as the train_mcRBM code, except that the P matrix is | 83 This file implements the same algorithm as the train_mcRBM code, except that the P matrix is |
84 omitted for clarity, and replaced analytically with a negative identity matrix. | 84 omitted for clarity, and replaced analytically with a negative identity matrix. |
85 | 85 |
86 E (v, h, g) = | 86 E (v, h, g) = |
87 + 0.5 \sum_k h_k (\sum_i U_{ik} v_i / sqrt(\sum_i v_i^2/I + 0.5))^2 | 87 + 0.5 \sum_k h_k (\sum_i U_{ik} v_i / sqrt(\sum_i v_i^2/I + 0.5))^2 |
88 - \sum_k b_k h_k | 88 - \sum_k b_k h_k |
89 + 0.5 \sum_i v_i^2 | 89 + 0.5 \sum_i v_i^2 |
90 - \sum_j \sum_i W_{ij} g_j v_i | 90 - \sum_j \sum_i W_{ij} g_j v_i |
91 - \sum_j c_j g_j | 91 - \sum_j c_j g_j |
92 | 92 |
93 E (v, h, g) = | 93 E (v, h, g) = |
94 - 0.5 \sum_f \sum_k P_{fk} h_k (\sum_i U_{if} v_i / sqrt(\sum_i v_i^2/I + 0.5))^2 | 94 - 0.5 \sum_f \sum_k P_{fk} h_k (\sum_i U_{if} v_i / sqrt(\sum_i v_i^2/I + 0.5))^2 |
95 - \sum_k b_k h_k | 95 - \sum_k b_k h_k |
96 + 0.5 \sum_i v_i^2 | 96 + 0.5 \sum_i v_i^2 |
97 - \sum_j \sum_i W_{ij} g_j v_i | 97 - \sum_j \sum_i W_{ij} g_j v_i |
98 - \sum_j c_j g_j | 98 - \sum_j c_j g_j |
99 | 99 |
100 | 100 |
101 | 101 |
102 Conventions in this file | 102 Conventions in this file |
103 ======================== | 103 ======================== |
104 | 104 |
105 This file contains some global functions, as well as a class (MeanCovRBM) that makes using them a little | 105 This file contains some global functions, as well as a class (MeanCovRBM) that makes using them a little |
106 more convenient. | 106 more convenient. |
107 | 107 |
108 | 108 |
109 Global functions like `free_energy` work on an mcRBM as parametrized in a particular way. | 109 Global functions like `free_energy` work on an mcRBM as parametrized in a particular way. |
110 Suppose we have | 110 Suppose we have |
111 - I input dimensions, | 111 - I input dimensions, |
112 - F squared filters, | 112 - F squared filters, |
113 - J mean variables, and | 113 - J mean variables, and |
114 - K covariance variables. | 114 - K covariance variables. |
115 | 115 |
116 The mcRBM is parametrized by 6 variables: | 116 The mcRBM is parametrized by 6 variables: |
117 | 117 |
129 # WORKING NOTES | 129 # WORKING NOTES |
130 # THIS DERIVATION IS BASED ON THE ** PAPER ** ENERGY FUNCTION | 130 # THIS DERIVATION IS BASED ON THE ** PAPER ** ENERGY FUNCTION |
131 # NOT THE ENERGY FUNCTION IN THE CODE!!! | 131 # NOT THE ENERGY FUNCTION IN THE CODE!!! |
132 # | 132 # |
133 # Free energy is the marginal energy of visible units | 133 # Free energy is the marginal energy of visible units |
134 # Recall: | 134 # Recall: |
135 # Q(x) = exp(-E(x))/Z ==> -log(Q(x)) - log(Z) = E(x) | 135 # Q(x) = exp(-E(x))/Z ==> -log(Q(x)) - log(Z) = E(x) |
136 # | 136 # |
137 # | 137 # |
138 # E (v, h, g) = | 138 # E (v, h, g) = |
139 # - 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / (|U_{*f}|^2 * |v|^2) | 139 # - 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / (|U_{*f}|^2 * |v|^2) |
152 # = -\log( \sum_{h,g} exp(- | 152 # = -\log( \sum_{h,g} exp(- |
153 # - 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / (|U_{*f}|^2 * |v|^2) | 153 # - 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / (|U_{*f}|^2 * |v|^2) |
154 # - \sum_k b_k h_k | 154 # - \sum_k b_k h_k |
155 # + 0.5 \sum_i v_i^2 | 155 # + 0.5 \sum_i v_i^2 |
156 # - \sum_j \sum_i W_{ij} g_j v_i | 156 # - \sum_j \sum_i W_{ij} g_j v_i |
157 # - \sum_j c_j g_j | 157 # - \sum_j c_j g_j |
158 # - \sum_i a_i v_i )) | 158 # - \sum_i a_i v_i )) |
159 # | 159 # |
160 # Get rid of double negs in exp | 160 # Get rid of double negs in exp |
161 # = -\log( \sum_{h} exp( | 161 # = -\log( \sum_{h} exp( |
162 # + 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / (|U_{*f}|^2 * |v|^2) | 162 # + 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / (|U_{*f}|^2 * |v|^2) |
163 # + \sum_k b_k h_k | 163 # + \sum_k b_k h_k |
164 # - 0.5 \sum_i v_i^2 | 164 # - 0.5 \sum_i v_i^2 |
165 # ) * \sum_{g} exp( | 165 # ) * \sum_{g} exp( |
166 # + \sum_j \sum_i W_{ij} g_j v_i | 166 # + \sum_j \sum_i W_{ij} g_j v_i |
167 # + \sum_j c_j g_j)) | 167 # + \sum_j c_j g_j)) |
168 # - \sum_i a_i v_i | 168 # - \sum_i a_i v_i |
169 # | 169 # |
170 # Break up log | 170 # Break up log |
171 # = -\log( \sum_{h} exp( | 171 # = -\log( \sum_{h} exp( |
172 # + 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / (|U_{*f}|^2 * |v|^2) | 172 # + 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / (|U_{*f}|^2 * |v|^2) |
173 # + \sum_k b_k h_k | 173 # + \sum_k b_k h_k |
174 # )) | 174 # )) |
175 # -\log( \sum_{g} exp( | 175 # -\log( \sum_{g} exp( |
176 # + \sum_j \sum_i W_{ij} g_j v_i | 176 # + \sum_j \sum_i W_{ij} g_j v_i |
177 # + \sum_j c_j g_j ))) | 177 # + \sum_j c_j g_j ))) |
178 # + 0.5 \sum_i v_i^2 | 178 # + 0.5 \sum_i v_i^2 |
179 # - \sum_i a_i v_i | 179 # - \sum_i a_i v_i |
180 # | 180 # |
181 # Use the fact that g (and then h) is binary to turn log(sum(exp(sum...))) into sum(log(.. | 181 # Use the fact that g (and then h) is binary to turn log(sum(exp(sum...))) into sum(log(.. |
182 # = -\log(\sum_{h} exp( | 182 # = -\log(\sum_{h} exp( |
183 # + 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / (|U_{*f}|^2 * |v|^2) | 183 # + 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / (|U_{*f}|^2 * |v|^2) |
184 # + \sum_k b_k h_k | 184 # + \sum_k b_k h_k |
185 # )) | 185 # )) |
186 # - \sum_{j} \log(1 + exp(\sum_i W_{ij} v_i + c_j )) | 186 # - \sum_{j} \log(1 + exp(\sum_i W_{ij} v_i + c_j )) |
187 # + 0.5 \sum_i v_i^2 | 187 # + 0.5 \sum_i v_i^2 |
188 # - \sum_i a_i v_i | 188 # - \sum_i a_i v_i |
189 # | 189 # |
190 # = - \sum_{k} \log(1 + exp(b_k + 0.5 \sum_f P_{fk}( \sum_i U_{if} v_i )^2 / (|U_{*f}|^2 * |v|^2))) | 190 # = - \sum_{k} \log(1 + exp(b_k + 0.5 \sum_f P_{fk}( \sum_i U_{if} v_i )^2 / (|U_{*f}|^2 * |v|^2))) |
191 # - \sum_{j} \log(1 + exp(\sum_i W_{ij} v_i + c_j )) | 191 # - \sum_{j} \log(1 + exp(\sum_i W_{ij} v_i + c_j )) |
192 # + 0.5 \sum_i v_i^2 | 192 # + 0.5 \sum_i v_i^2 |
193 # - \sum_i a_i v_i | 193 # - \sum_i a_i v_i |
194 # | 194 # |
195 # For negative-one-diagonal P this gives: | 195 # For negative-one-diagonal P this gives: |
196 # | 196 # |
197 # = - \sum_{k} \log(1 + exp(b_k - 0.5 (\sum_i U_{ik} v_i )^2 / (|U_{*k}|^2 * |v|^2))) | 197 # = - \sum_{k} \log(1 + exp(b_k - 0.5 (\sum_i U_{ik} v_i )^2 / (|U_{*k}|^2 * |v|^2))) |
198 # - \sum_{j} \log(1 + exp(\sum_i W_{ij} v_i + c_j )) | 198 # - \sum_{j} \log(1 + exp(\sum_i W_{ij} v_i + c_j )) |
199 # + 0.5 \sum_i v_i^2 | 199 # + 0.5 \sum_i v_i^2 |
200 # - \sum_i a_i v_i | 200 # - \sum_i a_i v_i |
201 | 201 |
202 import sys, os, logging | 202 import sys, os, logging |
203 import numpy as np | 203 import numpy as np |
204 import numpy | 204 import numpy |
205 | 205 |
359 def expected_h_g_given_v(self, v): | 359 def expected_h_g_given_v(self, v): |
360 """Returns a tuple (`h`, `g`) of theano expressions for the conditional expectations in an mcRBM. | 360 """Returns a tuple (`h`, `g`) of theano expressions for the conditional expectations in an mcRBM. |
361 | 361 |
362 `h` is the conditional expectation of the covariance units. | 362 `h` is the conditional expectation of the covariance units. |
363 `g` is the conditional expectation of the mean units. | 363 `g` is the conditional expectation of the mean units. |
364 | 364 |
365 """ | 365 """ |
366 h = TT.nnet.sigmoid(self.hidden_cov_units_preactivation_given_v(v)) | 366 h = TT.nnet.sigmoid(self.hidden_cov_units_preactivation_given_v(v)) |
367 g = TT.nnet.sigmoid(self.c + dot(v,self.W)) | 367 g = TT.nnet.sigmoid(self.c + dot(v,self.W)) |
368 return (h, g) | 368 return (h, g) |
369 | 369 |
370 def n_visible_units(self): | 370 def n_visible_units(self): |
371 """Return the number of visible units of this RBM | 371 """Return the number of visible units of this RBM |
372 | 372 |
373 For an RBM made from shared variables, this will return an integer, | 373 For an RBM made from shared variables, this will return an integer, |
374 for a purely symbolic RBM this will return a theano expression. | 374 for a purely symbolic RBM this will return a theano expression. |
375 | 375 |
376 """ | 376 """ |
377 try: | 377 try: |
378 return self.W.get_value(borrow=True, return_internal_type=True).shape[0] | 378 return self.W.get_value(borrow=True, return_internal_type=True).shape[0] |
379 except AttributeError: | 379 except AttributeError: |
380 return self.W.shape[0] | 380 return self.W.shape[0] |
382 def n_hidden_cov_units(self): | 382 def n_hidden_cov_units(self): |
383 """Return the number of hidden units for the covariance in this RBM | 383 """Return the number of hidden units for the covariance in this RBM |
384 | 384 |
385 For an RBM made from shared variables, this will return an integer, | 385 For an RBM made from shared variables, this will return an integer, |
386 for a purely symbolic RBM this will return a theano expression. | 386 for a purely symbolic RBM this will return a theano expression. |
387 | 387 |
388 """ | 388 """ |
389 try: | 389 try: |
390 return self.U.get_value(borrow=True, return_internal_type=True).shape[1] | 390 return self.U.get_value(borrow=True, return_internal_type=True).shape[1] |
391 except AttributeError: | 391 except AttributeError: |
392 return self.U.shape[1] | 392 return self.U.shape[1] |
394 def n_hidden_mean_units(self): | 394 def n_hidden_mean_units(self): |
395 """Return the number of hidden units for the mean in this RBM | 395 """Return the number of hidden units for the mean in this RBM |
396 | 396 |
397 For an RBM made from shared variables, this will return an integer, | 397 For an RBM made from shared variables, this will return an integer, |
398 for a purely symbolic RBM this will return a theano expression. | 398 for a purely symbolic RBM this will return a theano expression. |
399 | 399 |
400 """ | 400 """ |
401 try: | 401 try: |
402 return self.W.get_value(borrow=True, return_internal_type=True).shape[1] | 402 return self.W.get_value(borrow=True, return_internal_type=True).shape[1] |
403 except AttributeError: | 403 except AttributeError: |
404 return self.W.shape[1] | 404 return self.W.shape[1] |
471 def params(self): | 471 def params(self): |
472 """Return the elements of [U,W,a,b,c] that are shared variables | 472 """Return the elements of [U,W,a,b,c] that are shared variables |
473 | 473 |
474 WRITEME : a *prescriptive* definition of this method suitable for mention in the API | 474 WRITEME : a *prescriptive* definition of this method suitable for mention in the API |
475 doc. | 475 doc. |
476 | 476 |
477 """ | 477 """ |
478 return list(self._params) | 478 return list(self._params) |
479 | 479 |
480 @classmethod | 480 @classmethod |
481 def alloc(cls, n_I, n_K, n_J, rng = 8923402190, | 481 def alloc(cls, n_I, n_K, n_J, rng = 8923402190, |
489 | 489 |
490 :param n_I: input dimensionality | 490 :param n_I: input dimensionality |
491 :param n_K: number of covariance hidden units | 491 :param n_K: number of covariance hidden units |
492 :param n_J: number of mean filters (linear) | 492 :param n_J: number of mean filters (linear) |
493 :param rng: seed or numpy RandomState object to initialize parameters | 493 :param rng: seed or numpy RandomState object to initialize parameters |
494 | 494 |
495 :note: | 495 :note: |
496 Constants for initial ranges and values taken from train_mcRBM.py. | 496 Constants for initial ranges and values taken from train_mcRBM.py. |
497 """ | 497 """ |
498 if not hasattr(rng, 'randn'): | 498 if not hasattr(rng, 'randn'): |
499 rng = np.random.RandomState(rng) | 499 rng = np.random.RandomState(rng) |
575 def n_hidden_cov_units(self): | 575 def n_hidden_cov_units(self): |
576 """Return the number of hidden units for the covariance in this RBM | 576 """Return the number of hidden units for the covariance in this RBM |
577 | 577 |
578 For an RBM made from shared variables, this will return an integer, | 578 For an RBM made from shared variables, this will return an integer, |
579 for a purely symbolic RBM this will return a theano expression. | 579 for a purely symbolic RBM this will return a theano expression. |
580 | 580 |
581 """ | 581 """ |
582 try: | 582 try: |
583 return self.P.get_value(borrow=True, return_internal_type=True).shape[1] | 583 return self.P.get_value(borrow=True, return_internal_type=True).shape[1] |
584 except AttributeError: | 584 except AttributeError: |
585 return self.P.shape[1] | 585 return self.P.shape[1] |
591 | 591 |
592 :param n_I: input dimensionality | 592 :param n_I: input dimensionality |
593 :param n_K: number of covariance hidden units | 593 :param n_K: number of covariance hidden units |
594 :param n_J: number of mean filters (linear) | 594 :param n_J: number of mean filters (linear) |
595 :param rng: seed or numpy RandomState object to initialize parameters | 595 :param rng: seed or numpy RandomState object to initialize parameters |
596 | 596 |
597 :note: | 597 :note: |
598 Constants for initial ranges and values taken from train_mcRBM.py. | 598 Constants for initial ranges and values taken from train_mcRBM.py. |
599 """ | 599 """ |
600 return cls.alloc_with_P( | 600 return cls.alloc_with_P( |
601 -numpy.eye(n_K).astype(theano.config.floatX), | 601 -numpy.eye(n_K).astype(theano.config.floatX), |
633 norm_doctoring=norm_doctoring) | 633 norm_doctoring=norm_doctoring) |
634 rval._params = [rval.U, rval.W, rval.a, rval.b, rval.c, rval.P] | 634 rval._params = [rval.U, rval.W, rval.a, rval.b, rval.c, rval.P] |
635 return rval | 635 return rval |
636 | 636 |
637 class mcRBMTrainer(object): | 637 class mcRBMTrainer(object): |
638 """Light-weight class encapsulating math for mcRBM training | 638 """Light-weight class encapsulating math for mcRBM training |
639 | 639 |
640 Attributes: | 640 Attributes: |
641 - rbm - an mcRBM instance | 641 - rbm - an mcRBM instance |
642 - sampler - an HMC_sampler instance | 642 - sampler - an HMC_sampler instance |
643 - normVF - geometrically updated norm of U matrix columns (shared var) | 643 - normVF - geometrically updated norm of U matrix columns (shared var) |
734 | 734 |
735 def normalize_U(self, new_U): | 735 def normalize_U(self, new_U): |
736 """ | 736 """ |
737 :param new_U: a proposed new value for rbm.U | 737 :param new_U: a proposed new value for rbm.U |
738 | 738 |
739 :returns: a pair of TensorType variables: | 739 :returns: a pair of TensorType variables: |
740 a corrected new value for U, and a new value for self.normVF | 740 a corrected new value for U, and a new value for self.normVF |
741 | 741 |
742 This is a weird normalization procedure, but the sample code for the paper has it, and | 742 This is a weird normalization procedure, but the sample code for the paper has it, and |
743 it seems to be important. | 743 it seems to be important. |
744 """ | 744 """ |
750 """Return the contrastive divergence gradients on the parameters of self.rbm """ | 750 """Return the contrastive divergence gradients on the parameters of self.rbm """ |
751 if neg_v is None: | 751 if neg_v is None: |
752 neg_v = self.sampler.positions | 752 neg_v = self.sampler.positions |
753 return contrastive_grad( | 753 return contrastive_grad( |
754 free_energy_fn=self.rbm.free_energy_given_v, | 754 free_energy_fn=self.rbm.free_energy_given_v, |
755 pos_v=self.visible_batch, | 755 pos_v=self.visible_batch, |
756 neg_v=neg_v, | 756 neg_v=neg_v, |
757 wrt = self.rbm.params(), | 757 wrt = self.rbm.params(), |
758 other_cost=(l1(self.rbm.U)+l1(self.rbm.W)) * self.effective_l1_penalty) | 758 other_cost=(l1(self.rbm.U)+l1(self.rbm.W)) * self.effective_l1_penalty) |
759 | 759 |
760 def cd_updates(self): | 760 def cd_updates(self): |
784 | 784 |
785 # TODO: when sgd has an annealing schedule, this should | 785 # TODO: when sgd has an annealing schedule, this should |
786 # go through that mechanism. | 786 # go through that mechanism. |
787 | 787 |
788 lr = TT.clip( | 788 lr = TT.clip( |
789 self.learn_rate * TT.cast(self.lr_anneal_start / (self.iter+1), floatX), | 789 self.learn_rate * TT.cast(self.lr_anneal_start / (self.iter+1), floatX), |
790 0.0, #min | 790 0.0, #min |
791 self.learn_rate) #max | 791 self.learn_rate) #max |
792 | 792 |
793 ups.update(dict(sgd_updates( | 793 ups.update(dict(sgd_updates( |
794 self.rbm.params(), | 794 self.rbm.params(), |
815 new_P = ups[self.rbm.P] * self.p_mask | 815 new_P = ups[self.rbm.P] * self.p_mask |
816 no_pos_P = TT.switch(new_P<0, new_P, 0) | 816 no_pos_P = TT.switch(new_P<0, new_P, 0) |
817 ups[self.rbm.P] = - no_pos_P / no_pos_P.sum(axis=0) # normalize so that each column sums to -1 | 817 ups[self.rbm.P] = - no_pos_P / no_pos_P.sum(axis=0) # normalize so that each column sums to -1 |
818 | 818 |
819 return ups | 819 return ups |
820 |
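
The docstring and trainer code shown in this comparison can be checked numerically with a small NumPy sketch. This is not the file's Theano implementation; it is a minimal illustration under stated assumptions. `free_energy_given_v` marginalizes the binary `h` and `g` out of the "negative identity P" energy from the module docstring, and its optional visible bias `a` follows the working notes rather than the docstring energy. `annealed_learning_rate` and `renormalize_P` mirror the `TT.clip` and `TT.switch` expressions in `cd_updates`, leaving out `p_mask`. The function names, argument shapes, and the helper `softplus` are hypothetical choices made for this example.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)


def free_energy_given_v(v, U, W, b, c, a=None):
    """Free energy of the docstring energy with P replaced by -identity.

    Assumed shapes: v (N, I), U (I, K), W (I, J), b (K,), c (J,), a (I,) or None.
    Returns an array of N free energies.
    """
    n_inputs = v.shape[1]
    # sqrt(\sum_i v_i^2 / I + 0.5), one scale per example
    v_scale = np.sqrt((v ** 2).sum(axis=1, keepdims=True) / n_inputs + 0.5)
    s = np.dot(v / v_scale, U)                       # normalized filter responses
    fe = -softplus(b - 0.5 * s ** 2).sum(axis=1)     # covariance units h summed out
    fe -= softplus(np.dot(v, W) + c).sum(axis=1)     # mean units g summed out
    fe += 0.5 * (v ** 2).sum(axis=1)                 # quadratic term on v
    if a is not None:
        fe -= np.dot(v, a)                           # visible bias (from the working notes)
    return fe


def annealed_learning_rate(base_lr, anneal_start, iteration):
    # Same arithmetic as clip(learn_rate * lr_anneal_start / (iter + 1), 0, learn_rate).
    return float(np.clip(base_lr * anneal_start / (iteration + 1.0), 0.0, base_lr))


def renormalize_P(P):
    # Zero out non-negative entries, then scale each column to sum to -1.
    # A column with no negative entry would give 0/0 here; this sketch does not guard it.
    no_pos = np.where(P < 0, P, 0.0)
    return -no_pos / no_pos.sum(axis=0)
```

With `anneal_start = 100`, for instance, the learning rate stays at `base_lr` for the first 100 iterations and then decays proportionally to `1 / (iteration + 1)`.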