comparison pylearn/algorithms/mcRBM.py @ 1500:517f4c02dde9

Auto white space fix.
author Frederic Bastien <nouiz@nouiz.org>
date Fri, 09 Sep 2011 10:50:32 -0400
parents f82b80c841b2
children 2a6a6f16416c
1499:f82b80c841b2 1500:517f4c02dde9
1 """ 1 """
2 This file implements the Mean & Covariance RBM discussed in 2 This file implements the Mean & Covariance RBM discussed in
3 3
4 Ranzato, M. and Hinton, G. E. (2010) 4 Ranzato, M. and Hinton, G. E. (2010)
5 Modeling pixel means and covariances using factored third-order Boltzmann machines. 5 Modeling pixel means and covariances using factored third-order Boltzmann machines.
6 IEEE Conference on Computer Vision and Pattern Recognition. 6 IEEE Conference on Computer Vision and Pattern Recognition.
7 7
28 28
29 29
30 Version in paper 30 Version in paper
31 ---------------- 31 ----------------
32 32
33 Full Energy of the Mean and Covariance RBM, with 33 Full Energy of the Mean and Covariance RBM, with
34 :math:`h_k = h_k^{(c)}`, 34 :math:`h_k = h_k^{(c)}`,
35 :math:`g_j = h_j^{(m)}`, 35 :math:`g_j = h_j^{(m)}`,
36 :math:`b_k = b_k^{(c)}`, 36 :math:`b_k = b_k^{(c)}`,
37 :math:`c_j = b_j^{(m)}`, 37 :math:`c_j = b_j^{(m)}`,
38 :math:`U_{if} = C_{if}`, 38 :math:`U_{if} = C_{if}`,
39 39
40 E (v, h, g) = 40 E (v, h, g) =
41 - 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i / (|U_{.f}|*|v|) )^2 41 - 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i / (|U_{.f}|*|v|) )^2
42 - \sum_k b_k h_k 42 - \sum_k b_k h_k
43 + 0.5 \sum_i v_i^2 43 + 0.5 \sum_i v_i^2
44 - \sum_j \sum_i W_{ij} g_j v_i 44 - \sum_j \sum_i W_{ij} g_j v_i
45 - \sum_j c_j g_j 45 - \sum_j c_j g_j
46 46
53 ------------------------------------- 53 -------------------------------------
54 54
55 The train_mcRBM file implements learning in a similar but technically different Energy function: 55 The train_mcRBM file implements learning in a similar but technically different Energy function:
56 56
57 E (v, h, g) = 57 E (v, h, g) =
58 - 0.5 \sum_f \sum_k P_{fk} h_k (\sum_i U_{if} v_i / sqrt(\sum_i v_i^2/I + 0.5))^2 58 - 0.5 \sum_f \sum_k P_{fk} h_k (\sum_i U_{if} v_i / sqrt(\sum_i v_i^2/I + 0.5))^2
59 - \sum_k b_k h_k 59 - \sum_k b_k h_k
60 + 0.5 \sum_i v_i^2 60 + 0.5 \sum_i v_i^2
61 - \sum_j \sum_i W_{ij} g_j v_i 61 - \sum_j \sum_i W_{ij} g_j v_i
62 - \sum_j c_j g_j 62 - \sum_j c_j g_j
63 63
82 82
83 This file implements the same algorithm as the train_mcRBM code, except that the P matrix is 83 This file implements the same algorithm as the train_mcRBM code, except that the P matrix is
84 omitted for clarity, and replaced analytically with a negative identity matrix. 84 omitted for clarity, and replaced analytically with a negative identity matrix.
85 85
86 E (v, h, g) = 86 E (v, h, g) =
87 + 0.5 \sum_k h_k (\sum_i U_{ik} v_i / sqrt(\sum_i v_i^2/I + 0.5))^2 87 + 0.5 \sum_k h_k (\sum_i U_{ik} v_i / sqrt(\sum_i v_i^2/I + 0.5))^2
88 - \sum_k b_k h_k 88 - \sum_k b_k h_k
89 + 0.5 \sum_i v_i^2 89 + 0.5 \sum_i v_i^2
90 - \sum_j \sum_i W_{ij} g_j v_i 90 - \sum_j \sum_i W_{ij} g_j v_i
91 - \sum_j c_j g_j 91 - \sum_j c_j g_j
92 92
93 E (v, h, g) = 93 E (v, h, g) =
94 - 0.5 \sum_f \sum_k P_{fk} h_k (\sum_i U_{if} v_i / sqrt(\sum_i v_i^2/I + 0.5))^2 94 - 0.5 \sum_f \sum_k P_{fk} h_k (\sum_i U_{if} v_i / sqrt(\sum_i v_i^2/I + 0.5))^2
95 - \sum_k b_k h_k 95 - \sum_k b_k h_k
96 + 0.5 \sum_i v_i^2 96 + 0.5 \sum_i v_i^2
97 - \sum_j \sum_i W_{ij} g_j v_i 97 - \sum_j \sum_i W_{ij} g_j v_i
98 - \sum_j c_j g_j 98 - \sum_j c_j g_j
99 99
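The "negative identity P" energy above can be sketched in plain NumPy. This is only an illustration of the formula in the docstring, not the Theano code in the file; the function name `mcrbm_energy` and the argument shapes are assumptions made for the example.

```python
import numpy as np

def mcrbm_energy(v, h, g, U, W, b, c):
    """Energy of one (v, h, g) configuration with P replaced by -identity.

    Assumed shapes: v (I,), h (K,), g (J,), U (I, K), W (I, J), b (K,), c (J,).
    """
    I = v.shape[0]
    # Scale v by sqrt(sum_i v_i^2 / I + 0.5), as in the docstring formula.
    v_scaled = v / np.sqrt(np.sum(v ** 2) / I + 0.5)
    # + 0.5 \sum_k h_k (\sum_i U_{ik} v_i / scale)^2
    cov_term = 0.5 * np.sum(h * (v_scaled @ U) ** 2)
    return (cov_term
            - b @ h                  # - \sum_k b_k h_k
            + 0.5 * np.sum(v ** 2)   # + 0.5 \sum_i v_i^2
            - v @ W @ g              # - \sum_j \sum_i W_{ij} g_j v_i
            - c @ g)                 # - \sum_j c_j g_j
```

Because the covariance term is a non-negative quantity multiplied by `h_k`, switching a covariance unit on can only raise the energy unless its bias `b_k` compensates.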
100 100
101 101
102 Conventions in this file 102 Conventions in this file
103 ======================== 103 ========================
104 104
105 This file contains some global functions, as well as a class (MeanCovRBM) that makes using them a little 105 This file contains some global functions, as well as a class (MeanCovRBM) that makes using them a little
106 more convenient. 106 more convenient.
107 107
108 108
109 Global functions like `free_energy` work on an mcRBM as parametrized in a particular way. 109 Global functions like `free_energy` work on an mcRBM as parametrized in a particular way.
110 Suppose we have 110 Suppose we have
111 - I input dimensions, 111 - I input dimensions,
112 - F squared filters, 112 - F squared filters,
113 - J mean variables, and 113 - J mean variables, and
114 - K covariance variables. 114 - K covariance variables.
115 115
116 The mcRBM is parametrized by 6 variables: 116 The mcRBM is parametrized by 6 variables:
117 117
129 # WORKING NOTES 129 # WORKING NOTES
130 # THIS DERIVATION IS BASED ON THE ** PAPER ** ENERGY FUNCTION 130 # THIS DERIVATION IS BASED ON THE ** PAPER ** ENERGY FUNCTION
131 # NOT THE ENERGY FUNCTION IN THE CODE!!! 131 # NOT THE ENERGY FUNCTION IN THE CODE!!!
132 # 132 #
133 # Free energy is the marginal energy of visible units 133 # Free energy is the marginal energy of visible units
134 # Recall: 134 # Recall:
135 # Q(x) = exp(-E(x))/Z ==> -log(Q(x)) - log(Z) = E(x) 135 # Q(x) = exp(-E(x))/Z ==> -log(Q(x)) - log(Z) = E(x)
136 # 136 #
137 # 137 #
138 # E (v, h, g) = 138 # E (v, h, g) =
139 # - 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / |U_{*f}|^2 |v|^2 139 # - 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / |U_{*f}|^2 |v|^2
152 # = -\log( \sum_{h,g} exp(- 152 # = -\log( \sum_{h,g} exp(-
153 # - 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / (|U_{*f}| * |v|)^2 153 # - 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / (|U_{*f}| * |v|)^2
154 # - \sum_k b_k h_k 154 # - \sum_k b_k h_k
155 # + 0.5 \sum_i v_i^2 155 # + 0.5 \sum_i v_i^2
156 # - \sum_j \sum_i W_{ij} g_j v_i 156 # - \sum_j \sum_i W_{ij} g_j v_i
157 # - \sum_j c_j g_j 157 # - \sum_j c_j g_j
158 # - \sum_i a_i v_i )) 158 # - \sum_i a_i v_i ))
159 # 159 #
160 # Get rid of double negs in exp 160 # Get rid of double negs in exp
161 # = -\log( \sum_{h} exp( 161 # = -\log( \sum_{h} exp(
162 # + 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / (|U_{*f}| * |v|)^2 162 # + 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / (|U_{*f}| * |v|)^2
163 # + \sum_k b_k h_k 163 # + \sum_k b_k h_k
164 # - 0.5 \sum_i v_i^2 164 # - 0.5 \sum_i v_i^2
165 # ) * \sum_{g} exp( 165 # ) * \sum_{g} exp(
166 # + \sum_j \sum_i W_{ij} g_j v_i 166 # + \sum_j \sum_i W_{ij} g_j v_i
167 # + \sum_j c_j g_j)) 167 # + \sum_j c_j g_j))
168 # - \sum_i a_i v_i 168 # - \sum_i a_i v_i
169 # 169 #
170 # Break up log 170 # Break up log
171 # = -\log( \sum_{h} exp( 171 # = -\log( \sum_{h} exp(
172 # + 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / (|U_{*f}|*|v|)^2 172 # + 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / (|U_{*f}|*|v|)^2
173 # + \sum_k b_k h_k 173 # + \sum_k b_k h_k
174 # )) 174 # ))
175 # -\log( \sum_{g} exp( 175 # -\log( \sum_{g} exp(
176 # + \sum_j \sum_i W_{ij} g_j v_i 176 # + \sum_j \sum_i W_{ij} g_j v_i
177 # + \sum_j c_j g_j )) 177 # + \sum_j c_j g_j ))
178 # + 0.5 \sum_i v_i^2 178 # + 0.5 \sum_i v_i^2
179 # - \sum_i a_i v_i 179 # - \sum_i a_i v_i
180 # 180 #
181 # Since g and h are binary, log(sum(exp(sum...))) becomes sum(log(1 + exp(...))); first for g, then for h 181 # Since g and h are binary, log(sum(exp(sum...))) becomes sum(log(1 + exp(...))); first for g, then for h
182 # = -\log(\sum_{h} exp( 182 # = -\log(\sum_{h} exp(
183 # + 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / (|U_{*f}|* |v|)^2 183 # + 0.5 \sum_f \sum_k P_{fk} h_k ( \sum_i U_{if} v_i )^2 / (|U_{*f}|* |v|)^2
184 # + \sum_k b_k h_k 184 # + \sum_k b_k h_k
185 # )) 185 # ))
186 # - \sum_{j} \log(1 + exp(\sum_i W_{ij} v_i + c_j )) 186 # - \sum_{j} \log(1 + exp(\sum_i W_{ij} v_i + c_j ))
187 # + 0.5 \sum_i v_i^2 187 # + 0.5 \sum_i v_i^2
188 # - \sum_i a_i v_i 188 # - \sum_i a_i v_i
189 # 189 #
190 # = - \sum_{k} \log(1 + exp(b_k + 0.5 \sum_f P_{fk}( \sum_i U_{if} v_i )^2 / (|U_{*f}|*|v|)^2)) 190 # = - \sum_{k} \log(1 + exp(b_k + 0.5 \sum_f P_{fk}( \sum_i U_{if} v_i )^2 / (|U_{*f}|*|v|)^2))
191 # - \sum_{j} \log(1 + exp(\sum_i W_{ij} v_i + c_j )) 191 # - \sum_{j} \log(1 + exp(\sum_i W_{ij} v_i + c_j ))
192 # + 0.5 \sum_i v_i^2 192 # + 0.5 \sum_i v_i^2
193 # - \sum_i a_i v_i 193 # - \sum_i a_i v_i
194 # 194 #
195 # For negative-one-diagonal P this gives: 195 # For negative-one-diagonal P this gives:
196 # 196 #
197 # = - \sum_{k} \log(1 + exp(b_k - 0.5 ( \sum_i U_{ik} v_i )^2 / (|U_{*k}|*|v|)^2)) 197 # = - \sum_{k} \log(1 + exp(b_k - 0.5 ( \sum_i U_{ik} v_i )^2 / (|U_{*k}|*|v|)^2))
198 # - \sum_{j} \log(1 + exp(\sum_i W_{ij} v_i + c_j )) 198 # - \sum_{j} \log(1 + exp(\sum_i W_{ij} v_i + c_j ))
199 # + 0.5 \sum_i v_i^2 199 # + 0.5 \sum_i v_i^2
200 # - \sum_i a_i v_i 200 # - \sum_i a_i v_i
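The final line of the working notes can be checked numerically on a tiny model. The sketch below is not part of the file: it assumes the paper-style normalisation from the energy at the top of the notes (the projection divided by |U_{*k}| |v|, then squared), takes P to be the negative identity, and compares the closed form against brute-force enumeration of every binary (h, g). All function names are hypothetical.

```python
import itertools
import numpy as np

def paper_energy(v, h, g, U, W, a, b, c):
    # With P = -identity the covariance term enters with a plus sign.
    cos = (v @ U) / (np.linalg.norm(U, axis=0) * np.linalg.norm(v))
    return (0.5 * np.sum(h * cos ** 2) - b @ h + 0.5 * (v @ v)
            - v @ W @ g - c @ g - a @ v)

def free_energy_closed_form(v, U, W, a, b, c):
    cos = (v @ U) / (np.linalg.norm(U, axis=0) * np.linalg.norm(v))
    softplus = lambda x: np.logaddexp(0.0, x)  # log(1 + exp(x))
    return (-np.sum(softplus(b - 0.5 * cos ** 2))
            - np.sum(softplus(v @ W + c))
            + 0.5 * (v @ v) - a @ v)

def free_energy_brute_force(v, U, W, a, b, c):
    # F(v) = -log sum_{h,g} exp(-E(v, h, g)), enumerated explicitly.
    K, J = U.shape[1], W.shape[1]
    neg_energies = np.array([
        -paper_energy(v, np.array(h, dtype=float), np.array(g, dtype=float),
                      U, W, a, b, c)
        for h in itertools.product([0, 1], repeat=K)
        for g in itertools.product([0, 1], repeat=J)
    ])
    return -np.logaddexp.reduce(neg_energies)
```

The enumeration visits 2^(K+J) configurations, so it is only usable as a sanity check for a handful of hidden units.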
201 201
202 import sys, os, logging 202 import sys, os, logging
203 import numpy as np 203 import numpy as np
204 import numpy 204 import numpy
205 205
359 def expected_h_g_given_v(self, v): 359 def expected_h_g_given_v(self, v):
360 """Return a tuple (`h`, `g`) of Theano expressions for the conditional expectations in an mcRBM. 360 """Return a tuple (`h`, `g`) of Theano expressions for the conditional expectations in an mcRBM.
361 361
362 `h` is the conditional expectation of the covariance units. 362 `h` is the conditional expectation of the covariance units.
363 `g` is the conditional expectation of the mean units. 363 `g` is the conditional expectation of the mean units.
364 364
365 """ 365 """
366 h = TT.nnet.sigmoid(self.hidden_cov_units_preactivation_given_v(v)) 366 h = TT.nnet.sigmoid(self.hidden_cov_units_preactivation_given_v(v))
367 g = TT.nnet.sigmoid(self.c + dot(v,self.W)) 367 g = TT.nnet.sigmoid(self.c + dot(v,self.W))
368 return (h, g) 368 return (h, g)
369 369
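For readers without Theano at hand, the two conditional expectations can be sketched in NumPy. The mean-unit line mirrors `sigmoid(self.c + dot(v, self.W))` directly. The covariance pre-activation is an assumption: the body of `hidden_cov_units_preactivation_given_v` is not shown in this comparison, so the expression below is read off the "negative identity P" energy in the module docstring and may differ from what the method actually computes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def expected_h_g_given_v(v, U, W, b, c):
    """v is an (N, I) batch; returns h with shape (N, K) and g with shape (N, J)."""
    I = v.shape[1]
    # Per-example scale sqrt(sum_i v_i^2 / I + 0.5) from the docstring energy.
    scale = np.sqrt(np.sum(v ** 2, axis=1, keepdims=True) / I + 0.5)
    # ASSUMED pre-activation: b_k - 0.5 * (scaled filter response)^2.
    h = sigmoid(b - 0.5 * ((v / scale) @ U) ** 2)
    # Mean units: same form as the Theano line in the method.
    g = sigmoid(c + v @ W)
    return h, g
```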
370 def n_visible_units(self): 370 def n_visible_units(self):
371 """Return the number of visible units of this RBM 371 """Return the number of visible units of this RBM
372 372
373 For an RBM made from shared variables, this will return an integer, 373 For an RBM made from shared variables, this will return an integer,
374 for a purely symbolic RBM this will return a theano expression. 374 for a purely symbolic RBM this will return a theano expression.
375 375
376 """ 376 """
377 try: 377 try:
378 return self.W.get_value(borrow=True, return_internal_type=True).shape[0] 378 return self.W.get_value(borrow=True, return_internal_type=True).shape[0]
379 except AttributeError: 379 except AttributeError:
380 return self.W.shape[0] 380 return self.W.shape[0]
382 def n_hidden_cov_units(self): 382 def n_hidden_cov_units(self):
383 """Return the number of hidden units for the covariance in this RBM 383 """Return the number of hidden units for the covariance in this RBM
384 384
385 For an RBM made from shared variables, this will return an integer, 385 For an RBM made from shared variables, this will return an integer,
386 for a purely symbolic RBM this will return a theano expression. 386 for a purely symbolic RBM this will return a theano expression.
387 387
388 """ 388 """
389 try: 389 try:
390 return self.U.get_value(borrow=True, return_internal_type=True).shape[1] 390 return self.U.get_value(borrow=True, return_internal_type=True).shape[1]
391 except AttributeError: 391 except AttributeError:
392 return self.U.shape[1] 392 return self.U.shape[1]
394 def n_hidden_mean_units(self): 394 def n_hidden_mean_units(self):
395 """Return the number of hidden units for the mean in this RBM 395 """Return the number of hidden units for the mean in this RBM
396 396
397 For an RBM made from shared variables, this will return an integer, 397 For an RBM made from shared variables, this will return an integer,
398 for a purely symbolic RBM this will return a theano expression. 398 for a purely symbolic RBM this will return a theano expression.
399 399
400 """ 400 """
401 try: 401 try:
402 return self.W.get_value(borrow=True, return_internal_type=True).shape[1] 402 return self.W.get_value(borrow=True, return_internal_type=True).shape[1]
403 except AttributeError: 403 except AttributeError:
404 return self.W.shape[1] 404 return self.W.shape[1]
471 def params(self): 471 def params(self):
472 """Return the elements of [U,W,a,b,c] that are shared variables 472 """Return the elements of [U,W,a,b,c] that are shared variables
473 473
474 WRITEME : a *prescriptive* definition of this method suitable for mention in the API 474 WRITEME : a *prescriptive* definition of this method suitable for mention in the API
475 doc. 475 doc.
476 476
477 """ 477 """
478 return list(self._params) 478 return list(self._params)
479 479
480 @classmethod 480 @classmethod
481 def alloc(cls, n_I, n_K, n_J, rng = 8923402190, 481 def alloc(cls, n_I, n_K, n_J, rng = 8923402190,
489 489
490 :param n_I: input dimensionality 490 :param n_I: input dimensionality
491 :param n_K: number of covariance hidden units 491 :param n_K: number of covariance hidden units
492 :param n_J: number of mean filters (linear) 492 :param n_J: number of mean filters (linear)
493 :param rng: seed or numpy RandomState object to initialize parameters 493 :param rng: seed or numpy RandomState object to initialize parameters
494 494
495 :note: 495 :note:
496 Constants for initial ranges and values taken from train_mcRBM.py. 496 Constants for initial ranges and values taken from train_mcRBM.py.
497 """ 497 """
498 if not hasattr(rng, 'randn'): 498 if not hasattr(rng, 'randn'):
499 rng = np.random.RandomState(rng) 499 rng = np.random.RandomState(rng)
575 def n_hidden_cov_units(self): 575 def n_hidden_cov_units(self):
576 """Return the number of hidden units for the covariance in this RBM 576 """Return the number of hidden units for the covariance in this RBM
577 577
578 For an RBM made from shared variables, this will return an integer, 578 For an RBM made from shared variables, this will return an integer,
579 for a purely symbolic RBM this will return a theano expression. 579 for a purely symbolic RBM this will return a theano expression.
580 580
581 """ 581 """
582 try: 582 try:
583 return self.P.get_value(borrow=True, return_internal_type=True).shape[1] 583 return self.P.get_value(borrow=True, return_internal_type=True).shape[1]
584 except AttributeError: 584 except AttributeError:
585 return self.P.shape[1] 585 return self.P.shape[1]
591 591
592 :param n_I: input dimensionality 592 :param n_I: input dimensionality
593 :param n_K: number of covariance hidden units 593 :param n_K: number of covariance hidden units
594 :param n_J: number of mean filters (linear) 594 :param n_J: number of mean filters (linear)
595 :param rng: seed or numpy RandomState object to initialize parameters 595 :param rng: seed or numpy RandomState object to initialize parameters
596 596
597 :note: 597 :note:
598 Constants for initial ranges and values taken from train_mcRBM.py. 598 Constants for initial ranges and values taken from train_mcRBM.py.
599 """ 599 """
600 return cls.alloc_with_P( 600 return cls.alloc_with_P(
601 -numpy.eye(n_K).astype(theano.config.floatX), 601 -numpy.eye(n_K).astype(theano.config.floatX),
633 norm_doctoring=norm_doctoring) 633 norm_doctoring=norm_doctoring)
634 rval._params = [rval.U, rval.W, rval.a, rval.b, rval.c, rval.P] 634 rval._params = [rval.U, rval.W, rval.a, rval.b, rval.c, rval.P]
635 return rval 635 return rval
636 636
637 class mcRBMTrainer(object): 637 class mcRBMTrainer(object):
638 """Light-weight class encapsulating math for mcRBM training 638 """Light-weight class encapsulating math for mcRBM training
639 639
640 Attributes: 640 Attributes:
641 - rbm - an mcRBM instance 641 - rbm - an mcRBM instance
642 - sampler - an HMC_sampler instance 642 - sampler - an HMC_sampler instance
643 - normVF - geometrically updated norm of U matrix columns (shared var) 643 - normVF - geometrically updated norm of U matrix columns (shared var)
734 734
735 def normalize_U(self, new_U): 735 def normalize_U(self, new_U):
736 """ 736 """
737 :param new_U: a proposed new value for rbm.U 737 :param new_U: a proposed new value for rbm.U
738 738
739 :returns: a pair of TensorType variables: 739 :returns: a pair of TensorType variables:
740 a corrected new value for U, and a new value for self.normVF 740 a corrected new value for U, and a new value for self.normVF
741 741
742 This is a weird normalization procedure, but the sample code for the paper has it, and 742 This is a weird normalization procedure, but the sample code for the paper has it, and
743 it seems to be important. 743 it seems to be important.
744 """ 744 """
750 """Return the contrastive divergence gradients on the parameters of self.rbm """ 750 """Return the contrastive divergence gradients on the parameters of self.rbm """
751 if neg_v is None: 751 if neg_v is None:
752 neg_v = self.sampler.positions 752 neg_v = self.sampler.positions
753 return contrastive_grad( 753 return contrastive_grad(
754 free_energy_fn=self.rbm.free_energy_given_v, 754 free_energy_fn=self.rbm.free_energy_given_v,
755 pos_v=self.visible_batch, 755 pos_v=self.visible_batch,
756 neg_v=neg_v, 756 neg_v=neg_v,
757 wrt = self.rbm.params(), 757 wrt = self.rbm.params(),
758 other_cost=(l1(self.rbm.U)+l1(self.rbm.W)) * self.effective_l1_penalty) 758 other_cost=(l1(self.rbm.U)+l1(self.rbm.W)) * self.effective_l1_penalty)
759 759
760 def cd_updates(self): 760 def cd_updates(self):
784 784
785 # TODO: when sgd has an annealing schedule, this should 785 # TODO: when sgd has an annealing schedule, this should
786 # go through that mechanism. 786 # go through that mechanism.
787 787
788 lr = TT.clip( 788 lr = TT.clip(
789 self.learn_rate * TT.cast(self.lr_anneal_start / (self.iter+1), floatX), 789 self.learn_rate * TT.cast(self.lr_anneal_start / (self.iter+1), floatX),
790 0.0, #min 790 0.0, #min
791 self.learn_rate) #max 791 self.learn_rate) #max
792 792
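The clipped schedule above holds the learning rate at `learn_rate` until the iteration count reaches `lr_anneal_start`, then decays it roughly as 1/iter. A scalar NumPy sketch of the same arithmetic, with a hypothetical function name and plain Python numbers standing in for the shared variables:

```python
import numpy as np

def annealed_learning_rate(learn_rate, lr_anneal_start, iteration):
    # Same expression as the TT.clip call: scale by start / (iter + 1),
    # then clip into [0, learn_rate].
    raw = learn_rate * (lr_anneal_start / (iteration + 1.0))
    return float(np.clip(raw, 0.0, learn_rate))
```

For instance, with `learn_rate=0.1` and `lr_anneal_start=100`, the rate stays at 0.1 through iteration 99 and has halved by iteration 199.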
793 ups.update(dict(sgd_updates( 793 ups.update(dict(sgd_updates(
794 self.rbm.params(), 794 self.rbm.params(),
815 new_P = ups[self.rbm.P] * self.p_mask 815 new_P = ups[self.rbm.P] * self.p_mask
816 no_pos_P = TT.switch(new_P<0, new_P, 0) 816 no_pos_P = TT.switch(new_P<0, new_P, 0)
817 ups[self.rbm.P] = - no_pos_P / no_pos_P.sum(axis=0) # normalize so that each column sums to -1 817 ups[self.rbm.P] = - no_pos_P / no_pos_P.sum(axis=0) # normalize so that each column sums to -1
818 818
819 return ups 819 return ups
820
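The last three statements of `cd_updates` project the proposed P back onto non-positive matrices with normalised columns. A NumPy sketch of that projection follows; the function name is hypothetical and `p_mask` is treated as a plain 0/1 array. Since `no_pos_P` is non-positive, its column sums are non-positive too, so each column of the result is non-positive and sums to -1 (equivalently, the magnitudes sum to 1).

```python
import numpy as np

def renormalize_P(new_P, p_mask):
    masked = new_P * p_mask
    # Zero out any entry that has become positive.
    no_pos = np.where(masked < 0, masked, 0.0)
    # Divide each column by its (non-positive) sum, then negate.
    return -no_pos / no_pos.sum(axis=0)
```

A column with no negative entries left has a zero sum, and the division then yields NaN; the sketch does not guard against that case.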