annotate pylearn/algorithms/pca_online_estimator.py @ 1409:cedb48a300fc

added a pca online estimator
author Philippe Hamel <higgsbosonh@hotmail.com>
date Mon, 31 Jan 2011 12:23:20 -0500
parents
children 8f15ef656598
rev   line source
1409
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
1 # Copyright 2009 PA Manzagol (manzagop AT iro DOT umontreal DOT ca)
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
2 #
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
3 # Licensed under the Apache License, Version 2.0 (the "License");
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
4 # you may not use this file except in compliance with the License.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
5 # You may obtain a copy of the License at
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
6 #
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
7 # http://www.apache.org/licenses/LICENSE-2.0
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
8 #
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
9 # Unless required by applicable law or agreed to in writing, software
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
10 # distributed under the License is distributed on an "AS IS" BASIS,
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
11 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
12 # See the License for the specific language governing permissions and
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
13 # limitations under the License.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
14 #
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
15
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
16 import numpy
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
17 from scipy import linalg
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
18
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
19 # Todo:
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
20 # - complete docstring (explain arguments, pseudo code)
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
21 # - Consider case with discount = 1.0
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
22 # - reevaluation when not at the end of a minibatch
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
23
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
24 class PcaOnlineEstimator(object):
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
25 """Online estimation of the leading eigen values/vectors of the covariance
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
26 of some samples.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
27
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
28 Maintains a moving (with discount) low rank (n_eigen) estimate of the
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
29 covariance matrix of some observations. New observations are accumulated
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
30 until the batch is complete, at which point the low rank estimate is
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
31 reevaluated.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
32
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
33 Example:
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
34
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
35 pca_esti = pca_online_estimator.PcaOnlineEstimator(dimension_of_the_samples)
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
36
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
37 for i in range(number_of_samples):
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
38 pca_esti.observe(samples[i])
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
39
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
40 [eigvals, eigvecs] = pca_esti.getLeadingEigen()
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
41
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
42 """
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
43
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
44
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
45 def __init__(self, n_dim, n_eigen = 10, minibatch_size = 25, gamma = 0.999, regularizer = 1e-6, centering = True):
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
46 # dimension of the observations
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
47 self.n_dim = n_dim
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
48 # rank of the low-rank estimate
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
49 self.n_eigen = n_eigen
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
50 # how many observations between reevaluations of the low rank estimate
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
51 self.minibatch_size = minibatch_size
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
52 # the discount factor in the moving estimate
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
53 self.gamma = gamma
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
54 # regularizer of the covariance estimate
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
55 self.regularizer = regularizer
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
56 # wether we center the observations or not: obtain leading eigen of
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
57 # covariance (centering = True) vs second moment (centering = False)
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
58 self.centering = centering
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
59
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
60 # Total number of observations: to compute the normalizer for the mean and
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
61 # the covariance.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
62 self.n_observations = 0
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
63 # Index in the current minibatch
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
64 self.minibatch_index = 0
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
65
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
66 # Matrix containing on its *rows*:
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
67 # - the current unnormalized eigen vector estimates
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
68 # - the observations since the last reevaluation
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
69 self.Xt = numpy.zeros([self.n_eigen + self.minibatch_size, self.n_dim])
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
70
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
71 # The discounted sum of the observations.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
72 self.x_sum = numpy.zeros([self.n_dim])
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
73
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
74 # The Gram matrix of the observations, ie Xt Xt' (since Xt is rowwise)
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
75 self.G = numpy.zeros([self.n_eigen + self.minibatch_size, self.n_eigen + self.minibatch_size])
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
76 for i in range(self.n_eigen):
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
77 self.G[i,i] = self.regularizer
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
78
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
79 # I don't think it's worth "allocating" these 3 next (though they need to be
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
80 # declared). I don't know how to do in place operations...
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
81
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
82 # Hold the results of the eigendecomposition of the Gram matrix G
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
83 # (eigen vectors on columns of V).
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
84 self.d = numpy.zeros([self.n_eigen + self.minibatch_size])
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
85 self.V = numpy.zeros([self.n_eigen + self.minibatch_size, self.n_eigen + self.minibatch_size])
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
86
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
87 # Holds the unnormalized eigenvectors of the covariance matrix before
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
88 # they're copied back to Xt.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
89 self.Ut = numpy.zeros([self.n_eigen, self.n_dim])
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
90
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
91
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
92 def observe(self, x):
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
93 assert(numpy.size(x) == self.n_dim)
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
94
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
95 self.n_observations += 1
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
96
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
97 # Add the *non-centered* observation to Xt.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
98 row = self.n_eigen + self.minibatch_index
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
99 self.Xt[row] = x
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
100
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
101 # Update the discounted sum of the observations.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
102 self.x_sum *= self.gamma
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
103 self.x_sum += x
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
104
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
105 # To get the mean, we must normalize the sum by:
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
106 # /gamma^(n_observations-1) + /gamma^(n_observations-2) + ... + 1
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
107 normalizer = (1.0 - pow(self.gamma, self.n_observations)) /(1.0 - self.gamma);
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
108 #print "normalizer: ", normalizer
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
109
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
110 # Now center the observation.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
111 # We will lose the first observation as it is the only one in the mean.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
112 if self.centering:
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
113 self.Xt[row] -= self.x_sum / normalizer
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
114
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
115 # Multiply the observation by the discount compensator. Basically
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
116 # we make this observation look "younger" than the previous ones. The actual
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
117 # discount is applied in the reevaluation (and when solving the equations in
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
118 # the case of TONGA) by multiplying every direction with the same aging factor.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
119 rn = pow(self.gamma, -0.5*(self.minibatch_index+1));
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
120 self.Xt[row] *= rn
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
121
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
122 # Update the Gram Matrix.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
123 # The column.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
124 self.G[:row+1,row] = numpy.dot( self.Xt[:row+1,:], self.Xt[row,:].transpose() )
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
125 # The symetric row.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
126 # There are row+1 values, but the diag doesn't need to get copied.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
127 self.G[row,:row] = self.G[:row,row].transpose()
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
128
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
129 self.minibatch_index += 1
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
130
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
131 if self.minibatch_index == self.minibatch_size:
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
132 self.reevaluate()
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
133
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
134
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
135 def reevaluate(self):
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
136 # TODO do the modifications to handle when this is not true.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
137 assert(self.minibatch_index == self.minibatch_size);
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
138
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
139 # Regularize - not necessary but in case
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
140 for i in range(self.n_eigen + self.minibatch_size):
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
141 self.G[i,i] += self.regularizer
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
142
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
143 # The Gram matrix is up to date. Get its low rank eigendecomposition.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
144 # NOTE: the eigenvalues are in ASCENDING order and the vectors are on
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
145 # the columns.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
146 # With scipy 0.7, you can ask for only some eigenvalues (the n_eigen top
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
147 # ones) but it doesn't look loke it for scipy 0.6.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
148 self.d, self.V = linalg.eigh(self.G) #, overwrite_a=True)
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
149
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
150 # Convert the n_eigen LAST eigenvectors of the Gram matrix contained in V
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
151 # into *unnormalized* eigenvectors U of the covariance (unnormalized wrt
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
152 # the eigen values, not the moving average).
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
153 self.Ut = numpy.dot(self.V[:,-self.n_eigen:].transpose(), self.Xt)
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
154
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
155 # Take into account the discount factor.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
156 # Here, minibatch index is minibatch_size. We age everyone. Because of the
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
157 # previous multiplications to make some observations "younger" we multiply
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
158 # everyone by the same factor.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
159 # TODO VERIFY THIS!
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
160 rn = pow(self.gamma, -0.5*(self.minibatch_index+1))
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
161 inv_rn2 = 1.0/(rn*rn)
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
162 self.Ut *= 1.0/rn
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
163 self.d *= inv_rn2;
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
164
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
165 #print "*** Reevaluate! ***"
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
166 #normalizer = (1.0 - pow(self.gamma, self.n_observations)) /(1.0 - self.gamma)
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
167 #print "normalizer: ", normalizer
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
168 #print self.d / normalizer
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
169 #print self.Ut # unnormalized eigen vectors (wrt eigenvalues AND moving average).
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
170
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
171 # Update Xt, G and minibatch_index
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
172 self.Xt[:self.n_eigen,:] = self.Ut
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
173
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
174 for i in range(self.n_eigen):
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
175 self.G[i,i] = self.d[-self.n_eigen+i]
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
176
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
177 self.minibatch_index = 0
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
178
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
179 # Returns a copy of the current estimate of the eigen values and vectors
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
180 # (normalized vectors on rows), normalized by the discounted number of observations.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
181 def getLeadingEigen(self):
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
182 # We subtract self.minibatch_index in case this call is not right after a reevaluate call.
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
183 normalizer = (1.0 - pow(self.gamma, self.n_observations - self.minibatch_index)) /(1.0 - self.gamma)
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
184
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
185 eigvals = self.d[-self.n_eigen:] / normalizer
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
186 eigvecs = numpy.zeros([self.n_eigen, self.n_dim])
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
187 for i in range(self.n_eigen):
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
188 eigvecs[i] = self.Ut[-self.n_eigen+i] / numpy.sqrt(numpy.dot(self.Ut[-self.n_eigen+i], self.Ut[-self.n_eigen+i]))
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
189
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
190 return [eigvals, eigvecs]
cedb48a300fc added a pca online estimator
Philippe Hamel <higgsbosonh@hotmail.com>
parents:
diff changeset
191