Mercurial repository: ift6266
comparison: deep/stacked_dae/v2/stacked_dae.py @ 228:851e7ad4a143
Fixed an error in the modified cost formula in stacked_dae, and removed timers from sgd_optimization
| author | fsavard |
|---|---|
| date | Fri, 12 Mar 2010 10:47:36 -0500 |
| parents | acae439d6572 |
| children | 02eb98d051fe |
| 227:acae439d6572 (parent) | 228:851e7ad4a143 (this changeset) |
|---|---|
131 # Equation (2) | 131 # Equation (2) |
132 # note : y is stored as an attribute of the class so that it can be | 132 # note : y is stored as an attribute of the class so that it can be |
133 # used later when stacking dAs. | 133 # used later when stacking dAs. |
134 self.y = T.nnet.sigmoid(T.dot(self.tilde_x, self.W ) + self.b) | 134 self.y = T.nnet.sigmoid(T.dot(self.tilde_x, self.W ) + self.b) |
135 # Equation (3) | 135 # Equation (3) |
136 self.z = T.nnet.sigmoid(T.dot(self.y, self.W_prime) + self.b_prime) | 136 #self.z = T.nnet.sigmoid(T.dot(self.y, self.W_prime) + self.b_prime) |
137 # Equation (4) | 137 # Equation (4) |
138 # note : we sum over the size of a datapoint; if we are using minibatches, | 138 # note : we sum over the size of a datapoint; if we are using minibatches, |
139 # L will be a vector, with one entry per example in minibatch | 139 # L will be a vector, with one entry per example in minibatch |
140 #self.L = - T.sum( self.x*T.log(self.z) + (1-self.x)*T.log(1-self.z), axis=1 ) | 140 #self.L = - T.sum( self.x*T.log(self.z) + (1-self.x)*T.log(1-self.z), axis=1 ) |
141 #self.L = binary_cross_entropy(target=self.x, output=self.z, sum_axis=1) | 141 #self.L = binary_cross_entropy(target=self.x, output=self.z, sum_axis=1) |
142 | 142 |
143 # bypassing z to avoid running to log(0) | 143 # bypassing z to avoid running to log(0) |
144 z_a = T.dot(self.y, self.W_prime) + self.b_prime | 144 z_a = T.dot(self.y, self.W_prime) + self.b_prime |
145 log_sigmoid = T.log(1) - T.log(1+T.exp(-z_a)) | 145 log_sigmoid = T.log(1.) - T.log(1.+T.exp(-z_a)) |
146 # log(1-sigmoid(z_a)) | 146 # log(1-sigmoid(z_a)) |
147 log_1_sigmoid = -self.x - T.log(1+T.exp(-z_a)) | 147 log_1_sigmoid = -z_a - T.log(1.+T.exp(-z_a)) |
148 self.L = -T.sum( self.x * (log_sigmoid) \ | 148 self.L = -T.sum( self.x * (log_sigmoid) \ |
149 + (1.0-self.x) * (log_1_sigmoid), axis=1 ) | 149 + (1.0-self.x) * (log_1_sigmoid), axis=1 ) |
150 | 150 |
151 # I added this epsilon to avoid getting log(0) and 1/0 in grad | 151 # I added this epsilon to avoid getting log(0) and 1/0 in grad |
152 # This means conceptually that there'd be no probability of 0, but that | 152 # This means conceptually that there'd be no probability of 0, but that |
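For readers following the change above: the right-hand column computes the reconstruction cross-entropy directly from the pre-sigmoid activation z_a, using the identities log(sigmoid(a)) = -log(1 + exp(-a)) and log(1 - sigmoid(a)) = -a - log(1 + exp(-a)), so the sigmoid is never evaluated and then passed to log, where it could saturate to exactly 0 or 1. The sketch below is a minimal NumPy illustration of that same computation; the function name stable_cross_entropy and its arguments are illustrative only and do not appear in the repository.

```python
import numpy as np

def stable_cross_entropy(x, y, W_prime, b_prime):
    """Illustrative restatement of the cost used in the changeset above.

    x                : (n, d) reconstruction targets in [0, 1]
    y                : (n, k) hidden code of the denoising autoencoder
    W_prime, b_prime : decoder weights and bias
    Returns one cross-entropy value per example in the minibatch.
    """
    # Pre-sigmoid reconstruction (Equation (3) without applying the sigmoid).
    z_a = y @ W_prime + b_prime
    # log(sigmoid(z_a))     = -log(1 + exp(-z_a))
    log_sigmoid = -np.log1p(np.exp(-z_a))
    # log(1 - sigmoid(z_a)) = -z_a - log(1 + exp(-z_a))
    log_1_sigmoid = -z_a - np.log1p(np.exp(-z_a))
    # Equation (4): sum over the dimensions of each datapoint,
    # giving one loss entry per minibatch example.
    return -np.sum(x * log_sigmoid + (1.0 - x) * log_1_sigmoid, axis=1)
```

Like the Theano expression in the diff, this formulation never takes the log of a saturated sigmoid output, which is the log(0) problem the surrounding comments describe.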