diff writeup/aistats2011_cameraready.tex @ 636:83d53ffe3f25
eqns
author:   Yoshua Bengio <bengioy@iro.umontreal.ca>
date:     Sat, 19 Mar 2011 23:01:46 -0400
parents:  54e8958e963b
children: fe98896745a5
--- a/writeup/aistats2011_cameraready.tex	Sat Mar 19 22:58:06 2011 -0400
+++ b/writeup/aistats2011_cameraready.tex	Sat Mar 19 23:01:46 2011 -0400
@@ -431,13 +431,9 @@
 hidden layer) and deep SDAs.
 \emph{Hyper-parameters are selected based on the {\bf NISTP}
 validation set error.}
-{\bf Multi-Layer Perceptrons (MLP).} The MLP output estimated
+{\bf Multi-Layer Perceptrons (MLP).} The MLP output estimated with
 \[
-P({\rm class}|{\rm input}=x)
-\]
-with
-\[
-f(x)={\rm softmax}(b_2+W_2\tanh(b_1+W_1 x)),
+P({\rm class}|{\rm input}=x)={\rm softmax}(b_2+W_2\tanh(b_1+W_1 x)),
 \]
 i.e., two layers, where
 \[
@@ -519,15 +515,17 @@
 of the uncorrupted input $x$. Because the network has to denoise, it is forcing
 the hidden units $y$ to represent the leading regularities in
 the data. Following~\citep{VincentPLarochelleH2008-very-small}
-the hidden units output $y$ is obtained through
+the hidden units output $y$ is obtained through the sigmoid-affine
+encoder
 \[
  y={\rm sigm}(c+V x)
 \]
 where ${\rm sigm}(a)=1/(1+\exp(-a))$
-and the reconstruction is
+and the reconstruction is obtained through the same transformation
 \[
- z={\rm sigm}(d+V' y).
+ z={\rm sigm}(d+V' y)
 \]
+but using the transpose of the encoder weights.
 We minimize the training set average of the cross-entropy
 reconstruction error
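For readers following the equations touched by this changeset, the sketch below shows, in plain NumPy, the two computations as rewritten in the diff: the MLP output $P({\rm class}|{\rm input}=x)={\rm softmax}(b_2+W_2\tanh(b_1+W_1 x))$, and the denoising auto-encoder pass $y={\rm sigm}(c+Vx)$, $z={\rm sigm}(d+V'y)$ with the decoder using the transpose of the encoder weights, scored by cross-entropy against the uncorrupted input. This is not code from the repository; the function names, shapes, and the NumPy dependency are assumptions made purely for illustration.

```python
# Minimal illustrative sketch (not from the ift6266 repository) of the
# formulas edited in this changeset. Names and shapes are hypothetical.
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))        # shift for numerical stability
    return e / e.sum()

def sigm(a):
    return 1.0 / (1.0 + np.exp(-a))  # sigm(a) = 1 / (1 + exp(-a))

def mlp_output(x, W1, b1, W2, b2):
    # P(class | input = x) = softmax(b2 + W2 tanh(b1 + W1 x)), i.e. two layers.
    h = np.tanh(b1 + W1 @ x)
    return softmax(b2 + W2 @ h)

def dae_pass(x_tilde, V, c, d):
    # Encoder y = sigm(c + V x~) applied to the corrupted input x~;
    # decoder z = sigm(d + V' y) reuses the transpose of the encoder weights.
    y = sigm(c + V @ x_tilde)
    z = sigm(d + V.T @ y)
    return y, z

def cross_entropy(x, z, eps=1e-7):
    # Reconstruction error measured against the uncorrupted input x.
    z = np.clip(z, eps, 1.0 - eps)
    return -np.sum(x * np.log(z) + (1.0 - x) * np.log(1.0 - z))
```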