# HG changeset patch
# User Yoshua Bengio
# Date 1300590106 14400
# Node ID 83d53ffe3f2541f9b51b08c716c3c57c671b2d78
# Parent  d2d7ce0f09420f756c2c08e32bb4fcb16ff4ac70
eqns

diff -r d2d7ce0f0942 -r 83d53ffe3f25 writeup/aistats2011_cameraready.tex
--- a/writeup/aistats2011_cameraready.tex	Sat Mar 19 22:58:06 2011 -0400
+++ b/writeup/aistats2011_cameraready.tex	Sat Mar 19 23:01:46 2011 -0400
@@ -431,13 +431,9 @@
 hidden layer) and deep SDAs.
 \emph{Hyper-parameters are selected based on the {\bf NISTP} validation set error.}
 
-{\bf Multi-Layer Perceptrons (MLP).} The MLP output estimated
+{\bf Multi-Layer Perceptrons (MLP).} The MLP output is estimated with
 \[
-P({\rm class}|{\rm input}=x)
-\]
-with
-\[
-f(x)={\rm softmax}(b_2+W_2\tanh(b_1+W_1 x)),
+P({\rm class}|{\rm input}=x)={\rm softmax}(b_2+W_2\tanh(b_1+W_1 x)),
 \]
 i.e., two layers, where
 \[
@@ -519,15 +515,17 @@
 of the uncorrupted input $x$. Because the network has to denoise,
 it is forcing the hidden units $y$ to represent the leading regularities in
 the data. Following~\citep{VincentPLarochelleH2008-very-small}
-the hidden units output $y$ is obtained through
+the hidden units output $y$ is obtained through the sigmoid-affine
+encoder
 \[
  y={\rm sigm}(c+V x)
 \]
 where ${\rm sigm}(a)=1/(1+\exp(-a))$
-and the reconstruction is
+and the reconstruction is obtained through the same transformation
 \[
- z={\rm sigm}(d+V' y).
+ z={\rm sigm}(d+V' y)
 \]
+but using the transpose of the encoder weights.
 We minimize the training set average of the cross-entropy
 reconstruction error
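For readers who want to check the merged MLP equation numerically, here is a minimal NumPy sketch of P(class|input=x) = softmax(b_2 + W_2 tanh(b_1 + W_1 x)) from the first hunk. The layer sizes, the random initialization, and all variable names below are illustrative assumptions, not values taken from the paper.

# Minimal sketch of the two-layer MLP output in the patched equation:
# P(class | input = x) = softmax(b2 + W2 tanh(b1 + W1 x)).
# Sizes and weights are assumed for illustration only.
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())        # subtract max for numerical stability
    return e / e.sum()

def mlp_output(x, W1, b1, W2, b2):
    h = np.tanh(b1 + W1 @ x)       # hidden layer: tanh(b1 + W1 x)
    return softmax(b2 + W2 @ h)    # output layer: softmax(b2 + W2 h)

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 32 * 32, 1000, 62   # assumed sizes, not from the paper
W1 = rng.normal(0, 0.01, (n_hidden, n_in))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.01, (n_out, n_hidden))
b2 = np.zeros(n_out)

x = rng.random(n_in)
p = mlp_output(x, W1, b1, W2, b2)
assert np.isclose(p.sum(), 1.0)             # a valid class distribution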
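The second hunk's denoising autoencoder equations (encoder y = sigm(c + V x), tied-weight reconstruction z = sigm(d + V' y) with V' the transpose of V, and the cross-entropy reconstruction error) can be sketched the same way. The corruption process, sizes, and initialization below are assumptions for illustration; the paper's actual corruption scheme is not specified in this hunk.

# Minimal sketch of the denoising autoencoder in the patched equations:
# y = sigm(c + V x_corrupted), z = sigm(d + V^T y), with tied weights,
# trained by minimizing the cross-entropy between z and the clean x.
import numpy as np

def sigm(a):
    return 1.0 / (1.0 + np.exp(-a))         # sigm(a) = 1 / (1 + exp(-a))

def dae_reconstruction(x_corrupted, V, c, d):
    y = sigm(c + V @ x_corrupted)           # hidden code from corrupted input
    return sigm(d + V.T @ y)                # reconstruction, tied weights V' = V^T

def cross_entropy(x, z, eps=1e-12):
    # cross-entropy between uncorrupted x and reconstruction z
    return -np.mean(x * np.log(z + eps) + (1 - x) * np.log(1 - z + eps))

rng = np.random.default_rng(0)
n_in, n_hidden = 32 * 32, 1000              # assumed sizes, not from the paper
V = rng.normal(0, 0.01, (n_hidden, n_in))
c, d = np.zeros(n_hidden), np.zeros(n_in)

x = rng.random(n_in)                        # uncorrupted input in [0, 1]
mask = rng.random(n_in) > 0.25              # assumed masking corruption (25%)
z = dae_reconstruction(x * mask, V, c, d)
loss = cross_entropy(x, z)                  # the quantity minimized in training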