# HG changeset patch
# User Yoshua Bengio
# Date 1300590106 14400
# Node ID 83d53ffe3f2541f9b51b08c716c3c57c671b2d78
# Parent  d2d7ce0f09420f756c2c08e32bb4fcb16ff4ac70
eqns

diff -r d2d7ce0f0942 -r 83d53ffe3f25 writeup/aistats2011_cameraready.tex
--- a/writeup/aistats2011_cameraready.tex	Sat Mar 19 22:58:06 2011 -0400
+++ b/writeup/aistats2011_cameraready.tex	Sat Mar 19 23:01:46 2011 -0400
@@ -431,13 +431,9 @@
 hidden layer) and deep SDAs.
 \emph{Hyper-parameters are selected based on the {\bf NISTP} validation set error.}
 
-{\bf Multi-Layer Perceptrons (MLP).} The MLP output estimated
+{\bf Multi-Layer Perceptrons (MLP).} The MLP output is estimated with
 \[
-P({\rm class}|{\rm input}=x)
-\]
-with
-\[
-f(x)={\rm softmax}(b_2+W_2\tanh(b_1+W_1 x)),
+P({\rm class}|{\rm input}=x)={\rm softmax}(b_2+W_2\tanh(b_1+W_1 x)),
 \]
 i.e., two layers, where
 \[
@@ -519,15 +515,17 @@
 of the uncorrupted input $x$. Because the network has to denoise,
 it is forcing the hidden units $y$ to represent the leading regularities in
 the data. Following~\citep{VincentPLarochelleH2008-very-small}
-the hidden units output $y$ is obtained through
+the hidden units output $y$ is obtained through the sigmoid-affine
+encoder
 \[
  y={\rm sigm}(c+V x)
 \]
 where ${\rm sigm}(a)=1/(1+\exp(-a))$
-and the reconstruction is
+and the reconstruction is obtained through the same transformation
 \[
- z={\rm sigm}(d+V' y).
+ z={\rm sigm}(d+V' y)
 \]
+but using the transpose of the encoder weights.
 We minimize the training set average of the cross-entropy
 reconstruction error
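For readers who want to check the merged MLP equation numerically, here is a minimal NumPy sketch of P(class|input=x) = softmax(b_2 + W_2 tanh(b_1 + W_1 x)) from the first hunk. The layer sizes, the random initialization, and all variable names below are illustrative assumptions, not values taken from the paper.

# Minimal sketch of the two-layer MLP output in the patched equation:
# P(class | input = x) = softmax(b2 + W2 tanh(b1 + W1 x)).
# Sizes and weights are assumed for illustration only.
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())        # subtract max for numerical stability
    return e / e.sum()

def mlp_output(x, W1, b1, W2, b2):
    h = np.tanh(b1 + W1 @ x)       # hidden layer: tanh(b1 + W1 x)
    return softmax(b2 + W2 @ h)    # output layer: softmax(b2 + W2 h)

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 32 * 32, 1000, 62   # assumed sizes, not from the paper
W1 = rng.normal(0, 0.01, (n_hidden, n_in))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.01, (n_out, n_hidden))
b2 = np.zeros(n_out)

x = rng.random(n_in)
p = mlp_output(x, W1, b1, W2, b2)
assert np.isclose(p.sum(), 1.0)             # a valid class distribution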
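The second hunk's denoising autoencoder equations (encoder y = sigm(c + V x), tied-weight reconstruction z = sigm(d + V' y) with V' the transpose of V, and the cross-entropy reconstruction error) can be sketched the same way. The corruption process, sizes, and initialization below are assumptions for illustration; the paper's actual corruption scheme is not specified in this hunk.

# Minimal sketch of the denoising autoencoder in the patched equations:
# y = sigm(c + V x_corrupted), z = sigm(d + V^T y), with tied weights,
# trained by minimizing the cross-entropy between z and the clean x.
import numpy as np

def sigm(a):
    return 1.0 / (1.0 + np.exp(-a))         # sigm(a) = 1 / (1 + exp(-a))

def dae_reconstruction(x_corrupted, V, c, d):
    y = sigm(c + V @ x_corrupted)           # hidden code from corrupted input
    return sigm(d + V.T @ y)                # reconstruction, tied weights V' = V^T

def cross_entropy(x, z, eps=1e-12):
    # cross-entropy between uncorrupted x and reconstruction z
    return -np.mean(x * np.log(z + eps) + (1 - x) * np.log(1 - z + eps))

rng = np.random.default_rng(0)
n_in, n_hidden = 32 * 32, 1000              # assumed sizes, not from the paper
V = rng.normal(0, 0.01, (n_hidden, n_in))
c, d = np.zeros(n_hidden), np.zeros(n_in)

x = rng.random(n_in)                        # uncorrupted input in [0, 1]
mask = rng.random(n_in) > 0.25              # assumed masking corruption (25%)
z = dae_reconstruction(x * mask, V, c, d)
loss = cross_entropy(x, z)                  # the quantity minimized in training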