# HG changeset patch
# User Yoshua Bengio
# Date 1300590423 14400
# Node ID fe98896745a55952011d6cfb35116da4b202a1a3
# Parent 83d53ffe3f2541f9b51b08c716c3c57c671b2d78
fitting

diff -r 83d53ffe3f25 -r fe98896745a5 writeup/aistats2011_cameraready.tex
--- a/writeup/aistats2011_cameraready.tex	Sat Mar 19 23:01:46 2011 -0400
+++ b/writeup/aistats2011_cameraready.tex	Sat Mar 19 23:07:03 2011 -0400
@@ -525,13 +525,14 @@
 \[
  z={\rm sigm}(d+V' y)
 \]
-but using the transpose of the encoder weights.
-We minimize the training
+using the transpose of the encoder weights.
+The training
 set average of the cross-entropy
-reconstruction error
+reconstruction loss
 \[
- L_H(x,z)=\sum_i z_i \log x_i + (1-z_i) \log(1-x_i).
+ L_H(x,z)=-\sum_i x_i \log z_i + (1-x_i) \log(1-z_i)
 \]
+is minimized.
 Here we use the random binary masking corruption
 (which in $\tilde{x}$ sets to 0 a random subset of the elements of $x$,
 and copies the rest).
@@ -558,13 +559,13 @@
 of hidden layers but it was fixed to 3 for our experiments,
 based on previous work with SDAs on MNIST~\citep{VincentPLarochelleH2008-very-small}.
-We also compared against 1 and against 2 hidden layers, in order
-to disantangle the effect of depth from the effect of unsupervised
+We also compared against 1 and 2 hidden layers,
+to disentangle the effect of depth from that of unsupervised
 pre-training.
-The size of the hidden
-layers was kept constant across hidden layers, and the best results
-were obtained with the largest values that we could experiment
-with given our patience, with 1000 hidden units.
+The size of each hidden
+layer was kept the same, and the best results
+were obtained with the largest value that we tried
+(1000 hidden units).
 %\vspace*{-1mm}
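
As a companion to the first hunk, here is a minimal NumPy sketch of the denoising-autoencoder step it describes: random binary masking corruption, an encoder, a decoder z = sigm(d + V'y) that reuses the transposed encoder weights, and the cross-entropy reconstruction loss L_H(x,z). This is an illustration written for this note, not the paper's code; the function name, the encoder bias b, the corruption rate default, and the eps guard are assumptions.

import numpy as np

rng = np.random.default_rng(0)

def sigm(a):
    return 1.0 / (1.0 + np.exp(-a))

def dae_reconstruction_loss(x, V, b, d, corruption=0.25):
    # x: (n_visible,) input with entries in [0, 1]
    # V: (n_hidden, n_visible) encoder weights; the decoder reuses V.T
    # b: (n_hidden,) encoder bias; d: (n_visible,) decoder bias
    # Random binary masking corruption: a random subset of the
    # elements of x is set to 0 in x_tilde, the rest are copied.
    keep = rng.random(x.shape) >= corruption
    x_tilde = x * keep
    # Encoder, then decoder using the transpose of the encoder weights.
    y = sigm(b + V @ x_tilde)
    z = sigm(d + V.T @ y)
    # Cross-entropy reconstruction loss L_H(x, z); training minimizes
    # its average over the training set.
    eps = 1e-12  # numerical guard, not part of the paper's formula
    return -np.sum(x * np.log(z + eps) + (1 - x) * np.log(1 - z + eps))

In the setup described by the second hunk, three such layers of 1000 hidden units each would be pre-trained one at a time and stacked, each layer's hidden representation y serving as the input to the next.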