# HG changeset patch
# User Yoshua Bengio
# Date 1300590423 14400
# Node ID fe98896745a55952011d6cfb35116da4b202a1a3
# Parent 83d53ffe3f2541f9b51b08c716c3c57c671b2d78
fitting

diff -r 83d53ffe3f25 -r fe98896745a5 writeup/aistats2011_cameraready.tex
--- a/writeup/aistats2011_cameraready.tex	Sat Mar 19 23:01:46 2011 -0400
+++ b/writeup/aistats2011_cameraready.tex	Sat Mar 19 23:07:03 2011 -0400
@@ -525,13 +525,14 @@
 \[
  z={\rm sigm}(d+V' y)
 \]
-but using the transpose of the encoder weights.
-We minimize the training
+using the transpose of the encoder weights.
+The training
 set average of the cross-entropy
-reconstruction error
+reconstruction loss
 \[
- L_H(x,z)=\sum_i z_i \log x_i + (1-z_i) \log(1-x_i).
+ L_H(x,z)=-\sum_i x_i \log z_i + (1-x_i) \log(1-z_i)
 \]
+is minimized.
 Here we use the random binary masking corruption
 (which in $\tilde{x}$ sets to 0 a random subset of the elements of $x$,
 and copies the rest).
@@ -558,13 +559,13 @@
 of hidden layers but it was fixed to 3 for our experiments,
 based on previous work with SDAs on MNIST~\citep{VincentPLarochelleH2008-very-small}.
-We also compared against 1 and against 2 hidden layers, in order
-to disantangle the effect of depth from the effect of unsupervised
+We also compared against 1 and 2 hidden layers,
+to disentangle the effect of depth from that of unsupervised
 pre-training.
-The size of the hidden
-layers was kept constant across hidden layers, and the best results
-were obtained with the largest values that we could experiment
-with given our patience, with 1000 hidden units.
+The size of each hidden
+layer was kept the same, and the best results
+were obtained with the largest value that we tried
+(1000 hidden units).
 %\vspace*{-1mm}
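
As a companion to the first hunk, here is a minimal NumPy sketch of the denoising-autoencoder step it describes: random binary masking corruption, an encoder, a decoder z = sigm(d + V'y) that reuses the transposed encoder weights, and the cross-entropy reconstruction loss L_H(x,z). This is an illustration written for this note, not the paper's code; the function name, the encoder bias b, the corruption rate default, and the eps guard are assumptions.

import numpy as np

rng = np.random.default_rng(0)

def sigm(a):
    return 1.0 / (1.0 + np.exp(-a))

def dae_reconstruction_loss(x, V, b, d, corruption=0.25):
    # x: (n_visible,) input with entries in [0, 1]
    # V: (n_hidden, n_visible) encoder weights; the decoder reuses V.T
    # b: (n_hidden,) encoder bias; d: (n_visible,) decoder bias
    # Random binary masking corruption: a random subset of the
    # elements of x is set to 0 in x_tilde, the rest are copied.
    keep = rng.random(x.shape) >= corruption
    x_tilde = x * keep
    # Encoder, then decoder using the transpose of the encoder weights.
    y = sigm(b + V @ x_tilde)
    z = sigm(d + V.T @ y)
    # Cross-entropy reconstruction loss L_H(x, z); training minimizes
    # its average over the training set.
    eps = 1e-12  # numerical guard, not part of the paper's formula
    return -np.sum(x * np.log(z + eps) + (1 - x) * np.log(1 - z + eps))

In the setup described by the second hunk, three such layers of 1000 hidden units each would be pre-trained one at a time and stacked, each layer's hidden representation y serving as the input to the next.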