diff writeup/aistats2011_cameraready.tex @ 637:fe98896745a5
fitting
author | Yoshua Bengio <bengioy@iro.umontreal.ca>
date | Sat, 19 Mar 2011 23:07:03 -0400
parents | 83d53ffe3f25
children | 677d1b1d8158
--- a/writeup/aistats2011_cameraready.tex	Sat Mar 19 23:01:46 2011 -0400
+++ b/writeup/aistats2011_cameraready.tex	Sat Mar 19 23:07:03 2011 -0400
@@ -525,13 +525,14 @@
 \[
  z={\rm sigm}(d+V' y)
 \]
-but using the transpose of the encoder weights.
-We minimize the training
+using the transpose of encoder weights.
+The training
 set average of the cross-entropy
-reconstruction error
+reconstruction loss
 \[
-  L_H(x,z)=\sum_i z_i \log x_i + (1-z_i) \log(1-x_i).
+  L_H(x,z)=\sum_i z_i \log x_i + (1-z_i) \log(1-x_i)
 \]
+is minimized.
 Here we use the random binary masking corruption
 (which in $\tilde{x}$ sets to 0 a random subset of the elements
 of $x$, and copies the rest).
@@ -558,13 +559,13 @@
 of hidden layers but it was fixed to
 3 for our experiments, based on previous work with
 SDAs on MNIST~\citep{VincentPLarochelleH2008-very-small}.
-We also compared against 1 and against 2 hidden layers, in order
-to disantangle the effect of depth from the effect of unsupervised
+We also compared against 1 and against 2 hidden layers,
+to disantangle the effect of depth from that of unsupervised
 pre-training.
-The size of the hidden
-layers was kept constant across hidden layers, and the best results
-were obtained with the largest values that we could experiment
-with given our patience, with 1000 hidden units.
+The size of each hidden
+layer was kept constant across hidden layers, and the best results
+were obtained with the largest values that we tried
+(1000 hidden units).
 %\vspace*{-1mm}
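
The revised passage describes the denoising auto-encoder used for unsupervised pre-training: corrupt the input with random binary masking, encode it, decode with the transpose of the encoder weights, and minimize the cross-entropy reconstruction loss. Below is a minimal NumPy sketch of one such update, assuming tied encoder/decoder weights; the function name dae_pretrain_step, the toy dimensions, and the learning-rate and corruption values are illustrative assumptions, not code from the ift6266 repository.

import numpy as np

rng = np.random.default_rng(0)


def sigm(a):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-a))


def dae_pretrain_step(x, W, c, d, corruption=0.25, lr=0.1):
    """One stochastic-gradient step of a tied-weight denoising auto-encoder.

    x : input vector with entries in [0, 1]   (n_visible,)
    W : encoder weights                       (n_hidden, n_visible)
    c : encoder bias (n_hidden,);  d : decoder bias (n_visible,)
    The decoder reuses the transpose of the encoder weights (W.T).
    """
    # Random binary masking corruption: set a random subset of x to 0,
    # copy the rest.
    mask = (rng.random(x.shape) >= corruption).astype(x.dtype)
    x_tilde = x * mask

    # Encode the corrupted input, decode with the transposed weights.
    y = sigm(c + W @ x_tilde)          # hidden representation
    z = sigm(d + W.T @ y)              # reconstruction of x

    # Cross-entropy reconstruction loss (to be minimized), measured
    # against the uncorrupted input x.
    eps = 1e-12
    loss = -np.sum(x * np.log(z + eps) + (1.0 - x) * np.log(1.0 - z + eps))

    # Backprop through the tied architecture.
    dz = z - x                         # gradient at decoder pre-activation
    dy = (W @ dz) * y * (1.0 - y)      # gradient at encoder pre-activation
    gW = np.outer(dy, x_tilde) + np.outer(y, dz)   # encoder + decoder terms

    W -= lr * gW
    c -= lr * dy
    d -= lr * dz
    return loss


# Toy usage: pre-train one 1000-unit layer on random "pixel" vectors.
n_visible, n_hidden = 32 * 32, 1000
W = rng.normal(scale=0.01, size=(n_hidden, n_visible))
c = np.zeros(n_hidden)
d = np.zeros(n_visible)
for _ in range(10):
    x = rng.random(n_visible)
    dae_pretrain_step(x, W, c, d)

Stacking three such layers, each pre-trained on the hidden representation of the one below and then fine-tuned with the supervised objective, corresponds to the 3-hidden-layer, 1000-unit SDA configuration discussed in the second hunk of the diff.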