comparison writeup/aistats2011_cameraready.tex @ 636:83d53ffe3f25

eqns
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Sat, 19 Mar 2011 23:01:46 -0400
parents 54e8958e963b
children fe98896745a5
comparing 635:d2d7ce0f0942 with 636:83d53ffe3f25
The experiments are performed using MLPs (with a single
hidden layer) and deep SDAs.
\emph{Hyper-parameters are selected based on the {\bf NISTP} validation set error.}

{\bf Multi-Layer Perceptrons (MLP).} The MLP output is estimated with
\[
P({\rm class}|{\rm input}=x)={\rm softmax}(b_2+W_2\tanh(b_1+W_1 x)),
\]
i.e., two layers, where
\[
p={\rm softmax}(a)
\]
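As an illustrative sketch (not the authors' implementation), the two-layer forward computation above can be written in NumPy; the layer sizes and initialization below are hypothetical placeholders:

```python
import numpy as np

def softmax(a):
    # numerically stable softmax over the last axis
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mlp_forward(x, W1, b1, W2, b2):
    # two layers: tanh hidden layer, softmax output layer
    h = np.tanh(b1 + W1 @ x)              # hidden representation
    return softmax(b2 + W2 @ h)           # P(class | input = x)

# hypothetical sizes: 32x32 input, 500 hidden units, 62 classes
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.01, size=(500, 1024))
b1 = np.zeros(500)
W2 = rng.normal(scale=0.01, size=(62, 500))
b2 = np.zeros(62)
p = mlp_forward(rng.normal(size=1024), W1, b1, W2, b2)
```

The output `p` is a probability vector over the classes (non-negative entries summing to one).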
Auto-encoder is presented with a stochastically corrupted version $\tilde{x}$
of the input $x$ and trained to produce a reconstruction $z$
of the uncorrupted input $x$. Because the network has to denoise, this
forces the hidden units $y$ to represent the leading regularities in
the data. Following~\citep{VincentPLarochelleH2008-very-small},
the hidden units' output $y$ is obtained through the sigmoid-affine
encoder
\[
y={\rm sigm}(c+V x)
\]
where ${\rm sigm}(a)=1/(1+\exp(-a))$,
and the reconstruction is obtained through the same transformation
\[
z={\rm sigm}(d+V' y)
\]
but using the transpose of the encoder weights.
We minimize the training
set average of the cross-entropy
reconstruction error
\[
L_H(x,z)=-\sum_i \left[ x_i \log z_i + (1-x_i) \log(1-z_i) \right].
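A minimal NumPy sketch of this encoder/decoder pair with tied (transposed) weights follows; the corruption process (randomly zeroing a fraction of inputs, one common choice) and all sizes are assumptions for illustration, and the loss uses the standard cross-entropy form with the reconstruction inside the logarithms:

```python
import numpy as np

def sigm(a):
    # logistic sigmoid: 1 / (1 + exp(-a))
    return 1.0 / (1.0 + np.exp(-a))

def dae_reconstruction(x, V, c, d, corruption=0.25, rng=None):
    # corrupt the input by zeroing a random fraction of its components
    if rng is None:
        rng = np.random.default_rng(0)
    x_tilde = x * (rng.random(x.shape) > corruption)
    y = sigm(c + V @ x_tilde)     # encoder: hidden representation
    z = sigm(d + V.T @ y)         # decoder: tied weights V' = V^T
    return z

def cross_entropy(x, z, eps=1e-12):
    # reconstruction cross-entropy between clean input x and reconstruction z
    return -np.sum(x * np.log(z + eps) + (1 - x) * np.log(1 - z + eps))

# hypothetical usage: 1024-dimensional inputs, 500 hidden units
rng = np.random.default_rng(0)
V = rng.normal(scale=0.01, size=(500, 1024))
c = np.zeros(500)
d = np.zeros(1024)
x = rng.random(1024)              # hypothetical intensities in [0, 1]
z = dae_reconstruction(x, V, c, d, rng=rng)
loss = cross_entropy(x, z)
```

In a training loop, `loss` would be averaged over the training set and minimized with respect to $V$, $c$, and $d$.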