comparison writeup/aistats2011_cameraready.tex @ 636:83d53ffe3f25
eqns
author | Yoshua Bengio <bengioy@iro.umontreal.ca> |
date | Sat, 19 Mar 2011 23:01:46 -0400 |
parents | 54e8958e963b |
children | fe98896745a5 |
635:d2d7ce0f0942 | 636:83d53ffe3f25 |
429 | 429 |
430 The experiments are performed using MLPs (with a single | 430 The experiments are performed using MLPs (with a single |
431 hidden layer) and deep SDAs. | 431 hidden layer) and deep SDAs. |
432 \emph{Hyper-parameters are selected based on the {\bf NISTP} validation set error.} | 432 \emph{Hyper-parameters are selected based on the {\bf NISTP} validation set error.} |
433 | 433 |
434 {\bf Multi-Layer Perceptrons (MLP).} The MLP output is estimated | 434 {\bf Multi-Layer Perceptrons (MLP).} The MLP output is estimated with |
435 \[ | 435 \[ |
436 P({\rm class}|{\rm input}=x) | 436 P({\rm class}|{\rm input}=x)={\rm softmax}(b_2+W_2\tanh(b_1+W_1 x)), |
437 \] | |
438 with | |
439 \[ | |
440 f(x)={\rm softmax}(b_2+W_2\tanh(b_1+W_1 x)), | |
441 \] | 437 \] |
442 i.e., two layers, where | 438 i.e., two layers, where |
443 \[ | 439 \[ |
444 p={\rm softmax}(a) | 440 p={\rm softmax}(a) |
445 \] | 441 \] |
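As a reading aid, not part of the changeset: the hunk above collapses two displays into the single equation $P({\rm class}|{\rm input}=x)={\rm softmax}(b_2+W_2\tanh(b_1+W_1 x))$. Below is a minimal NumPy sketch of that forward pass, with parameter names mirroring the equation; the softmax is the usual $p_i(a)=\exp(a_i)/\sum_j \exp(a_j)$ referred to by the truncated context line. A matching sketch of the denoising auto-encoder in the second hunk follows the diff.

```python
import numpy as np

def softmax(a):
    # p_i = exp(a_i) / sum_j exp(a_j); shift by max(a) for numerical stability
    e = np.exp(a - np.max(a))
    return e / e.sum()

def mlp_output(x, W1, b1, W2, b2):
    # P(class | input=x) = softmax(b2 + W2 tanh(b1 + W1 x)):
    # a single tanh hidden layer followed by a softmax output layer.
    h = np.tanh(b1 + W1 @ x)      # hidden activations, shape (n_hidden,)
    return softmax(b2 + W2 @ h)   # class probabilities, shape (n_classes,)
```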
517 Auto-encoder is presented with a stochastically corrupted version $\tilde{x}$ | 513 Auto-encoder is presented with a stochastically corrupted version $\tilde{x}$ |
518 of the input $x$ and trained to produce a reconstruction $z$ | 514 of the input $x$ and trained to produce a reconstruction $z$ |
519 of the uncorrupted input $x$. Because the network has to denoise, the | 515 of the uncorrupted input $x$. Because the network has to denoise, the |
520 hidden units $y$ are forced to represent the leading regularities in | 516 hidden units $y$ are forced to represent the leading regularities in |
521 the data. Following~\citep{VincentPLarochelleH2008-very-small} | 517 the data. Following~\citep{VincentPLarochelleH2008-very-small} |
522 the hidden units' output $y$ is obtained through | 518 the hidden units' output $y$ is obtained through the sigmoid-affine |
| 519 encoder |
523 \[ | 520 \[ |
524 y={\rm sigm}(c+V x) | 521 y={\rm sigm}(c+V x) |
525 \] | 522 \] |
526 where ${\rm sigm}(a)=1/(1+\exp(-a))$ | 523 where ${\rm sigm}(a)=1/(1+\exp(-a))$ |
527 and the reconstruction is | 524 and the reconstruction is obtained through the same transformation |
528 \[ | 525 \[ |
529 z={\rm sigm}(d+V' y). | 526 z={\rm sigm}(d+V' y) |
530 \] | 527 \] |
| 528 but using the transpose of the encoder weights. |
531 We minimize the training | 529 We minimize the training |
532 set average of the cross-entropy | 530 set average of the cross-entropy |
533 reconstruction error | 531 reconstruction error |
534 \[ | 532 \[ |
535 L_H(x,z)=-\sum_i [x_i \log z_i + (1-x_i) \log(1-z_i)]. | 533 L_H(x,z)=-\sum_i [x_i \log z_i + (1-x_i) \log(1-z_i)]. |
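Again as an aid rather than part of the changeset, here is a matching NumPy sketch of the denoising auto-encoder described in the second hunk: corrupt $x$ into $\tilde{x}$, encode with the sigmoid-affine encoder $y={\rm sigm}(c+V\tilde{x})$, and reconstruct with the tied-weight decoder $z={\rm sigm}(d+V^\top y)$. The masking corruption is an assumption for illustration; the hunk does not show which corruption process is used.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigm(a):
    # sigm(a) = 1 / (1 + exp(-a))
    return 1.0 / (1.0 + np.exp(-a))

def corrupt(x, noise_level=0.25):
    # Assumed corruption process: zero out a random fraction of the inputs.
    return x * (rng.random(x.shape) >= noise_level)

def encode(x_tilde, V, c):
    # y = sigm(c + V x), applied to the corrupted input during training
    return sigm(c + V @ x_tilde)

def decode(y, V, d):
    # z = sigm(d + V' y): the same transformation with transposed weights
    return sigm(d + V.T @ y)
```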
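The training criterion treats each input dimension as a Bernoulli target for the corresponding reconstructed probability. A minimal sketch of the per-example cross-entropy, with the clipping one needs in practice to keep the logarithms finite (the epsilon guard is an addition for this sketch, not in the paper):

```python
import numpy as np

def cross_entropy(x, z, eps=1e-12):
    # L_H(x, z) = -sum_i [ x_i log z_i + (1 - x_i) log(1 - z_i) ]
    z = np.clip(z, eps, 1.0 - eps)  # guard against log(0)
    return -np.sum(x * np.log(z) + (1.0 - x) * np.log(1.0 - z))
```

Training minimizes the average of this loss over the training set with respect to $V$, $c$ and $d$, with gradients obtained by backpropagation (in this repository presumably through Theano, though the hunk does not show that).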