comparison writeup/nips2010_submission.tex @ 521:13816dbef6ed

some things disappeared
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Tue, 01 Jun 2010 15:48:46 -0400
parents 18a6379999fd
children d41926a68993
--- 520:18a6379999fd
+++ 521:13816dbef6ed
@@ -204,11 +204,11 @@
 $\alpha = \sqrt[3]{complexity} \times 10.0$ and $\sigma = 10 - 7 \times
 \sqrt[3]{complexity}$.\\
 {\bf Pinch.}
 This is a GIMP filter called ``Whirl and
 pinch'', but whirl was set to 0. A pinch is ``similar to projecting the image onto an elastic
-surface and pressing or pulling on the center of the surface''~\citep{GIMP-manual}.
+surface and pressing or pulling on the center of the surface'' (GIMP documentation manual).
 For a square input image, this is akin to drawing a circle of
 radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to
 that disk (region inside circle) will have its value recalculated by taking
 the value of another ``source'' pixel in the original image. The position of
 that source pixel is found on the line that goes through $C$ and $P$, but
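
The pinch geometry in the hunk above can be sketched concretely. The NumPy code below remaps each pixel $P$ inside the disk of radius $r$ around the center $C$ to a source pixel on the line through $C$ and $P$, exactly as the text describes; the hunk cuts off before giving the exact radial formula, so the `strength` profile used here is an assumed placeholder, not GIMP's actual one.

import numpy as np

def pinch(img, strength=2.0):
    # Remap pixels inside a disk of radius r around the center C: each pixel P
    # takes the value of a "source" pixel on the line through C and P.
    # The radial profile d_src = r * (d/r)**strength is an assumption; the
    # excerpt does not give GIMP's exact formula.
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r = min(cy, cx)                        # disk radius; pixels outside are untouched
    ys, xs = np.indices((h, w)).astype(float)
    dy, dx = ys - cy, xs - cx
    d = np.hypot(dy, dx)                   # distance of each pixel P from C
    scale = np.ones((h, w))
    m = (d > 0) & (d < r)
    scale[m] = r * (d[m] / r) ** strength / d[m]   # d_src / d along the C-P line
    sy = np.clip(np.rint(cy + dy * scale), 0, h - 1).astype(int)
    sx = np.clip(np.rint(cx + dx * scale), 0, w - 1).astype(int)
    return img[sy, sx]                     # nearest-neighbour sampling of sources

With strength > 1 the source lies closer to $C$ than $P$, magnifying the center (pulling); strength < 1 pushes sources outward (pressing); strength = 1 is the identity, and the mapping is continuous at the disk boundary.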
@@ -452,10 +452,22 @@
 examples are presented in minibatches of size 20, a constant learning
 rate is chosen in $\{10^{-3},0.01, 0.025, 0.075, 0.1, 0.5\}$
 through preliminary experiments (measuring performance on a validation set),
 and $0.1$ was then selected.
 
+\begin{figure}[h]
+\resizebox{0.8\textwidth}{!}{\includegraphics{images/denoising_autoencoder_small.pdf}}
+\caption{Illustration of the computations and training criterion for the denoising
+auto-encoder used to pre-train each layer of the deep architecture. Input $x$
+is corrupted into $\tilde{x}$ and encoded into code $y$ by the encoder $f_\theta(\cdot)$.
+The decoder $g_{\theta'}(\cdot)$ maps $y$ to reconstruction $z$, which
+is compared to the uncorrupted input $x$ through the loss function
+$L_H(x,z)$, whose expected value is approximately minimized during training
+by tuning $\theta$ and $\theta'$.}
+\label{fig:da}
+\end{figure}
+
 {\bf Stacked Denoising Auto-Encoders (SDA).}
 Various auto-encoder variants and Restricted Boltzmann Machines (RBMs)
 can be used to initialize the weights of each layer of a deep MLP (with many hidden
 layers)~\citep{Hinton06,ranzato-07-small,Bengio-nips-2006},
 apparently setting parameters in the
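
The training protocol at the top of the hunk above (minibatches of 20, constant learning rate picked from a grid by validation error, 0.1 retained) reduces to a short sketch. In the Python below, `init_params`, `grad`, and `error` are hypothetical stand-ins for the model-specific pieces, which this excerpt does not specify.

LEARNING_RATES = [1e-3, 0.01, 0.025, 0.075, 0.1, 0.5]   # grid from the text
BATCH_SIZE = 20                                          # minibatch size from the text

def sgd(train_x, train_y, init_params, grad, lr, n_epochs=5):
    # plain constant-learning-rate minibatch SGD
    params = init_params()
    for _ in range(n_epochs):
        for i in range(0, len(train_x), BATCH_SIZE):
            g = grad(params, train_x[i:i + BATCH_SIZE], train_y[i:i + BATCH_SIZE])
            params = [p - lr * gi for p, gi in zip(params, g)]
    return params

def pick_learning_rate(train, valid, init_params, grad, error):
    # "preliminary experiments (measuring performance on a validation set)":
    # train briefly with each candidate rate, keep the lowest validation error
    return min(LEARNING_RATES,
               key=lambda lr: error(sgd(*train, init_params, grad, lr), *valid))

With the paper's data this procedure settled on a learning rate of 0.1.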
@@ -468,13 +480,13 @@
 taking advantage of the expressive power and bias implicit in the
 deep architecture (whereby complex concepts are expressed as
 compositions of simpler ones through a deep hierarchy).
 Here we chose to use the Denoising
 Auto-Encoder~\citep{VincentPLarochelleH2008} as the building block for
-% ADD AN IMAGE?
 these deep hierarchies of features, as it is very simple to train and
-teach (see tutorial and code there: {\tt http://deeplearning.net/tutorial}),
+teach (see Figure~\ref{fig:da}, as well as
+the tutorial and code at {\tt http://deeplearning.net/tutorial}),
 provides immediate and efficient inference, and yielded results
 comparable to or better than RBMs in a series of experiments
 \citep{VincentPLarochelleH2008}. During training, a Denoising
 Auto-Encoder is presented with a stochastically corrupted version
 of the input and trained to reconstruct the uncorrupted input,
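
To tie the figure's notation to code: the sketch below implements one denoising auto-encoder layer with the computations named in Figure~\ref{fig:da} ($x \to \tilde{x} \to y = f_\theta(\tilde{x}) \to z = g_{\theta'}(y)$, loss $L_H(x,z)$). Masking noise, sigmoid units, tied weights, and cross-entropy for $L_H$ are assumptions carried over from the cited Vincent et al. (2008) setup rather than stated in this excerpt.

import numpy as np

rng = np.random.RandomState(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class DenoisingAutoEncoder:
    # One layer of the stack; x is a [0,1]-valued input vector of length n_vis.
    def __init__(self, n_vis, n_hid):
        self.W = rng.uniform(-0.1, 0.1, (n_vis, n_hid))  # theta; decoder reuses W.T (tied-weight assumption)
        self.b = np.zeros(n_hid)                          # encoder bias
        self.c = np.zeros(n_vis)                          # decoder bias (theta')

    def step(self, x, lr=0.1, corruption=0.25):
        # corrupt x: masking noise zeroes a random fraction of inputs (assumed)
        x_tilde = x * (rng.uniform(size=x.shape) > corruption)
        y = sigmoid(x_tilde @ self.W + self.b)            # code y = f_theta(x_tilde)
        z = sigmoid(y @ self.W.T + self.c)                # reconstruction z = g_theta'(y)
        # gradients of cross-entropy L_H(x, z) for sigmoid outputs
        dz = z - x                                        # dL/d(pre-activation of z)
        dy = (dz @ self.W) * y * (1.0 - y)                # dL/d(pre-activation of y)
        self.W -= lr * (np.outer(x_tilde, dy) + np.outer(dz, y))
        self.b -= lr * dy
        self.c -= lr * dz
        # L_H compares the reconstruction to the *uncorrupted* x, as in the figure
        return -np.sum(x * np.log(z) + (1 - x) * np.log(1 - z))

Pre-training a stack then amounts to training one such layer on the raw input, encoding the training set with it, and training the next layer on those codes, as the SDA paragraph describes.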