comparison writeup/techreport.tex @ 410:6330298791fb

Brief description of MLP and SdA
author Arnaud Bergeron <abergeron@gmail.com>
date Thu, 29 Apr 2010 12:55:57 -0400
parents fe2e2964e7a3
children 4f69d915d142

\subsection{Models and their Hyperparameters}

\subsubsection{Multi-Layer Perceptrons (MLP)}

An MLP is a family of functions that are described by stacking layers of a function similar to
$$g(x) = \tanh(b+Wx)$$
The input, $x$, is a $d$-dimensional vector.
The output, $g(x)$, is an $m$-dimensional vector.
The parameter $W$ is an $m\times d$ matrix and $b$ is an $m$-vector.
The non-linearity (here $\tanh$) is applied element-wise to the output vector.
Usually the input is referred to as the input layer, and similarly for the output.
You can of course chain several such functions to obtain a more complex one.
Here is a common example:
$$f(x) = c + V\tanh(b+Wx)$$
In this case the intermediate layer corresponding to $\tanh(b+Wx)$ is called a hidden layer.
Here the output layer does not have the same non-linearity as the hidden layer.
This is a common case: a specialized non-linearity, chosen according to the task at hand (for instance a softmax for classification), is applied to the output layer only.

If you put 3 or more hidden layers in such a network you obtain what is called a deep MLP.

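To make the layer formulas above concrete, here is a minimal NumPy sketch of the forward pass of a one-hidden-layer MLP. The variable names mirror the symbols above; the softmax output non-linearity and the toy dimensions are illustrative choices for a classification setting, not part of the model definition.

\begin{verbatim}
import numpy as np

def mlp_forward(x, W, b, V, c):
    """One-hidden-layer MLP: f(x) = softmax(c + V tanh(b + W x))."""
    h = np.tanh(b + W @ x)        # hidden layer g(x), an m-vector
    z = c + V @ h                 # output pre-activation, a k-vector
    e = np.exp(z - z.max())       # softmax, shifted for numerical stability
    return e / e.sum()

# Toy usage with random parameters: d = 4 inputs, m = 3 hidden, k = 2 outputs.
rng = np.random.default_rng(0)
x = rng.normal(size=4)
W, b = rng.normal(size=(3, 4)), np.zeros(3)
V, c = rng.normal(size=(2, 3)), np.zeros(2)
print(mlp_forward(x, W, b, V, c))   # two class probabilities summing to 1
\end{verbatim}
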
\subsubsection{Stacked Denoising Auto-Encoders (SDAE)}

Auto-encoders are essentially a way to initialize the weights of the network to enable better generalization.
Denoising auto-encoders are a variant where the input is corrupted with random noise before the network is trained to reconstruct the original, uncorrupted input.
The principle behind these initialization methods is that the network learns the inherent relations between portions of the data and becomes able to represent them, which helps with whatever task we then want to perform.

The stacked version is an adaptation to deep MLPs where each layer is initialized with a denoising auto-encoder, starting from the bottom.
For additional details see \cite{vincent:icml08}.

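As a rough illustration of this layer-wise scheme, here is a hedged NumPy sketch of pre-training a single layer as a denoising auto-encoder with masking noise and tied weights. The squared reconstruction error, the $\tanh$ units, and the hyperparameter values are simplifying assumptions made for the sketch; \cite{vincent:icml08} uses sigmoid units with a cross-entropy reconstruction loss.

\begin{verbatim}
import numpy as np

def pretrain_layer(X, m, noise=0.3, lr=0.1, epochs=10, seed=0):
    """Denoising auto-encoder pre-training for one layer (tied weights).

    X : (n, d) matrix of training examples
    m : number of hidden units
    Returns (W, b), usable to initialize one MLP layer.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=0.01, size=(m, d))
    b, c = np.zeros(m), np.zeros(d)
    for _ in range(epochs):
        for x in X:
            keep = rng.random(d) > noise      # masking corruption:
            x_tilde = x * keep                # zero out a fraction of inputs
            h = np.tanh(b + W @ x_tilde)      # encode the corrupted input
            r = c + W.T @ h                   # decode with tied weights W'
            err = r - x                       # compare to the *clean* x
            dh = (W @ err) * (1.0 - h ** 2)   # backprop through tanh
            W -= lr * (np.outer(dh, x_tilde) + np.outer(h, err))
            b -= lr * dh
            c -= lr * err
    return W, b

# Stacking: pre-train layer 1 on the data, layer 2 on its hidden codes, etc.
X = np.random.default_rng(1).normal(size=(100, 8))  # toy data, d = 8
W1, b1 = pretrain_layer(X, m=5)                     # first hidden layer
H1 = np.tanh(b1 + X @ W1.T)                         # codes for next layer
W2, b2 = pretrain_layer(H1, m=4)                    # second hidden layer
\end{verbatim}

The resulting \texttt{(W, b)} pairs initialize the corresponding layers of the deep MLP, which is then fine-tuned with supervised training as usual.
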
\section{Experimental Results}

\subsection{SDA vs MLP}
