ift6266: comparison writeup/techreport.tex @ 410:6330298791fb
Brief description of MLP and SdA
author | Arnaud Bergeron <abergeron@gmail.com> |
---|---|
date | Thu, 29 Apr 2010 12:55:57 -0400 |
parents | fe2e2964e7a3 |
children | 4f69d915d142 |
409:f0c2e3cfb1f1 | 410:6330298791fb |
---|---|
134 | 134 |
135 \subsection{Models and their Hyperparameters} | 135 \subsection{Models and their Hyperparameters} |
136 | 136 |
137 \subsubsection{Multi-Layer Perceptrons (MLP)} | 137 \subsubsection{Multi-Layer Perceptrons (MLP)} |
138 | 138 |
139 An MLP is a family of functions built by stacking layers, each computing a function similar to | |
140 $$g(x) = \tanh(b+Wx)$$ | |
141 The input, $x$, is a $d$-dimensional vector. | |
142 The output, $g(x)$, is an $m$-dimensional vector. | |
143 The parameter $W$ is an $m\times d$ matrix and $b$ is an $m$-vector. | |
144 The non-linearity (here $\tanh$) is applied element-wise to the output vector. | |
145 The input is usually referred to as the input layer, and similarly for the output. | |
146 You can of course chain several such functions to obtain a more complex one. | |
147 Here is a common example: | |
148 $$f(x) = c + V\tanh(b+Wx)$$ | |
149 In this case the intermediate layer corresponding to $\tanh(b+Wx)$ is called a hidden layer. | |
150 Here the output layer does not have the same non-linearity as the hidden layer; in this example it has none at all. | |
151 This is a common situation: a specialized non-linearity (or none) is applied to the output layer, depending on the task at hand. | |
152 | |
153 If you put three or more hidden layers in such a network, you obtain what is called a deep MLP. | |
154 | |
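To make the shapes concrete, here is a minimal NumPy sketch of the one-hidden-layer MLP above, $f(x) = c + V\tanh(b+Wx)$, with a softmax output added as one example of a task-dependent output non-linearity; the function name `mlp_forward` and the dimensions are illustrative, not the configuration used in the experiments.

```python
import numpy as np

def mlp_forward(x, W, b, V, c):
    """One-hidden-layer MLP: softmax(c + V tanh(b + W x)).

    x: d-dimensional input
    W: m x d matrix, b: m-vector      (hidden layer)
    V: k x m matrix, c: k-vector      (output layer)
    """
    h = np.tanh(b + W @ x)            # hidden layer, m-dimensional
    a = c + V @ h                     # output pre-activation, k-dimensional
    e = np.exp(a - a.max())           # softmax, a task-dependent output non-linearity
    return e / e.sum()

# Illustrative shapes: d = 4 inputs, m = 3 hidden units, k = 2 output classes.
rng = np.random.RandomState(0)
W, b = 0.1 * rng.randn(3, 4), np.zeros(3)
V, c = 0.1 * rng.randn(2, 3), np.zeros(2)
print(mlp_forward(rng.randn(4), W, b, V, c))
```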
139 \subsubsection{Stacked Denoising Auto-Encoders (SDAE)} | 155 \subsubsection{Stacked Denoising Auto-Encoders (SDAE)} |
156 | |
157 Auto-encoders are essentially a way to initialize the weights of the network to enable better generalization. | |
158 Denoising auto-encoders are a variant where the input is corrupted with random noise and the model is trained to reconstruct the original, uncorrupted input. | |
159 The principle behind these initialization methods is that the network learns the inherent relations between portions of the data and becomes able to represent them, which helps with whatever task we want to perform. | |
160 | |
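As a rough illustration of the denoising criterion, the sketch below corrupts an input vector with masking noise and takes one gradient step on the squared reconstruction error of a single auto-encoder layer. The function name `dae_train_step`, the tied weights, the $\tanh$ units, the masking noise and the squared-error loss are assumptions made for brevity, not necessarily the exact setup of \cite{vincent:icml08}.

```python
import numpy as np

def dae_train_step(x, W, b, c, lr=0.1, corruption=0.3, rng=np.random):
    """One gradient step of a denoising auto-encoder with tied weights.

    x is assumed to lie in [-1, 1]^d.  A random fraction of its entries is
    zeroed out, the corrupted vector is encoded as h = tanh(b + W x~) and
    decoded as r = tanh(c + W^T h); the squared error between r and the
    *clean* x is minimized.  W, b and c are updated in place.
    """
    x_tilde = x * (rng.rand(*x.shape) > corruption)        # masking corruption
    h = np.tanh(b + W @ x_tilde)                           # hidden representation
    r = np.tanh(c + W.T @ h)                               # reconstruction
    err = r - x
    dr = err * (1 - r ** 2)                                # gradient at decoder pre-activation
    dh = (W @ dr) * (1 - h ** 2)                           # gradient at encoder pre-activation
    W -= lr * (np.outer(dh, x_tilde) + np.outer(h, dr))    # tied-weight gradient
    b -= lr * dh
    c -= lr * dr
    return 0.5 * np.sum(err ** 2)                          # reconstruction error
```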
161 The stacked version is an adaptation to deep MLPs where each layer is initialized with a denoising auto-encoder, starting from the bottom one. | |
162 For additional details see \cite{vincent:icml08}. | |
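The greedy layer-wise procedure could then look roughly like the sketch below, which reuses the hypothetical `dae_train_step` above: each layer's denoising auto-encoder is trained on the representation computed by the already-initialized layers below it, and its encoder weights become the initial weights of the corresponding MLP layer. The layer sizes and epoch count are placeholders, not the hyperparameters used in the experiments.

```python
import numpy as np

def pretrain_stack(data, layer_sizes, n_epochs=10, seed=0):
    """Greedy layer-wise pretraining of stacked denoising auto-encoders.

    data: sequence of input vectors; layer_sizes: e.g. [d, 500, 500, 500].
    Returns a list of (W, b) pairs used to initialize the deep MLP's hidden layers.
    """
    rng = np.random.RandomState(seed)
    params = []
    for d_in, d_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = 0.01 * rng.randn(d_out, d_in)
        b, c = np.zeros(d_out), np.zeros(d_in)
        for _ in range(n_epochs):
            for x in data:
                # Propagate the clean input through the layers trained so far...
                h = x
                for Wk, bk in params:
                    h = np.tanh(bk + Wk @ h)
                # ...and train the current denoising auto-encoder on that representation.
                dae_train_step(h, W, b, c, rng=rng)
        params.append((W, b))   # keep the encoder; the decoder bias c is discarded
    return params
```

The returned (W, b) pairs then replace the random initialization of the corresponding hidden layers before the usual supervised training of the whole deep MLP.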
140 | 163 |
141 \section{Experimental Results} | 164 \section{Experimental Results} |
142 | 165 |
143 \subsection{SDA vs MLP} | 166 \subsection{SDA vs MLP} |
144 | 167 |