--- a/writeup/techreport.tex	Wed Apr 28 16:39:10 2010 -0400
+++ b/writeup/techreport.tex	Thu Apr 29 12:55:57 2010 -0400
@@ -136,8 +136,31 @@

 \subsubsection{Multi-Layer Perceptrons (MLP)}

+An MLP is a family of functions obtained by stacking layers, each of which computes a function of the form
+$$g(x) = \tanh(b+Wx)$$
+The input, $x$, is a $d$-dimensional vector.
+The output, $g(x)$, is an $m$-dimensional vector.
+The parameter $W$ is an $m\times d$ matrix and $b$ is an $m$-vector.
+The non-linearity (here $\tanh$) is applied element-wise to the vector $b+Wx$.
+The input is usually referred to as the input layer, and the output as the output layer.
+You can of course chain several such functions to obtain a more complex one.
+Here is a common example:
+$$f(x) = c + V\tanh(b+Wx)$$
+In this case the intermediate layer corresponding to $\tanh(b+Wx)$ is called a hidden layer.
+Here the output layer does not have the same non-linearity as the hidden layer.
+This is common: the output layer often uses a specialized non-linearity (or none, as here) chosen for the task at hand.
+
+If you put 3 or more hidden layers in such a network you obtain what is called a deep MLP.
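+
+As a rough sketch of this stacking (the symbols $k$, $h_i$, $W_i$ and $b_i$ are notation assumed here for illustration), a network with $k$ hidden layers can be written as
+$$h_0 = x, \qquad h_i = \tanh(b_i + W_i h_{i-1}) \quad \mbox{for } i = 1, \ldots, k$$
+$$f(x) = c + V h_k$$
+With $k=1$, $W_1 = W$ and $b_1 = b$ this reduces to the example above; a deep MLP corresponds to $k \geq 3$.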
+
 \subsubsection{Stacked Denoising Auto-Encoders (SDAE)}

+Auto-encoders are essentially a way to initialize the weights of the network to enable better generalization.
+Denoising auto-encoders are a variant in which the input is corrupted with random noise and the network is trained to reconstruct the uncorrupted input.
+The principle behind these initialization methods is that the network learns the inherent relations between portions of the data and how to represent them, which helps with whatever task we then want to perform.
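+
+A minimal sketch of a single denoising auto-encoder, with notation assumed here rather than fixed by this report ($\tilde{x}$ for the corrupted input, $s$ for an element-wise non-linearity, $W'$ and $b'$ for the decoder parameters, $L$ for a reconstruction loss):
+$$y = s(b + W\tilde{x}), \qquad z = s(b' + W'y)$$
+The parameters are trained to make $L(x, z)$ small, so that the reconstruction $z$ is compared with the clean input $x$ and not with $\tilde{x}$. Once training is done, $W$ and $b$ serve as the initial values for the corresponding MLP layer.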
+
+The stacked version is an adaptation to deep MLPs in which each layer is initialized with a denoising auto-encoder, starting from the bottom layer.
+For additional details see \cite{vincent:icml08}.
+
 \section{Experimental Results}

 \subsection{SDA vs MLP}