changeset 411:4f69d915d142

Better description of the model parameters.
author Arnaud Bergeron <abergeron@gmail.com>
date Thu, 29 Apr 2010 13:18:15 -0400
parents 6330298791fb
children 6478eef4f8aa
files writeup/techreport.tex
diffstat 1 files changed, 10 insertions(+), 1 deletions(-)
--- a/writeup/techreport.tex	Thu Apr 29 12:55:57 2010 -0400
+++ b/writeup/techreport.tex	Thu Apr 29 13:18:15 2010 -0400
@@ -140,7 +140,8 @@
 $$g(x) = \tanh(b+Wx)$$
 The input, $x$, is a $d$-dimensional vector.
 The output, $g(x)$, is an $m$-dimensional vector.
-The parameter $W$  is a $m\times d$ matrix and $b$ is a $m$-vector.
+The parameter $W$ is an $m\times d$ matrix and is called the weight matrix.
+The parameter $b$ is an $m$-vector and is called the bias vector.
 The non-linearity (here $\tanh$) is applied element-wise to the output vector.
 Usually the input is referred to as the input layer, and similarly for the output.
 You can of course chain several such functions to obtain a more complex one.
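+The following short sketch illustrates this computation (a minimal NumPy example we add for concreteness; the names are ours and are not taken from the code used for the experiments):
+\begin{verbatim}
+import numpy as np
+
+def layer(x, W, b):
+    # g(x) = tanh(b + Wx); W has shape (m, d), b has shape (m,)
+    return np.tanh(b + W.dot(x))
+\end{verbatim}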
@@ -151,6 +152,7 @@
 This is a common case where some specialized non-linearity is applied to the output layer only, depending on the task at hand.
 
 If you put 3 or more hidden layers in such a network, you obtain what is called a deep MLP.
+The parameters to be adapted are the weight matrix and the bias vector of each layer.
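+Continuing the sketch above, a deep MLP is simply the composition of such layers (the specialized output non-linearity is omitted; again an illustration, not the experimental code):
+\begin{verbatim}
+def mlp(x, params):
+    # params is a list of (W, b) pairs, one per layer
+    h = x
+    for W, b in params:
+        h = np.tanh(b + W.dot(h))
+    return h
+\end{verbatim}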
 
 \subsubsection{Stacked Denoising Auto-Encoders (SDAE)}
 
@@ -158,7 +160,14 @@
 Denoising auto-encoders are a variant where the input is corrupted with random noise before the network tries to repair it.
 The principle behind these initialization methods is that the network will learn the inherent relations between portions of the data and be able to represent them, thus helping with whatever task we want to perform.
 
+An auto-encoder unit is formed of two MLP layers, the bottom one called the encoding layer and the top one the decoding layer.
+Usually the top and bottom weight matrices are tied, i.e. constrained to be the transpose of each other.
+The unit is trained in this form and, once sufficiently trained, the corresponding MLP layer is initialized with the parameters of the encoding layer.
+The other parameters are discarded.
+
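+In the same illustrative style, a denoising auto-encoder unit with tied weights could be sketched as follows (the corruption process shown is only one possible choice, and the training loop is not shown):
+\begin{verbatim}
+def corrupt(x, noise_level, rng):
+    # zero out a random fraction of the input components
+    return x * (rng.uniform(size=x.shape) > noise_level)
+
+def dae(x, W, b_enc, b_dec, noise_level, rng):
+    # encoding layer, then decoding layer with transposed weights;
+    # during training the reconstruction is compared to the clean input x
+    h = np.tanh(b_enc + W.dot(corrupt(x, noise_level, rng)))
+    x_rec = np.tanh(b_dec + W.T.dot(h))
+    return h, x_rec
+\end{verbatim}
+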
 The stacked version is an adaptation to deep MLPs where you initialize each layer with a denoising auto-encoder, starting from the bottom.
+During the initialization, which is usually called pre-training, the bottom layer is treated as if it were an isolated auto-encoder.
+The second and following layers receive the same treatment, except that they take as input the encoded version of the data that has gone through the layers below them.
 For additional details see \cite{vincent:icml08}.
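+In the same spirit, the pre-training phase can be sketched as follows (\texttt{train\_dae} is a hypothetical helper standing in for the per-layer training loop, which we do not detail here):
+\begin{verbatim}
+def pretrain(data, layer_sizes, noise_level, rng):
+    params = []
+    inputs = data
+    for m in layer_sizes:
+        # train one denoising auto-encoder on the current representation
+        W, b = train_dae(inputs, m, noise_level, rng)  # hypothetical helper
+        params.append((W, b))
+        # the next layer takes the encoded version of the data as input
+        inputs = [np.tanh(b + W.dot(x)) for x in inputs]
+    return params
+\end{verbatim}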
 
 \section{Experimental Results}