comparison writeup/techreport.tex @ 422:e7790db265b1

Basic text for section 3, add a bit more detail to section 4.2.2
author Arnaud Bergeron <abergeron@gmail.com>
date Fri, 30 Apr 2010 16:24:30 -0400
parents 0282882aa91f
children e4eb3ee7a0cf
comparison of revisions 419:c91d7b67fa41 and 422:e7790db265b1
\label{fig:pipeline}
\end{figure}

\section{Learning Algorithms for Deep Architectures}

Learning in deep networks has long been a problem, since well-known learning algorithms do not generalize well on deep architectures.
Using these training algorithms on a deep network usually yields worse generalization than on shallow networks.
Recently, new initialization techniques have been discovered that enable better generalization overall.

One of these initialization techniques is the denoising auto-encoder.
The principle is that each layer, starting from the bottom, is trained to encode and decode its input, and the encoding part is kept as the initialization for the weights and biases of the network.
For more details, see Section~\ref{SdA}.
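
As an informal illustration of this layer-wise procedure, the sketch below trains one denoising auto-encoder per layer, bottom-up, and keeps the encoding parameters as the initialization of the corresponding MLP layer. It is a minimal example with arbitrary layer sizes, noise level and learning rate, not the code used for our experiments.

\begin{verbatim}
# Minimal sketch (illustration only, arbitrary hyper-parameters):
# greedy layer-wise initialization with denoising auto-encoders.
import numpy as np

rng = np.random.RandomState(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(X, n_hidden, noise=0.25, lr=0.1, epochs=5):
    """Train one denoising auto-encoder (tied weights) on X; return
    the encoding parameters (W, b) and the codes for the next layer."""
    n_in = X.shape[1]
    W = rng.uniform(-0.1, 0.1, size=(n_in, n_hidden))
    b = np.zeros(n_hidden)                      # encoder bias
    c = np.zeros(n_in)                          # decoder bias
    for _ in range(epochs):
        for x in X:
            x_tilde = x * (rng.rand(n_in) > noise)  # corrupt the input
            h = sigmoid(x_tilde @ W + b)            # encode
            x_hat = sigmoid(h @ W.T + c)            # decode with W' = W^T
            # gradients of the cross-entropy reconstruction error,
            # measured against the uncorrupted input x
            d_out = x_hat - x
            d_hid = (d_out @ W) * h * (1.0 - h)
            W -= lr * (np.outer(x_tilde, d_hid) + np.outer(d_out, h))
            b -= lr * d_hid
            c -= lr * d_out
    return W, b, sigmoid(X @ W + b)

X = rng.rand(100, 32)              # placeholder unlabeled data in [0, 1]
stack, H = [], X
for n_hidden in (64, 64, 64):      # one auto-encoder per hidden layer
    W, b, H = pretrain_layer(H, n_hidden)
    stack.append((W, b))           # kept as the MLP's initial parameters
# supervised fine-tuning of the whole network would start from `stack`
\end{verbatim}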

Once initialization is done, standard training algorithms apply.
In this case, since we have large data sets, we use stochastic gradient descent.
This resembles minibatch training, except that the batches are selected at random.
To speed up computation, we randomly pre-arranged the examples into batches and used those batches for all training experiments.
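
Written generically (the symbols here are chosen only for this illustration), each step moves the parameters $\theta$ against the gradient of the training loss $L$ averaged over the current randomly pre-arranged batch $B$, with learning rate $\epsilon$:
\[
\theta \leftarrow \theta - \epsilon \, \frac{1}{|B|} \sum_{(x, y) \in B} \nabla_{\theta}\, L\!\left(f_{\theta}(x), y\right).
\]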

\section{Experimental Setup}

\subsection{Training Datasets}

\subsubsection{Data Sources}

[...]

If you put three or more hidden layers in such a network, you obtain what is called a deep MLP.
The parameters to adapt are the weight matrix and the bias vector of each layer.
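
Written out with generic symbols (chosen only for this illustration and not tied to the notation used elsewhere), hidden layer $k$ of such a network computes
\[
h^{(k)} = s\!\left(W^{(k)} h^{(k-1)} + b^{(k)}\right), \qquad h^{(0)} = x,
\]
where $s$ is the element-wise activation function and $W^{(k)}$, $b^{(k)}$ are the weight matrix and bias vector adapted for layer $k$.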

\subsubsection{Stacked Denoising Auto-Encoders (SDAE)}
\label{SdA}

Auto-encoders are essentially a way to initialize the weights of the network to enable better generalization.
This is essentially unsupervised training, where the layer is made to reconstruct its input through an encoding and a decoding phase.
Denoising auto-encoders are a variant where the input is corrupted with random noise, but the target is the uncorrupted input.
The principle behind these initialization methods is that the network will learn the inherent relations between portions of the data and be able to represent them, thus helping with whatever task we want to perform.

An auto-encoder unit is formed of two MLP layers, with the bottom one called the encoding layer and the top one the decoding layer.
Usually the top and bottom weight matrices are the transpose of each other and are kept tied this way.
The auto-encoder is trained as such and, once sufficiently trained, the MLP layer is initialized with the parameters of its encoding layer.
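
In symbols (again with notation chosen only for this illustration), such a unit with tied weights computes
\[
h = s\!\left(W \tilde{x} + b\right), \qquad \hat{x} = s\!\left(W^{T} h + c\right),
\]
where $\tilde{x}$ is the corrupted version of the input $x$, $s$ is the activation function, and the decoding weights are fixed to $W^{T}$. The parameters $W$, $b$ and $c$ are adjusted to minimize a reconstruction error measured against the uncorrupted input, for instance the cross-entropy
\[
L(x, \hat{x}) = - \sum_{i} \left[ x_i \log \hat{x}_i + (1 - x_i) \log\left(1 - \hat{x}_i\right) \right],
\]
after which $W$ and $b$ serve as the initial parameters of the corresponding MLP layer.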