comparison writeup/techreport.tex @ 427:ace489930918

merge
author Xavier Glorot <glorotxa@iro.umontreal.ca>
date Fri, 30 Apr 2010 16:29:43 -0400
parents a7fab59de174 6e5f0f50ddab
children 9fcd0215b8d5
426:a7fab59de174 427:ace489930918
212 \label{fig:pipeline} 212 \label{fig:pipeline}
213 \end{figure} 213 \end{figure}
214 214
215 \section{Learning Algorithms for Deep Architectures} 215 \section{Learning Algorithms for Deep Architectures}
216 216
217 Training deep networks has long been difficult because well-known learning algorithms do not generalize well on deep architectures.
218 Using these training algorithms on deep networks usually yields worse generalization than on shallow networks.
219 Recently, new initialization techniques have been discovered that enable better generalization overall.
220
221 One of these initialization techniques uses denoising auto-encoders.
222 The principle is that each layer, starting from the bottom, is trained to encode and decode its input, and the encoding part is kept as the initialization for the weights and biases of the network.
223 For more details see section \ref{SdA}.
224
225 After initialization is done, standard training algorithms work.
226 In this case, since we have large data sets, we use stochastic gradient descent.
227 This resembles minibatch training except that the batches are selected at random.
228 To speed up computation, we randomly pre-arranged examples in batches and used those for all training experiments.
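The pre-arranged random batches described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the report: the function names `prearrange_minibatches` and `sgd_fit_slope`, the batch size, the learning rate, and the one-parameter toy model are all assumptions made for the example.

```python
import random


def prearrange_minibatches(n_examples, batch_size, seed=0):
    """Shuffle example indices once and cut them into fixed batches.

    The same list of batches can then be reused for every experiment,
    so the random selection is paid for only once (hypothetical helper).
    """
    rng = random.Random(seed)
    indices = list(range(n_examples))
    rng.shuffle(indices)
    return [indices[i:i + batch_size]
            for i in range(0, n_examples, batch_size)]


def sgd_fit_slope(xs, ys, batches, lr=0.005, epochs=50):
    """Toy SGD: fit y = w * x by squared error, one update per batch."""
    w = 0.0
    for _ in range(epochs):
        for batch in batches:
            # Average gradient of (w * x - y)^2 over the batch.
            grad = sum(2.0 * (w * xs[i] - ys[i]) * xs[i]
                       for i in batch) / len(batch)
            w -= lr * grad
    return w
```

Calling `prearrange_minibatches` once and passing the result to every training run keeps the batch composition identical across experiments, which is one plausible reading of "used those for all training experiments".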
229
217 \section{Experimental Setup} 230 \section{Experimental Setup}
218 231
219 \subsection{Training Datasets} 232 \subsection{Training Datasets}
220 233
221 \subsubsection{Data Sources} 234 \subsubsection{Data Sources}
261 274
262 If you put 3 or more hidden layers in such a network you obtain what is called a deep MLP. 275 If you put 3 or more hidden layers in such a network you obtain what is called a deep MLP.
263 The parameters to adapt are the weight matrix and the bias vector for each layer. 276 The parameters to adapt are the weight matrix and the bias vector for each layer.
264 277
265 \subsubsection{Stacked Denoising Auto-Encoders (SDAE)} 278 \subsubsection{Stacked Denoising Auto-Encoders (SDAE)}
279 \label{SdA}
266 280
267 Auto-encoders are essentially a way to initialize the weights of the network to enable better generalization. 281 Auto-encoders are essentially a way to initialize the weights of the network to enable better generalization.
268 Denoising auto-encoders are a variant where the input is corrupted with random noise before trying to repair it. 282 This is essentially unsupervised training where the layer is made to reconstruct its input through an encoding and decoding phase.
283 Denoising auto-encoders are a variant where the input is corrupted with random noise but the target is the uncorrupted input.
269 The principle behind these initialization methods is that the network will learn the inherent relation between portions of the data and be able to represent them thus helping with whatever task we want to perform. 284 The principle behind these initialization methods is that the network will learn the inherent relation between portions of the data and be able to represent them thus helping with whatever task we want to perform.
270 285
271 An auto-encoder unit is formed of two MLP layers with the bottom one called the encoding layer and the top one the decoding layer. 286 An auto-encoder unit is formed of two MLP layers with the bottom one called the encoding layer and the top one the decoding layer.
272 Usually the top and bottom weight matrices are the transpose of each other and are fixed this way. 287 Usually the top and bottom weight matrices are the transpose of each other and are fixed this way.
273 The network is trained as such and, when sufficiently trained, the MLP layer is initialized with the parameters of the encoding layer. 288 The network is trained as such and, when sufficiently trained, the MLP layer is initialized with the parameters of the encoding layer.
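As a rough sketch of the unit described above, the following trains a single denoising auto-encoder with tied weights (the decoder uses the transpose of the encoder matrix) and returns the encoder parameters that would initialize the MLP layer. Several choices here are assumptions not stated in the text: masking noise as the corruption, sigmoid units, squared reconstruction error, and all function names and hyperparameters.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def encode(W, b, x):
    # W has shape n_hidden x n_in; b is the hidden bias.
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]


def decode(W, c, h):
    # Tied weights: the decoder uses the transpose of W; c is the output bias.
    return [sigmoid(sum(W[j][i] * h[j] for j in range(len(h))) + c[i])
            for i in range(len(c))]


def dae_step(W, b, c, x, lr, noise, rng):
    """One SGD step: corrupt x, reconstruct, compare with the clean x."""
    # Assumed corruption: zero each component with probability `noise`.
    x_tilde = [0.0 if rng.random() < noise else xi for xi in x]
    h = encode(W, b, x_tilde)
    z = decode(W, c, h)
    # Backpropagate 0.5 * sum((z - x)^2); the target is the uncorrupted x.
    d_out = [(zi - xi) * zi * (1.0 - zi) for zi, xi in zip(z, x)]
    d_hid = [sum(W[j][i] * d_out[i] for i in range(len(x)))
             * h[j] * (1.0 - h[j]) for j in range(len(h))]
    for j in range(len(h)):
        for i in range(len(x)):
            # W receives gradient from both the encoder and the decoder.
            W[j][i] -= lr * (d_hid[j] * x_tilde[i] + d_out[i] * h[j])
        b[j] -= lr * d_hid[j]
    for i in range(len(x)):
        c[i] -= lr * d_out[i]


def train_dae(data, n_hidden, epochs=200, lr=0.5, noise=0.25, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
         for _ in range(n_hidden)]
    b = [0.0] * n_hidden
    c = [0.0] * n_in
    for _ in range(epochs):
        for x in data:
            dae_step(W, b, c, x, lr, noise, rng)
    # W and b would initialize the corresponding MLP layer; c is discarded.
    return W, b, c


def reconstruction_error(W, b, c, data):
    """Mean squared reconstruction error on clean (uncorrupted) inputs."""
    total = 0.0
    for x in data:
        z = decode(W, c, encode(W, b, x))
        total += 0.5 * sum((zi - xi) ** 2 for zi, xi in zip(z, x))
    return total / len(data)
```

To stack several such units, the hidden codes produced by `encode` for one trained layer would serve as the training data for the next, following the bottom-up procedure the section describes.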