% writeup/techreport.tex
% author: Yoshua Bengio <bengioy@iro.umontreal.ca>
% date: Tue, 27 Apr 2010 09:56:18 -0400
% note: possible image for illustrating perturbations
\date{April 2010, Technical Report, Dept. IRO, U. Montreal}

\maketitle

\begin{abstract}
Recent theoretical and empirical work in statistical machine learning has
demonstrated the importance of learning algorithms for deep
architectures, i.e., function classes obtained by composing multiple
non-linear transformations. In the area of handwriting recognition,
deep learning algorithms have so far been evaluated on rather small
datasets with a few tens of thousands of examples. Here we propose a
powerful generator of variations of examples for character images,
based on a pipeline of stochastic
transformations that include not only the usual affine transformations
but also the addition of slant, local elastic deformations, changes
in thickness, background images, color, contrast, occlusion, and
various types of pixel and spatially correlated noise.
We evaluate a deep learning algorithm (Stacked Denoising Autoencoders)
on the task of learning to classify digits and letters transformed
with this pipeline, using hundreds of millions of generated examples
and testing on the full NIST test set.
We find that the SDA outperforms its
shallow counterpart, an ordinary Multi-Layer Perceptron,
and that it is better able to take advantage of the additional
generated data.
\end{abstract}

\section{Introduction}

Deep Learning has emerged as a promising new area of research in
statistical machine learning (see~\emcite{Bengio-2009} for a review).
Learning algorithms for deep architectures are centered on the learning
of useful representations of data, which are better suited to the task at hand.
This is in great part inspired by observations of the mammalian visual cortex,
which consists of a chain of processing elements, each of which is associated with a
different representation. In fact,
it was found recently that the features learnt in deep architectures resemble
those observed in the first two of these stages (in areas V1 and V2
of the visual cortex)~\cite{HonglakL2008}.

Processing images typically involves transforming the raw pixel data into
new {\bf representations} that can be used for analysis or classification.
For example, a principal component analysis representation linearly projects
the input image into a lower-dimensional feature space.
Why learn a representation? Current practice in the computer vision
literature converts the raw pixels into a hand-crafted representation
(e.g.\ SIFT features~\cite{Lowe04}), but deep learning algorithms
tend to discover similar features in their first few
levels~\cite{HonglakL2008,ranzato-08,Koray-08,VincentPLarochelleH2008-very-small}.
Learning increases the
ease and practicality of developing representations that are at once
tailored to specific tasks, yet are able to borrow statistical strength
from other related tasks (e.g., modeling different kinds of objects). Finally, learning the
feature representation can lead to higher-level (more abstract, more
general) features that are more robust to unanticipated sources of
variation present in real data.
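As a concrete illustration of the PCA example mentioned above (a toy sketch only, assuming NumPy and randomly generated pixel data in place of real character images; the dimensions and the choice of 50 components are illustrative assumptions):

```python
# Toy sketch: projecting flattened grayscale images onto a
# lower-dimensional PCA representation. The data here is random;
# real usage would substitute actual character images.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((500, 32 * 32))        # 500 images, one row of pixels each

X_centered = X - X.mean(axis=0)       # PCA operates on centered data
# The right singular vectors give the principal directions of the data.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

k = 50                                # keep the top-50 components
Z = X_centered @ Vt[:k].T             # the lower-dimensional representation

print(Z.shape)                        # (500, 50)
```

Each row of `Z` is the linear projection of one image onto the subspace spanned by the leading principal directions, which is exactly the kind of fixed (non-learned, beyond the projection itself) representation the text contrasts with deep learning.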

Whereas a deep architecture can in principle be more powerful than a shallow
one in terms of representation, depth appears to render the training problem
more difficult in terms of optimization and local minima.
Only recently have successful training algorithms been proposed to overcome
some of these difficulties.

\section{Perturbation and Transformation of Character Images}

\subsection{Affine Transformations}
\subsection{Adding Slant}
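Purely as an illustration of what a slant perturbation does (the generator's actual implementation is not reproduced here), slant can be sketched as a row-wise horizontal shear of the character image; the image size, stroke position, and slant coefficient below are illustrative assumptions:

```python
# Illustrative sketch: add slant to a small binary character image by
# shearing each row horizontally in proportion to its distance from the
# bottom row, so the character leans as in slanted handwriting.
import numpy as np

def add_slant(img: np.ndarray, slant: float = 0.3) -> np.ndarray:
    """Shift row i right by round(slant * (h - 1 - i)) pixels."""
    h, _ = img.shape
    out = np.empty_like(img)
    for i in range(h):
        out[i] = np.roll(img[i], int(round(slant * (h - 1 - i))))
    return out

img = np.zeros((8, 8), dtype=np.uint8)
img[:, 2] = 1                     # a vertical stroke, like a "1"
slanted = add_slant(img, 0.5)     # the stroke now leans to the right
print(slanted)
```

Note that `np.roll` wraps pixels around the border; a production pipeline would pad or crop instead, but wrapping keeps the sketch short.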

\subsubsection{Stacked Denoising Auto-Encoders (SDAE)}

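As a hedged sketch of the building block named above (not the paper's implementation), a single denoising autoencoder layer with tied weights can be written in plain NumPy; the corruption level, learning rate, layer sizes, and squared-error loss below are all illustrative assumptions, and stacking several such layers is what gives the SDA:

```python
# Sketch of one denoising autoencoder layer (illustrative assumptions
# throughout): corrupt the input, encode it, reconstruct it, and take one
# SGD step on the squared reconstruction error of the clean input.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

n_in, n_hid = 64, 32
W = rng.normal(0.0, 0.1, (n_in, n_hid))  # tied weights: W encodes, W.T decodes
b_h = np.zeros(n_hid)                    # hidden (encoder) bias
b_v = np.zeros(n_in)                     # visible (decoder) bias

def dae_step(x, lr=0.1, corruption=0.3):
    """One SGD step; returns the reconstruction error on the clean input."""
    global W, b_h, b_v
    mask = rng.random(x.shape) > corruption   # zero out a fraction of inputs
    x_tilde = x * mask
    h = sigmoid(x_tilde @ W + b_h)            # encoder
    z = sigmoid(h @ W.T + b_v)                # decoder (reconstruction)
    # Backprop through L = 0.5 * ||z - x||^2 with sigmoid units.
    dz = (z - x) * z * (1 - z)                # gradient at decoder pre-activation
    dh = (dz @ W) * h * (1 - h)               # gradient at encoder pre-activation
    W -= lr * (np.outer(x_tilde, dh) + np.outer(dz, h))
    b_h -= lr * dh
    b_v -= lr * dz
    return 0.5 * np.sum((z - x) ** 2)

x = (rng.random(n_in) > 0.5).astype(float)    # a toy binary "image"
losses = [dae_step(x) for _ in range(200)]
print(losses[0], losses[-1])
```

After pre-training, the decoder is discarded and the hidden layer feeds the next layer of the stack, which is trained the same way on the previous layer's codes.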
\section{Experimental Results}

\subsection{SDA vs MLP}

\begin{center}
\begin{tabular}{lcc}
    & train w/ & train w/ \\
    & NIST     & P07 + NIST \\ \hline
SDA &          & \\ \hline
MLP &          & \\ \hline
\end{tabular}
\end{center}

\subsection{Perturbed Training Data More Helpful for SDAE}

\subsection{Training with More Classes than Necessary}

\section{Conclusions}

\bibliography{strings,ml,aigaion}
\bibliographystyle{mlapa}

\end{document}