ift6266, writeup/techreport.tex, changeset 392:5f8fffd7347f
"possible image for illustrating perturbations"
Author: Yoshua Bengio <bengioy@iro.umontreal.ca>
Date: Tue, 27 Apr 2010 09:56:18 -0400
Parent: 391:d76c85ba12d6
\date{April 2010, Technical Report, Dept. IRO, U. Montreal}

\maketitle

\begin{abstract}
Recent theoretical and empirical work in statistical machine learning has
demonstrated the importance of learning algorithms for deep
architectures, i.e., function classes obtained by composing multiple
non-linear transformations. In the area of handwriting recognition,
deep learning algorithms have so far been evaluated on rather small
datasets with a few tens of thousands of examples. Here we propose a
powerful generator of variations of character-image examples, based on a
pipeline of stochastic transformations that includes not only the usual
affine transformations but also the addition of slant, local elastic
deformations, changes in thickness, background images, color, contrast,
occlusion, and various types of pixel and spatially correlated noise.
We evaluate a deep learning algorithm (Stacked Denoising Auto-Encoders)
on the task of learning to classify digits and letters transformed
with this pipeline, using hundreds of millions of generated examples
and testing on the full NIST test set.
We find that the SDA outperforms its
shallow counterpart, an ordinary Multi-Layer Perceptron,
and that it is better able to take advantage of the additional
generated data.
\end{abstract}

\section{Introduction}

Deep Learning has emerged as a promising new area of research in
statistical machine learning (see~\emcite{Bengio-2009} for a review).
Learning algorithms for deep architectures are centered on the learning
of useful representations of data that are better suited to the task at hand.
This is in great part inspired by observations of the mammalian visual
cortex, which consists of a chain of processing elements, each associated
with a different representation. Indeed, it was recently found that the
features learnt in deep architectures resemble those observed in the first
two of these stages (in areas V1 and V2 of the visual
cortex)~\cite{HonglakL2008}.
Processing images typically involves transforming the raw pixel data into
new {\bf representations} that can be used for analysis or classification.
For example, a principal component analysis representation linearly projects
the input image into a lower-dimensional feature space.
Why learn a representation? Current practice in the computer vision
literature converts the raw pixels into a hand-crafted representation
(e.g.\ SIFT features~\cite{Lowe04}), whereas deep learning algorithms
tend to discover similar features in their first few
levels~\cite{HonglakL2008,ranzato-08,Koray-08,VincentPLarochelleH2008-very-small}.
Learning increases the ease and practicality of developing representations
that are at once tailored to specific tasks, yet able to borrow statistical
strength from other related tasks (e.g., modeling different kinds of
objects). Finally, learning the feature representation can lead to
higher-level (more abstract, more general) features that are more robust
to unanticipated sources of variance in real data.
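To make the PCA example concrete (notation ours, not taken from the report): with $\mu$ the mean training image and $W$ the matrix whose columns are the leading eigenvectors of the empirical covariance of the training images, the PCA representation of an image $x$ is the linear projection

```latex
\[
  h \;=\; W^{\top} (x - \mu),
\]
```

so that $h$ lives in a feature space of dimension equal to the number of retained principal components, typically much smaller than the number of pixels.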

Whereas a deep architecture can in principle be more powerful than a shallow
one in terms of representation, depth appears to render the training problem
more difficult in terms of optimization and local minima. It is only
recently that successful algorithms have been proposed to overcome some of
these difficulties.

\section{Perturbation and Transformation of Character Images}

\subsection{Affine Transformations}
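As a reminder of the family of maps involved (standard notation, not taken from this excerpt), an affine transformation sends each pixel coordinate $(x, y)$ to

```latex
\[
  \left(\begin{array}{c} x' \\ y' \end{array}\right)
  \;=\;
  \left(\begin{array}{cc} a & b \\ c & d \end{array}\right)
  \left(\begin{array}{c} x \\ y \end{array}\right)
  \;+\;
  \left(\begin{array}{c} e \\ f \end{array}\right),
\]
```

with the six parameters $(a, b, c, d, e, f)$ sampled stochastically; translation, rotation, scaling, and shearing are all special cases of this family.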
\subsection{Adding Slant}
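The slant procedure itself is not reproduced in this excerpt; as an illustrative sketch (the function name, parameterization, and nearest-neighbour resampling are our assumptions), slant can be added by a horizontal shear that shifts each pixel row in proportion to its vertical position:

```python
import numpy as np

def add_slant(img, s):
    """Shear a 2-D grayscale image horizontally: row y is shifted by
    round(s * y) pixels (nearest-neighbour, zero padding at the edges)."""
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        shift = int(round(s * y))
        shift = max(-w, min(w, shift))  # clip so the slices stay valid
        if shift >= 0:
            out[y, shift:] = img[y, :w - shift]
        else:
            out[y, :w + shift] = img[y, -shift:]
    return out

# A single bright column becomes a diagonal stroke under slant.
img = np.zeros((4, 4))
img[:, 1] = 1.0
slanted = add_slant(img, 1.0)
```

Sampling `s` at random (including negative values, which slant in the opposite direction) yields one stochastic stage of the kind of transformation pipeline described above.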

\subsubsection{Stacked Denoising Auto-Encoders (SDA)}
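For reference, the standard denoising auto-encoder training criterion (the usual formulation from the literature the report builds on, not restated in this excerpt): the input $x$ is stochastically corrupted into $\tilde{x}$, encoded, decoded, and the reconstruction error with respect to the \emph{uncorrupted} input is minimized,

```latex
\[
  \tilde{x} \sim q(\tilde{x} \mid x), \qquad
  h = s(W \tilde{x} + b), \qquad
  \hat{x} = s(W' h + b'),
\]
\[
  \mathcal{L}(x, \hat{x}) \;=\;
  - \sum_i \left[ x_i \log \hat{x}_i + (1 - x_i) \log (1 - \hat{x}_i) \right],
\]
```

where $s$ is a sigmoid non-linearity (weights are often tied, $W' = W^{\top}$). Stacking such layers and then fine-tuning the whole network with supervised gradient descent yields the SDA evaluated below.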

\section{Experimental Results}

\subsection{SDA vs MLP}

\begin{center}
\begin{tabular}{lcc}
 & train with & train with \\
 & NIST & P07 + NIST \\ \hline
SDA & & \\ \hline
MLP & & \\ \hline
\end{tabular}
\end{center}

\subsection{Perturbed Training Data More Helpful for SDA}

\subsection{Training with More Classes than Necessary}

\section{Conclusions}

\bibliography{strings,ml,aigaion}
\bibliographystyle{mlapa}

\end{document}