\documentclass[12pt,letterpaper]{article}
\usepackage[utf8]{inputenc}
\usepackage{graphicx}
\usepackage{times}
\usepackage{mlapa}
\usepackage{subfigure}

\begin{document}
\title{Generating and Exploiting Perturbed and Multi-Task Handwritten Training Data for Deep Architectures}
\author{The IFT6266 Gang}
\date{April 2010, Technical Report, Dept. IRO, U. Montreal}

\maketitle

\begin{abstract}
Recent theoretical and empirical work in statistical machine learning has
demonstrated the importance of learning algorithms for deep
architectures, i.e., function classes obtained by composing multiple
non-linear transformations. In the area of handwriting recognition,
deep learning algorithms have so far been evaluated on rather small datasets
with a few tens of thousands of examples. Here we propose a powerful generator
of variations of character-image examples based on a pipeline of stochastic
transformations that includes not only the usual affine transformations
but also the addition of slant, local elastic deformations, changes
in thickness, background images, grey level, contrast, occlusion, and
various types of pixel and spatially correlated noise.
We evaluate a deep learning algorithm (Stacked Denoising Autoencoders)
on the task of learning to classify digits and letters transformed
with this pipeline, using the hundreds of millions of generated examples
and testing on the full 62-class NIST test set.
We find that the SDA outperforms its
shallow counterpart, an ordinary Multi-Layer Perceptron,
and that it is better able to take advantage of the additional
generated data, as well as of the multi-task setting, i.e.,
training on more classes than those of interest in the end.
In fact, we find that the SDA reaches human performance as
estimated by the Amazon Mechanical Turk on the 62-class NIST test characters.
\end{abstract}

\section{Introduction}

Deep Learning has emerged as a promising new area of research in
statistical machine learning (see~\emcite{Bengio-2009} for a review).
Learning algorithms for deep architectures are centered on the learning
of useful representations of data, which are better suited to the task at hand.
This is in great part inspired by observations of the mammalian visual cortex,
which consists of a chain of processing elements, each of which is associated with a
different representation. In fact,
it was found recently that the features learnt in deep architectures resemble
those observed in the first two of these stages (in areas V1 and V2
of visual cortex)~\cite{HonglakL2008}.
Processing images typically involves transforming the raw pixel data into
new {\bf representations} that can be used for analysis or classification.
For example, a principal component analysis representation linearly projects
the input image into a lower-dimensional feature space.
Why learn a representation? Current practice in the computer vision
literature converts the raw pixels into a hand-crafted representation
(e.g.\ SIFT features~\cite{Lowe04}), but deep learning algorithms
tend to discover similar features in their first few
levels~\cite{HonglakL2008,ranzato-08,Koray-08,VincentPLarochelleH2008-very-small}.
Learning increases the
ease and practicality of developing representations that are at once
tailored to specific tasks, yet able to borrow statistical strength
from other related tasks (e.g., modeling different kinds of objects). Finally, learning the
feature representation can lead to higher-level (more abstract, more
general) features that are more robust to unanticipated sources of
variance extant in real data.

Whereas a deep architecture can in principle be more powerful than a
shallow one in terms of representation, depth appears to render the
training problem more difficult in terms of optimization and local minima.
It is also only recently that successful algorithms were proposed to
overcome some of these difficulties. All are based on unsupervised
learning, often in a greedy layer-wise ``unsupervised pre-training''
stage~\cite{Bengio-2009}. One of these layer initialization techniques,
applied here, is the Denoising
Auto-Encoder~(DAE)~\cite{VincentPLarochelleH2008-very-small}, which
performed similarly or better than previously proposed Restricted Boltzmann
Machines in terms of unsupervised extraction of a hierarchy of features
useful for classification. The principle is that each layer, starting from
the bottom, is trained to encode its input (the output of the previous
layer) and to reconstruct it from a corrupted version of it. After this
unsupervised initialization, the stack of denoising auto-encoders can be
converted into a deep supervised feedforward neural network and trained by
stochastic gradient descent.

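To make this principle concrete, here is a minimal Python/NumPy sketch of
pre-training a single denoising auto-encoder layer with tied weights, masking
noise and a cross-entropy reconstruction loss. It is an illustration under
simplifying assumptions (layer size, noise level and learning rate are
placeholders), not the implementation used in this report.

\begin{verbatim}
import numpy as np

rng = np.random.RandomState(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def dae_pretrain(X, n_hidden=100, noise=0.25, lr=0.05, epochs=10):
    """Greedy pre-training of one denoising auto-encoder layer.
    X: (n_examples, n_visible) matrix with values in [0, 1]."""
    n_visible = X.shape[1]
    W = rng.uniform(-0.1, 0.1, (n_visible, n_hidden))
    b_h = np.zeros(n_hidden)
    b_v = np.zeros(n_visible)
    for _ in range(epochs):
        for x in X:
            x_tilde = x * (rng.rand(n_visible) > noise)  # corrupt input
            h = sigmoid(x_tilde @ W + b_h)               # encode
            z = sigmoid(h @ W.T + b_v)                   # decode (tied W)
            # gradients of the cross-entropy reconstruction loss
            dz = z - x
            dh = (dz @ W) * h * (1 - h)
            W -= lr * (np.outer(x_tilde, dh) + np.outer(dz, h))
            b_v -= lr * dz
            b_h -= lr * dh
    return W, b_h  # initializes one layer of the supervised network
\end{verbatim}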
\section{Perturbation and Transformation of Character Images}

This section describes the different transformations we used to generate data,
in the order in which they are applied. The code for these transformations
(mostly Python) is available at {\tt http://anonymous.url.net}. All the modules
in the pipeline share a global control parameter ($0 \le complexity \le 1$)
that allows one to modulate the amount of deformation or noise introduced.

The pipeline has two important parts. The first one, from slant to pinch,
performs transformations of the character itself. The second part, from blur
to contrast, adds noise to the image.

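Schematically, the pipeline can be viewed as a chain of image-to-image modules
that each sample their own parameters from the shared complexity level. The
Python sketch below is only an illustration of this organization; the module
interface shown here is hypothetical and not that of the released code.

\begin{verbatim}
import numpy as np

def apply_pipeline(image, modules, complexity, rng):
    """Apply a sequence of transformation modules to a 32x32 image
    (grey levels in [0, 1]); each module draws its parameters from
    the shared complexity level."""
    out = image
    for module in modules:
        out = module(out, complexity, rng)
    return out

def maybe(transform, p_skip):
    """Wrap a transform so that it is skipped with probability p_skip."""
    def module(image, complexity, rng):
        if rng.uniform() < p_skip:
            return image
        return transform(image, complexity, rng)
    return module
\end{verbatim}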
\subsection{Slant}

To mimic a slant effect, we simply shift each row of the image horizontally,
proportionally to its height: $shift = round(slant \times height)$. The shift
is rounded so that the displacement is a discrete number of pixels. We do not
smooth the result with a filter, both to save computing time and because later
transformations have similar effects.
The $slant$ coefficient can be negative or positive with equal probability,
and its magnitude is sampled uniformly according to the complexity level:
$slant \sim U[0,complexity]$, so the maximum displacement for the lowest or
highest pixel line is $round(complexity \times 32)$.

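A minimal Python/NumPy sketch of this step follows; whether the shift is
measured from the top row or from the image centre is an implementation
detail, and here we simply use the row index.

\begin{verbatim}
import numpy as np

def slant(image, complexity, rng):
    """Shift each row horizontally by round(slant * row_index),
    filling vacated pixels with the background value 0.
    `image` is a 32x32 array with grey levels in [0, 1]."""
    amount = rng.uniform(0, complexity) * rng.choice([-1, 1])
    out = np.zeros_like(image)
    for y in range(image.shape[0]):
        shift = int(round(amount * y))
        if shift >= 0:
            out[y, shift:] = image[y, :image.shape[1] - shift]
        else:
            out[y, :shift] = image[y, -shift:]
    return out
\end{verbatim}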
\subsection{Thickness}

To change the thickness of the characters we apply the morphological
operators of dilation and erosion~\cite{Haralick87,Serra82}. The neighbourhood
of each pixel is multiplied element-wise with a {\em structuring element}
matrix, and the pixel value is replaced by the maximum (dilation) or the
minimum (erosion) of the result; the strength of the transformation depends
only on the structuring element. We used ten different structuring elements of
increasing size (the largest is $5\times5$). For each image, we randomly
sample the operator type (dilation or erosion) with equal probability and one
structuring element from the subset of the $n$ smallest ones, where $n$ is
$round(10 \times complexity)$ for dilation and $round(6 \times complexity)$
for erosion. A neutral element is always present in the set, and if it is
chosen no transformation is applied. Erosion is restricted to the six smallest
structuring elements because a character that is too thin may otherwise be
completely erased.

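A sketch of this step using SciPy's grey-scale morphology is shown below; the
exact shapes of the ten structuring elements are not reproduced here, so the
growing square footprints are an assumption for illustration only.

\begin{verbatim}
import numpy as np
from scipy import ndimage

SIZES = (1, 2, 2, 3, 3, 4, 4, 5, 5, 5)        # ten structuring elements
STRUCTS = [np.ones((k, k)) for k in SIZES]    # index 0 is the neutral one

def thickness(image, complexity, rng):
    """Randomly dilate or erode the character (grey levels in [0, 1])."""
    if rng.uniform() < 0.5:                    # dilation
        n = int(round(10 * complexity))
        op = ndimage.grey_dilation
    else:                                      # erosion
        n = int(round(6 * complexity))
        op = ndimage.grey_erosion
    if n == 0:
        return image
    struct = STRUCTS[rng.randint(0, n)]        # one of the n smallest
    if struct.shape == (1, 1):                 # neutral element: no change
        return image
    return op(image, footprint=struct)
\end{verbatim}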
\subsection{Affine Transformations}

A $2 \times 3$ affine transform matrix (with six parameters $(a,b,c,d,e,f)$)
is sampled according to the complexity level and applied directly to the
image: each pixel $(x,y)$ of the output image takes the value of the pixel
nearest to $(ax+by+c,dx+ey+f)$ in the input image. This produces scaling,
translation, rotation and shearing. The marginal distributions of
$(a,b,c,d,e,f)$ have been tuned by hand to forbid large rotations (which would
confuse classes) while still giving good variability to the transformation:
$a$ and $d$ $\sim U[1-3 \times complexity, 1+3 \times complexity]$, $b$ and
$e$ $\sim U[-3 \times complexity, 3 \times complexity]$, and $c$ and $f$
$\sim U[-4 \times complexity, 4 \times complexity]$.

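The following Python sketch illustrates the sampling and the nearest-pixel
lookup; the mapping of $(x,y)$ to array row/column indices is an assumption
made for the illustration.

\begin{verbatim}
import numpy as np

def affine(image, complexity, rng):
    """Sample (a, b, c, d, e, f) and map each output pixel (x, y) to
    the nearest input pixel at (a*x + b*y + c, d*x + e*y + f)."""
    a = rng.uniform(1 - 3 * complexity, 1 + 3 * complexity)
    d = rng.uniform(1 - 3 * complexity, 1 + 3 * complexity)
    b = rng.uniform(-3 * complexity, 3 * complexity)
    e = rng.uniform(-3 * complexity, 3 * complexity)
    c = rng.uniform(-4 * complexity, 4 * complexity)
    f = rng.uniform(-4 * complexity, 4 * complexity)
    h, w = image.shape
    out = np.zeros_like(image)
    for y in range(h):
        for x in range(w):
            sx = int(round(a * x + b * y + c))
            sy = int(round(d * x + e * y + f))
            if 0 <= sx < w and 0 <= sy < h:
                out[y, x] = image[sy, sx]
    return out
\end{verbatim}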
\subsection{Local Elastic Deformations}

This filter induces a ``wiggly'' effect in the image, closely following the
algorithm described in~\cite{SimardSP03}. Two ``displacement'' fields are
generated, one for horizontal and one for vertical displacements of pixels,
each of the same size as the original image. When generating the transformed
image, we loop over the $x$ and $y$ positions in the fields and select, as a
value, the value of the pixel in the original image at the (relative) position
given by the displacement fields at this $x$ and $y$; if that position falls
outside the borders of the image, a value of 0 is used instead.
To generate a pixel in either field, a value between -1 and 1 is first drawn
from a uniform distribution. All the pixels, in both fields, are then
multiplied by a constant $\alpha$ which controls the intensity of the
displacements (larger $\alpha$ translates into larger wiggles). As a final
step, each field is convolved with a Gaussian 2D kernel of standard deviation
$\sigma$. Visually, this blur makes neighbouring displacements similar, so
the wiggles are more coherent and less noisy.
Because the displacement fields are expensive to compute, 50 pairs of fields
were pre-generated for each complexity level in increments of 0.1 (50 pairs
for 0.1, 50 pairs for 0.2, etc.); at run time, given a complexity, one of the
50 corresponding pairs is selected at random. $\sigma$ and $\alpha$ are linked
to complexity through $\alpha = \sqrt[3]{complexity} \times 10.0$ and
$\sigma = 10 - 7 \times \sqrt[3]{complexity}$.

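A simplified Python sketch of the deformation is given below; unlike the
pipeline described above, it generates the displacement fields on the fly
instead of selecting them from a pre-computed pool, and it uses
nearest-pixel lookup with zero fill outside the image.

\begin{verbatim}
import numpy as np
from scipy.ndimage import gaussian_filter

def elastic(image, complexity, rng):
    """Local elastic deformation: smoothed random displacement fields."""
    alpha = complexity ** (1.0 / 3.0) * 10.0
    sigma = 10.0 - 7.0 * complexity ** (1.0 / 3.0)
    h, w = image.shape
    dx = gaussian_filter(rng.uniform(-1, 1, (h, w)) * alpha, sigma)
    dy = gaussian_filter(rng.uniform(-1, 1, (h, w)) * alpha, sigma)
    out = np.zeros_like(image)
    for y in range(h):
        for x in range(w):
            sx = int(round(x + dx[y, x]))
            sy = int(round(y + dy[y, x]))
            if 0 <= sx < w and 0 <= sy < h:
                out[y, x] = image[sy, sx]  # 0 outside the borders
    return out
\end{verbatim}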
\subsection{Pinch}

This is the GIMP filter ``Whirl and pinch'', with whirl set to 0. As described
in the GIMP documentation manual, a pinch is ``similar to projecting the image
onto an elastic surface and pressing or pulling on the center of the
surface''. For a square input image, this is akin to drawing a circle of
radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to that
disk (the region inside the circle) has its value recalculated by taking the
value of another ``source'' pixel in the original image. The position of that
source pixel is found on the line that goes through $C$ and $P$, but at some
other distance $d_2$ from $C$. Defining $d_1$ as the distance between $P$ and
$C$, $d_2$ is given by $d_2 = sin(\frac{\pi{}d_1}{2r})^{-pinch} \times d_1$,
where $pinch$ is a parameter of the filter.
If the region considered is not square, the smallest dimension ($x$ or $y$) is
stretched before computing $d_2$ so that the region can be treated as square;
once $d_2$ and its components $d_2\_x$ and $d_2\_y$ have been found, the
component corresponding to the stretched dimension is compressed back by the
inverse ratio. The actual value is given by bilinear interpolation of the
pixels around the (non-integer) source position thus found. In our case,
$pinch \sim U[-complexity, 0.7 \times complexity]$.

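A Python sketch of the square-image case is shown below; it is an
approximation of the GIMP filter for illustration (the non-square stretching
step is omitted), not the filter's actual implementation.

\begin{verbatim}
import numpy as np

def bilinear(image, y, x):
    """Bilinear interpolation at a non-integer position, 0 outside."""
    h, w = image.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    ty, tx = y - y0, x - x0
    val = 0.0
    for iy, wy in ((y0, 1 - ty), (y0 + 1, ty)):
        for ix, wx in ((x0, 1 - tx), (x0 + 1, tx)):
            if 0 <= iy < h and 0 <= ix < w:
                val += wy * wx * image[iy, ix]
    return val

def pinch(image, complexity, rng):
    """GIMP-style pinch (whirl = 0) on a square image."""
    p = rng.uniform(-complexity, 0.7 * complexity)
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r = min(cy, cx)
    out = image.copy()
    for y in range(h):
        for x in range(w):
            d1 = np.hypot(y - cy, x - cx)
            if 0 < d1 < r:
                d2 = np.sin(np.pi * d1 / (2 * r)) ** (-p) * d1
                sy = cy + (y - cy) * d2 / d1
                sx = cx + (x - cx) * d2 / d1
                out[y, x] = bilinear(image, sy, sx)
    return out
\end{verbatim}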
\subsection{Motion Blur}

This is a ``linear motion blur'' in GIMP terminology, with two parameters,
$length$ and $angle$. The value of a pixel in the final image is
approximately the mean value of the first $length$ pixels found by moving in
the $angle$ direction; the approximation is needed because following that
direction does not fall exactly on pixel centers, and the pixels to average
are chosen with the Bresenham line algorithm. The angle is sampled uniformly,
$angle \sim U[0,360]$ degrees, and the length from a Gaussian distribution
with mean 0 and standard deviation $3 \times complexity$, i.e.,
$length \sim {\rm Normal}(0,(3 \times complexity)^2)$.

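The sketch below illustrates the idea in Python; for simplicity it rounds
each step along the direction instead of walking the line with the Bresenham
algorithm, so it is only an approximation of the filter described above.

\begin{verbatim}
import numpy as np

def motion_blur(image, complexity, rng):
    """Average the `length` first pixels met when moving from each
    pixel in the `angle` direction (rounded steps, not Bresenham)."""
    angle = np.deg2rad(rng.uniform(0, 360))
    length = int(abs(rng.normal(0, 3 * complexity)))
    if length == 0:
        return image
    h, w = image.shape
    out = np.zeros_like(image)
    for y in range(h):
        for x in range(w):
            vals = []
            for k in range(length):
                sy = int(round(y + k * np.sin(angle)))
                sx = int(round(x + k * np.cos(angle)))
                if 0 <= sy < h and 0 <= sx < w:
                    vals.append(image[sy, sx])
            out[y, x] = np.mean(vals) if vals else image[y, x]
    return out
\end{verbatim}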
\subsection{Occlusion}

This filter selects a random rectangle from an {\em occluder} character image
and places it over the original {\em occluded} character image. To determine
the final value of a given overlapping pixel, the maximum of the occluder and
occluded values is taken; since the background value is 0 (black), this keeps
the value closest to 1.
To select the subregion of the occluder image, four numbers are generated
(called $haut$, $bas$, $gauche$ and $droite$ in the code, i.e., top, bottom,
left and right). Each is sampled from a Gaussian distribution with mean
$8 \times complexity$ and standard deviation 2, so that larger complexity
gives larger rectangles; the absolute value is taken, and the maximum value is
capped at 15. These four sizes define a window centered on the middle pixel of
the occluder image, which is extracted as the occlusion.
The destination position in the occluded image is then selected: vertical and
horizontal displacements $y\_arrivee$ and $x\_arrivee$ are sampled from
Gaussian distributions with mean 0 and standard deviations of, respectively,
3 and 2, and a horizontal placement mode, $place$, is chosen among three
values meaning left, middle or right. If $place$ is middle, the occlusion is
horizontally centered on the occluded image and then shifted according to
$x\_arrivee$; if $place$ is left, it is placed on the left of the occluded
image and then displaced to the right according to $x\_arrivee$; the contrary
happens if $place$ is right. In both the horizontal and vertical positioning,
the maximum displacement is such that the selected occlusion does not go
beyond the borders of the occluded image (further details can be found
in~\cite{ift6266-tr-anonymous}).
This filter has a probability of 60\% of not being applied at all.

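The Python sketch below keeps only the essentials of this step (random
central rectangle, max-combination, 60\% skip probability); the left/middle/
right placement modes described above are simplified into a clipped random
offset, so it is an illustration rather than the exact procedure.

\begin{verbatim}
import numpy as np

def occlusion(occluded, occluder, complexity, rng):
    """Paste a rectangle taken around the centre of `occluder` onto
    `occluded`, combining pixels with max (background is 0)."""
    if rng.uniform() < 0.6:              # 60% chance of doing nothing
        return occluded
    h, w = occluded.shape
    # top/bottom/left/right extents around the occluder's centre
    ext = np.minimum(np.abs(rng.normal(8 * complexity, 2, size=4)), 15)
    top, bottom, left, right = ext.astype(int)
    cy, cx = h // 2, w // 2
    patch = occluder[max(cy - top, 0):cy + bottom + 1,
                     max(cx - left, 0):cx + right + 1]
    ph, pw = patch.shape
    # destination offset, clipped so the patch stays inside the image
    oy = int(np.clip(rng.normal(0, 3), 0, h - ph))
    ox = int(np.clip(rng.normal(0, 2), 0, w - pw))
    out = occluded.copy()
    region = out[oy:oy + ph, ox:ox + pw]
    out[oy:oy + ph, ox:ox + pw] = np.maximum(region, patch)
    return out
\end{verbatim}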
\subsection{Pixel Permutation}

This filter permutes neighbouring pixels. It first selects a fraction
$\frac{complexity}{3}$ of the pixels at random in the image. Each of them is then
sequentially exchanged with another pixel in its $V4$ neighbourhood. The numbers
of exchanges to the left, right, top and bottom are equal, or differ by at
most 1 when the number of selected pixels is not a multiple of 4.
This filter has a probability of 80\% of not being applied.

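For illustration, here is a minimal NumPy sketch of this module; the function
name, the border handling and the way the exchanges are spread over the four
directions are simplifications of this sketch, not the exact code of our pipeline.
\begin{verbatim}
import numpy as np

def permute_pixels(image, complexity, rng=np.random):
    """Exchange a random subset of pixels with a V4 (4-connected) neighbour.

    `image` is a 2-D array with values in [0, 1]; `complexity` is in [0, 1].
    """
    img = image.copy()
    h, w = img.shape
    n = int(round(complexity / 3.0 * h * w))   # fraction of pixels to move
    if n == 0:
        return img
    # Random pixel coordinates, kept away from the border for simplicity.
    ys = rng.randint(1, h - 1, size=n)
    xs = rng.randint(1, w - 1, size=n)
    # Spread the exchanges as evenly as possible over the four directions.
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    for i, (y, x) in enumerate(zip(ys, xs)):
        dy, dx = offsets[i % 4]
        img[y, x], img[y + dy, x + dx] = img[y + dy, x + dx], img[y, x]
    return img
\end{verbatim}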
\subsection{Gaussian Noise}

This filter simply adds, to each pixel of the image independently, a
Gaussian noise of mean $0$ and standard deviation $\frac{complexity}{10}$,
i.e. noise $\sim Normal(0,(\frac{complexity}{10})^2)$.
It has a probability of 70\% of not being applied.

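A minimal sketch of this filter follows; clipping the result back into $[0,1]$
is an assumption of the sketch, not something stated above.
\begin{verbatim}
import numpy as np

def add_gaussian_noise(image, complexity, rng=np.random):
    """Add i.i.d. Gaussian noise with std = complexity / 10 to every pixel."""
    noisy = image + rng.normal(0.0, complexity / 10.0, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)   # assumption: keep values in [0, 1]
\end{verbatim}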
\subsection{Background Images}

Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random
background behind the letter. The background is chosen by first selecting,
at random, an image from a set of images. Then a 32$\times$32 sub-region
of that image is chosen as the background image (by sampling its position
uniformly while making sure not to cross image borders).
To combine the original letter image and the background image, contrast
adjustments are made. We first get the maximal values (i.e. maximal
intensity) of both the original image and the background image, $maximage$
and $maxbg$. We also have a parameter $contrast \sim U[complexity, 1]$.
Each background pixel value is multiplied by $\frac{max(maximage -
contrast, 0)}{maxbg}$ (a higher contrast yields a darker
background). The output image pixels are $max(background, original)$,
i.e. for each pixel we keep the brighter of the adjusted background and
the original image.

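For illustration, a minimal NumPy sketch of the selection and blending rule
described above; the function signature and the handling of an all-black
background are assumptions of the sketch.
\begin{verbatim}
import numpy as np

def add_background(letter, backgrounds, complexity, rng=np.random):
    """Blend a random 32x32 crop of a background image behind the letter.

    `letter` is a 32x32 array in [0, 1]; `backgrounds` is a list of larger
    grey-scale images from which the crop is taken.
    """
    bg_full = backgrounds[rng.randint(len(backgrounds))]
    h, w = bg_full.shape
    y = rng.randint(h - 32 + 1)
    x = rng.randint(w - 32 + 1)
    bg = bg_full[y:y + 32, x:x + 32].astype(float)

    contrast = rng.uniform(complexity, 1.0)
    maximage = letter.max()
    maxbg = bg.max() if bg.max() > 0 else 1.0   # avoid division by zero
    bg = bg * max(maximage - contrast, 0.0) / maxbg
    return np.maximum(bg, letter)   # keep the brighter pixel everywhere
\end{verbatim}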
\subsection{Salt and Pepper Noise}

This filter adds noise $\sim U[0,1]$ to a random subset of the pixels.
The distribution of the noise itself does not depend on $complexity$;
instead, the proportion of changed pixels is $0.2 \times complexity$, so
at most 20\% of the pixels are randomized and, at the lowest extreme, no
pixel is changed.
This filter has a probability of 75\% of not being applied.

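A minimal sketch of this filter (the function name and in-place masking are
illustrative only):
\begin{verbatim}
import numpy as np

def salt_and_pepper(image, complexity, rng=np.random):
    """Replace a fraction 0.2*complexity of the pixels by U[0,1] noise."""
    img = image.copy()
    mask = rng.uniform(size=img.shape) < 0.2 * complexity
    img[mask] = rng.uniform(size=int(mask.sum()))
    return img
\end{verbatim}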
\subsection{Spatially Gaussian Noise}

Different regions of the image are spatially smoothed. In order to save
computing time the whole image is convolved only once with a symmetric
Gaussian kernel of size and variance chosen uniformly in the ranges
$[12,12 + 20 \times complexity]$ and $[2,2 + 6 \times complexity]$. The
result is normalized between $0$ and $1$. We also create a symmetric
averaging window, of the kernel size, with maximum value at the center.
For each image we sample uniformly from $3$ to $3 + 10 \times complexity$
pixels that will be averaging centers between the original image and the
filtered one. We initialize to zero a mask matrix of the image size. For
each selected pixel we add to the mask the averaging window centered on
it. The final image is computed from the following element-wise
operation: $\frac{image + filtered \times mask}{mask+1}$, where $filtered$
denotes the Gaussian-filtered image.
This filter has a probability of 75\% of not being applied.

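For illustration, a NumPy/SciPy sketch of this module; it approximates the
explicit kernel convolution with SciPy's Gaussian filter and clips the
averaging window at the image borders, which are simplifications of the
sketch rather than details of the actual implementation.
\begin{verbatim}
import numpy as np
from scipy.ndimage import gaussian_filter

def local_gaussian_smoothing(image, complexity, rng=np.random):
    """Blend the image with a blurred copy around a few random centers."""
    h, w = image.shape
    size = int(rng.uniform(12, 12 + 20 * complexity))   # window size
    var = rng.uniform(2, 2 + 6 * complexity)            # kernel variance
    filtered = gaussian_filter(image, sigma=np.sqrt(var))
    filtered = (filtered - filtered.min()) / \
        max(filtered.max() - filtered.min(), 1e-8)

    # Symmetric averaging window with its maximum at the center.
    ax = np.arange(size) - (size - 1) / 2.0
    window = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2.0 * var))

    mask = np.zeros(image.shape)
    n_centers = rng.randint(3, 3 + int(10 * complexity) + 1)
    for _ in range(n_centers):
        y, x = rng.randint(h), rng.randint(w)
        y0, x0 = max(0, y - size // 2), max(0, x - size // 2)
        y1, x1 = min(h, y0 + size), min(w, x0 + size)
        mask[y0:y1, x0:x1] += window[:y1 - y0, :x1 - x0]

    return (image + filtered * mask) / (mask + 1.0)
\end{verbatim}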
\subsection{Scratches}

The scratches module places line-like white patches on the image. The
lines are heavily transformed images of the digit ``1'' (one), chosen
at random among five thousand such 1 images. The 1 image is
randomly cropped and rotated by an angle $\sim Normal(0,(100 \times
complexity)^2)$ (in degrees), using bi-cubic interpolation.
Two passes of a grey-scale morphological erosion filter
are then applied, reducing the width of the line
by an amount controlled by $complexity$.
This filter is only applied 15\% of the time. When it is applied, 50\%
of the time only one patch image is generated and applied. In 30\% of
cases, two patches are generated, and otherwise three patches are
generated. The patches are applied by taking the maximal value of any
given patch or the original image, at each of the 32$\times$32 pixel
locations.

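A simplified sketch of the patch generation and blending follows; the use of
SciPy's rotation and grey-scale erosion routines, the erosion structuring
element, and the omission of the cropping and resizing steps are assumptions
of the sketch, not details of the actual module.
\begin{verbatim}
import numpy as np
from scipy.ndimage import grey_erosion, rotate

def make_scratch_patch(one_image, complexity, rng=np.random):
    """Turn a 32x32 image of a '1' into a thin, randomly rotated scratch."""
    angle = rng.normal(0.0, 100.0 * complexity)               # in degrees
    patch = rotate(one_image, angle, reshape=False, order=3)  # cubic interp.
    for _ in range(2):             # two erosion passes thin the line
        patch = grey_erosion(patch, size=(3, 3))
    return np.clip(patch, 0.0, 1.0)

def apply_scratches(image, one_images, complexity, rng=np.random):
    """Overlay 1 to 3 scratch patches by taking the pixel-wise maximum."""
    out = image.copy()
    n_patches = rng.choice([1, 2, 3], p=[0.5, 0.3, 0.2])
    for _ in range(n_patches):
        one = one_images[rng.randint(len(one_images))]
        out = np.maximum(out, make_scratch_patch(one, complexity, rng))
    return out
\end{verbatim}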

\subsection{Grey Level and Contrast Changes}

This filter changes the contrast of the image and may invert its polarity
(white on black to black on white). The contrast $C$ is defined here as
the difference between the maximum and the minimum pixel value of the
image, and is sampled as $C \sim U[1-0.85 \times complexity,1]$ (which
ensures a minimum contrast of $0.15$). The image is normalized into the
range $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The polarity is inverted with
probability $0.5$.

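A minimal sketch of this filter (the function name is illustrative only):
\begin{verbatim}
import numpy as np

def change_contrast(image, complexity, rng=np.random):
    """Rescale pixel values to a narrower range and maybe invert polarity."""
    c = rng.uniform(1.0 - 0.85 * complexity, 1.0)   # target contrast
    lo, hi = (1.0 - c) / 2.0, 1.0 - (1.0 - c) / 2.0
    span = max(image.max() - image.min(), 1e-8)
    out = lo + (image - image.min()) * (hi - lo) / span
    if rng.uniform() < 0.5:
        out = 1.0 - out                              # invert polarity
    return out
\end{verbatim}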

\begin{figure}[h]
\resizebox{.99\textwidth}{!}{\includegraphics{images/example_t.png}}\\
\caption{Illustration of the pipeline of stochastic
transformations applied to the image of a lower-case t
(the upper left image). Each image in the pipeline (going from
left to right, first top line, then bottom line) shows the result
of applying one of the modules in the pipeline. The last image
(bottom right) is used as a training example.}
\label{fig:pipeline}
\end{figure}


\begin{figure}[h]
\resizebox{.99\textwidth}{!}{\includegraphics{images/transfo.png}}\\
\caption{Illustration of each transformation applied to the same image
of an upper-case h (upper-left image). First row (from left to right): original image, slant,
thickness, affine transformation, local elastic deformation; second row (from left to right):
pinch, motion blur, occlusion, pixel permutation, Gaussian noise; third row (from left to right):
background image, salt and pepper noise, spatially Gaussian noise, scratches,
grey level and contrast changes.}
\label{fig:transfo}
\end{figure}


\section{Experimental Setup}

\subsection{Training Datasets}

\subsubsection{Data Sources}

\begin{itemize}
\item {\bf NIST}
The NIST Special Database 19 (NIST19) is a very widely used dataset for training and testing OCR systems.
The dataset is composed of over 800,000 digits and characters (upper and lower case), with hand-checked classifications,
extracted from handwritten sample forms of 3600 writers. The characters are labelled by one of the 62 classes
corresponding to ``0''-``9'', ``A''-``Z'' and ``a''-``z''. The dataset contains 8 series of different complexity.
The fourth series, $hsf_4$, experimentally recognized to be the most difficult one for the classification task, is recommended
by NIST as a testing set and is used in our work for that purpose.
The performance reported by previous work on that dataset mostly concerns only the digits.
Here we use all the classes, both in the training and testing phases.

\item {\bf Fonts}
In order to have a good variety of sources we downloaded a large number of free fonts from: {\tt http://anonymous.url.net}
%real address {\tt http://cg.scs.carleton.ca/~luc/freefonts.html}
In addition to the Windows 7 fonts, this adds up to a total of $9817$ different fonts that we can choose from uniformly.
The {\tt ttf} file is either used as input to the Captcha generator (see next item) or, by producing a corresponding image,
directly as input to our models.
%Guillaume are there other details I forgot on the font selection?

\item {\bf Captchas}
The Captcha data source is an adaptation of the \emph{pycaptcha} library (a Python-based captcha generator library) for
generating characters of the same format as the NIST dataset. The core of this data source is composed of a random character
generator and various kinds of transformations similar to those described in the previous sections.
In order to increase the variability of the data generated, different fonts are used for generating the characters.
Transformations (slant, distortions, rotation, translation) are applied to each randomly generated character with a complexity
depending on the value of the complexity parameter provided by the user of the data source. Two levels of complexity are
allowed and can be controlled via an easy-to-use facade class.
\item {\bf OCR data}
\end{itemize}

\subsubsection{Data Sets}
\begin{itemize}
\item {\bf P07}
The dataset P07 is sampled with our transformation pipeline with a complexity parameter of $0.7$.
For each new example to generate, we choose one source with the following probability: $0.1$ for the fonts,
$0.25$ for the captchas, $0.25$ for OCR data and $0.4$ for NIST. We apply all the transformations in their order
and for each of them we sample uniformly a complexity in the range $[0,0.7]$ (see the sketch after this list).
\item {\bf NISTP} {\em do not use PNIST but NISTP, to remain politically correct...}
NISTP is equivalent to P07 (complexity parameter of $0.7$ with the same source proportions) except that we only apply
transformations from slant to pinch. Therefore, the character is transformed
but no additional noise is added to the image, which gives images closer to the NIST dataset.
\end{itemize}

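For illustration, a sketch of how one P07 example can be drawn; the generator
and transformation callables are placeholders for the data sources and filter
modules described above, not actual library calls.
\begin{verbatim}
import numpy as np

# Source mixing proportions for P07, from the text above.
SOURCES = ['font', 'captcha', 'ocr', 'nist']
PROBS = [0.1, 0.25, 0.25, 0.4]

def sample_p07_example(generators, transformations, complexity_cap=0.7,
                       rng=np.random):
    """Draw one perturbed training example.

    `generators` maps a source name to a function returning a 32x32 image;
    `transformations` is the ordered list of filters, each taking
    (image, complexity) and returning a new image.
    """
    source = SOURCES[rng.choice(len(SOURCES), p=PROBS)]
    image = generators[source]()
    for transform in transformations:
        image = transform(image, rng.uniform(0.0, complexity_cap))
    return image
\end{verbatim}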
We noticed that the distributions of the training sets and of the test sets differ.
Since our validation sets are sampled from the training set, they have approximately the same distribution, but the test set has a completely different distribution, as illustrated in Figure~\ref{setsdata}.

\begin{figure}
\subfigure[NIST training]{\includegraphics[width=0.5\textwidth]{images/nisttrainstats}}
\subfigure[NIST validation]{\includegraphics[width=0.5\textwidth]{images/nistvalidstats}}
\subfigure[NIST test]{\includegraphics[width=0.5\textwidth]{images/nistteststats}}
\subfigure[NISTP validation]{\includegraphics[width=0.5\textwidth]{images/nistpvalidstats}}
\caption{Proportion of each class in some of the data sets}
\label{setsdata}
\end{figure}

\subsection{Models and their Hyperparameters}

\subsubsection{Multi-Layer Perceptrons (MLP)}

An MLP is a family of functions that are described by stacking layers of a function similar to
$$g(x) = \tanh(b+Wx)$$
The input, $x$, is a $d$-dimensional vector.
The output, $g(x)$, is an $m$-dimensional vector.
The parameter $W$ is an $m\times d$ matrix and is called the weight matrix.
The parameter $b$ is an $m$-vector and is called the bias vector.
The non-linearity (here $\tanh$) is applied element-wise to the output vector.
Usually the input is referred to as the input layer and similarly for the output.
You can of course chain several such functions to obtain a more complex one.
Here is a common example:
$$f(x) = c + V\tanh(b+Wx)$$
In this case the intermediate layer corresponding to $\tanh(b+Wx)$ is called a hidden layer.
Here the output layer does not have the same non-linearity as the hidden layer.
This is a common case where some specialized non-linearity is applied to the output layer only, depending on the task at hand.

If you put 3 or more hidden layers in such a network you obtain what is called a deep MLP.
The parameters to adapt are the weight matrix and the bias vector for each layer.

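To make the notation concrete, here is a minimal NumPy sketch of the forward
pass; the initialization scale and the linear output layer are placeholder
choices of the sketch, not the settings used in our experiments.
\begin{verbatim}
import numpy as np

def init_layer(n_in, n_out, rng=np.random):
    """Random weight matrix W (n_out x n_in) and bias vector b (n_out)."""
    scale = 1.0 / np.sqrt(n_in)        # illustrative initialization scale
    return rng.uniform(-scale, scale, size=(n_out, n_in)), np.zeros(n_out)

def mlp_forward(x, layers):
    """Compute c + V tanh(b + W x) for a stack of (W, b) layers.

    All layers but the last apply the tanh non-linearity; the last one is
    left linear here (a softmax would be added for classification).
    """
    h = x
    for W, b in layers[:-1]:
        h = np.tanh(b + W.dot(h))
    W, b = layers[-1]
    return b + W.dot(h)

# Example: a 32x32 input image, 1000 hidden units, 62 output classes.
layers = [init_layer(32 * 32, 1000), init_layer(1000, 62)]
y = mlp_forward(np.zeros(32 * 32), layers)
\end{verbatim}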
\subsubsection{Stacked Denoising Auto-Encoders (SDAE)}
\label{SdA}

Auto-encoders are essentially a way to initialize the weights of the network to enable better generalization.
This is unsupervised training where the layer is made to reconstruct its input through an encoding and a decoding phase.
Denoising auto-encoders are a variant where the input is corrupted with random noise but the target is the uncorrupted input.
The principle behind these initialization methods is that the network will learn the inherent relations between portions of the data and be able to represent them, thus helping with whatever task we want to perform.

An auto-encoder unit is formed of two MLP layers, with the bottom one called the encoding layer and the top one the decoding layer.
Usually the top and bottom weight matrices are the transpose of each other and are fixed this way.
The network is trained as such and, when sufficiently trained, the MLP layer is initialized with the parameters of the encoding layer.
The other parameters are discarded.

The stacked version is an adaptation to deep MLPs where each layer is initialized with a denoising auto-encoder, starting from the bottom.
During the initialization, which is usually called pre-training, the bottom layer is treated as if it were an isolated auto-encoder.
The second and following layers receive the same treatment, except that they take as input the encoded version of the data that has gone through the layers before them.
For additional details see \cite{vincent:icml08}.

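For illustration, a minimal NumPy sketch of the greedy pre-training of one
layer with a tied-weight denoising auto-encoder; the masking corruption,
sigmoid units, squared-error loss and plain gradient descent are assumptions
of this sketch rather than the exact training setup of our experiments.
\begin{verbatim}
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_denoising_layer(X, n_hidden, corruption=0.25, lr=0.1,
                             n_epochs=10, rng=np.random):
    """Pre-train one layer; X is (n_examples, n_in) with values in [0, 1]."""
    n_in = X.shape[1]
    W = rng.uniform(-0.1, 0.1, size=(n_hidden, n_in))
    b = np.zeros(n_hidden)              # encoder bias
    c = np.zeros(n_in)                  # decoder bias
    for _ in range(n_epochs):
        for x in X:
            x_tilde = x * (rng.uniform(size=n_in) > corruption)  # corrupt
            h = sigmoid(b + W.dot(x_tilde))                      # encode
            r = sigmoid(c + W.T.dot(h))                          # decode
            # Gradients of 0.5 * ||r - x||^2 through the tied weights.
            dr = (r - x) * r * (1 - r)
            dh = W.dot(dr) * h * (1 - h)
            W -= lr * (np.outer(dh, x_tilde) + np.outer(h, dr))
            b -= lr * dh
            c -= lr * dr
    return W, b          # used to initialize the corresponding MLP layer
\end{verbatim}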
\section{Experimental Results}

\subsection{SDA vs MLP vs Humans}

We compare here the best MLP (according to validation set error) that we found against
the best SDA (again according to validation set error), along with a precise estimate
of human performance obtained via Amazon's Mechanical Turk (AMT)
service\footnote{http://mturk.com}. AMT users are paid small amounts
of money to perform tasks for which human intelligence is required.
Mechanical Turk has been used extensively in natural language
processing \cite{SnowEtAl2008} and vision
\cite{SorokinAndForsyth2008,whitehill09}. AMT users were presented
with 10 character images and asked to type the 10 corresponding ASCII
characters. Hence they were forced to make a hard choice among the
62 character classes. Three users classified each image, allowing
us to estimate inter-human variability (shown as $\pm$ in parentheses below).

\begin{table}
\caption{Overall comparison of error rates ($\pm$ std.err.) on 62 character classes (10 digits +
26 lower + 26 upper), except for the last column -- digits only, between deep architecture with pre-training
(SDA=Stacked Denoising Autoencoder) and ordinary shallow architecture
(MLP=Multi-Layer Perceptron). The models shown are all trained using perturbed data (NISTP or P07)
and using a validation set to select hyper-parameters and other training choices.
\{SDA,MLP\}0 are trained on NIST,
\{SDA,MLP\}1 are trained on NISTP, and \{SDA,MLP\}2 are trained on P07.
The human error rate on digits is a lower bound because it does not count digits that were
recognized as letters. For comparison, the results found in the literature
on NIST digits classification using the same test set are included.}
\label{tab:sda-vs-mlp-vs-humans}
\begin{center}
\begin{tabular}{|l|r|r|r|r|} \hline
      & NIST test & NISTP test & P07 test & NIST test digits \\ \hline
Humans& 18.2\% $\pm$.1\% & 39.4\%$\pm$.1\% & 46.9\%$\pm$.1\% & $>1.1\%$ \\ \hline
SDA0 & 23.7\% $\pm$.14\% & 65.2\%$\pm$.34\% & 97.45\%$\pm$.06\% & 2.7\% $\pm$.14\%\\ \hline
SDA1 & 17.1\% $\pm$.13\% & 29.7\%$\pm$.3\% & 29.7\%$\pm$.3\% & 1.4\% $\pm$.1\%\\ \hline
SDA2 & 18.7\% $\pm$.13\% & 33.6\%$\pm$.3\% & 39.9\%$\pm$.17\% & 1.7\% $\pm$.1\%\\ \hline
MLP0 & 24.2\% $\pm$.15\% & 68.8\%$\pm$.33\% & 78.70\%$\pm$.14\% & 3.45\% $\pm$.15\% \\ \hline
MLP1 & 23.0\% $\pm$.15\% & 41.8\%$\pm$.35\% & 90.4\%$\pm$.1\% & 3.85\% $\pm$.16\% \\ \hline
MLP2 & 24.3\% $\pm$.15\% & 46.0\%$\pm$.35\% & 54.7\%$\pm$.17\% & 4.85\% $\pm$.18\% \\ \hline
[5] & & & & 4.95\% $\pm$.18\% \\ \hline
[2] & & & & 3.71\% $\pm$.16\% \\ \hline
[3] & & & & 2.4\% $\pm$.13\% \\ \hline
[4] & & & & 2.1\% $\pm$.12\% \\ \hline
\end{tabular}
\end{center}
\end{table}

\subsection{Perturbed Training Data More Helpful for SDAE}

\begin{table}
\caption{Relative change in error rates due to the use of perturbed training data,
either using NISTP, for the MLP1/SDA1 models, or using P07, for the MLP2/SDA2 models.
A positive value indicates that training on the perturbed data helped for the
given test set (the first 3 columns are on the 62-class task and the last one is
on the clean 10-class digits). Clearly, the deep learning models did benefit more
from perturbed training data, even when testing on clean data, whereas the MLP
trained on perturbed data performed worse on the clean digits and about the same
on the clean characters.}
\label{tab:perturbation-effect}
\begin{center}
\begin{tabular}{|l|r|r|r|r|} \hline
      & NIST test & NISTP test & P07 test & NIST test digits \\ \hline
SDA0/SDA1-1 & 38\% & 84\% & 228\% & 93\% \\ \hline
SDA0/SDA2-1 & 27\% & 94\% & 144\% & 59\% \\ \hline
MLP0/MLP1-1 & 5.2\% & 65\% & -13\% & -10\% \\ \hline
MLP0/MLP2-1 & -0.4\% & 49\% & 44\% & -29\% \\ \hline
\end{tabular}
\end{center}
\end{table}
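Each entry of the form SDA0/SDA1-1 is the ratio of the corresponding error
rates minus one; for instance, using the NIST test errors of the previous
table,
$$\frac{23.7\%}{17.1\%} - 1 \approx 0.39,$$
which matches the 38\% relative improvement reported in the first row up to
rounding of the underlying error rates.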


\subsection{Multi-Task Learning Effects}

As previously seen, the SDA is better able than the MLP to benefit from the
transformations applied to the data. In this experiment we
define three tasks: recognizing digits (knowing that the input is a digit),
recognizing upper-case characters (knowing that the input is one), and
recognizing lower-case characters (knowing that the input is one). We
consider digit classification as the target task and we want to
evaluate whether training jointly with the other tasks helps or hurts, and
whether the effect is different for MLPs versus SDAs. The goal is to find
out whether deep learners can benefit more (or less) from multiple related tasks
(i.e., the multi-task setting) than a corresponding purely supervised
shallow learner.

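To make the evaluation protocol concrete, the short Python sketch below shows
one possible way of implementing the ``knowing that the input is a digit''
condition: a single classifier trained over all 62 NIST classes is evaluated
on a given task by restricting its predictions to that task's classes. The
class index layout and the helper function are illustrative only and do not
reproduce our actual experimental code.

\begin{verbatim}
import numpy as np

# Hypothetical class index layout for the 62 NIST classes:
# 0-9 digits, 10-35 upper case, 36-61 lower case.
DIGITS = np.arange(0, 10)
UPPER  = np.arange(10, 36)
LOWER  = np.arange(36, 62)

def restricted_error_rate(scores, targets, task_classes):
    # Error rate when it is known that the input belongs to the task's
    # classes: the prediction is the highest-scoring class in that subset.
    sub = scores[:, task_classes]
    pred = task_classes[np.argmax(sub, axis=1)]
    return float(np.mean(pred != targets))

# Toy usage, with random scores standing in for a trained 62-way classifier.
rng = np.random.RandomState(0)
scores = rng.rand(100, 62)          # one row of class scores per test example
targets = rng.randint(0, 10, 100)   # pretend these are digit labels
print(restricted_error_rate(scores, targets, DIGITS))
\end{verbatim}

In the single-task setting, a separate classifier is instead trained and
evaluated only on the examples of the task of interest.
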
We use a single-hidden-layer MLP with 1000 hidden units, and an SDA
with 3 hidden layers (1000 hidden units per layer), pre-trained and
fine-tuned on NIST.

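For illustration, the following self-contained Python sketch summarizes the
two architectures being compared, assuming $32 \times 32$ input images, 62
output classes, and tanh hidden units with a softmax output; these details
(activations, initialization) are assumptions made for the sketch rather than
a description of our exact training code, and the unsupervised pre-training
of the SDA layers is not shown.

\begin{verbatim}
import numpy as np

def init_layer(n_in, n_out, rng):
    # Small random weights, zero biases (illustrative initialization).
    return rng.uniform(-0.01, 0.01, size=(n_in, n_out)), np.zeros(n_out)

rng = np.random.RandomState(0)
n_in, n_classes = 32 * 32, 62        # 32x32 input images, 62 NIST classes

# MLP: a single hidden layer of 1000 units followed by a softmax output.
mlp_hidden = [init_layer(n_in, 1000, rng)]
mlp_output = init_layer(1000, n_classes, rng)

# SDA: three hidden layers of 1000 units each; in the real model each layer
# is first pre-trained as a denoising auto-encoder, then the whole network
# is fine-tuned with supervised learning (pre-training not shown here).
sda_hidden = [init_layer(n, m, rng)
              for n, m in [(n_in, 1000), (1000, 1000), (1000, 1000)]]
sda_output = init_layer(1000, n_classes, rng)

def forward(x, hidden_layers, output_layer):
    h = x
    for W, b in hidden_layers:
        h = np.tanh(h.dot(W) + b)                 # hidden non-linearity
    W_out, b_out = output_layer
    logits = h.dot(W_out) + b_out
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)       # softmax probabilities

x = rng.rand(4, n_in)                             # mini-batch of 4 fake images
print(forward(x, mlp_hidden, mlp_output).shape)   # (4, 62)
print(forward(x, sda_hidden, sda_output).shape)   # (4, 62)
\end{verbatim}

The only difference exercised in this sketch is depth: one hidden layer for
the MLP versus three for the SDA.
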
Our results show that the MLP benefits marginally from the multi-task setting
in the case of digits (5\% relative improvement) but is actually hurt in the case
of characters (3\% and 4\% worse, respectively, for lower-case and upper-case characters).
On the other hand, the SDA benefits from the multi-task setting, with relative
error rate improvements of 27\%, 15\% and 13\% respectively for digits,
lower-case and upper-case characters, as shown in Table~\ref{tab:multi-task}.

\begin{table}
\caption{Test error rates and relative change in error rates due to the use of
a multi-task setting, i.e., training on each task in isolation vs training
for all three tasks together, for MLPs vs SDAs. The SDA benefits much
more from the multi-task setting. All experiments are performed on the
unperturbed NIST data only, using validation error for model selection.
Relative improvement is $1 - (\mbox{single-task error})/(\mbox{multi-task error})$.}
\label{tab:multi-task}
\begin{center}
\begin{tabular}{|l|r|r|r|} \hline
           & single-task & multi-task & relative \\
           & setting     & setting    & improvement \\ \hline
MLP-digits & 3.77\%      & 3.99\%     & 5.6\%  \\ \hline
MLP-lower  & 17.4\%      & 16.8\%     & -4.1\% \\ \hline
MLP-upper  & 7.84\%      & 7.54\%     & -3.6\% \\ \hline
SDA-digits & 2.6\%       & 3.56\%     & 27\%   \\ \hline
SDA-lower  & 12.3\%      & 14.4\%     & 15\%   \\ \hline
SDA-upper  & 5.93\%      & 6.78\%     & 13\%   \\ \hline
\end{tabular}
\end{center}
\end{table}
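
As a concrete illustration of the relative improvement measure reported in
Table~\ref{tab:multi-task}, the short Python sketch below applies the formula
from the caption to the SDA-digits row (the function name is ours, chosen for
readability):

\begin{verbatim}
def relative_improvement(single_task_error, multi_task_error):
    # Relative improvement as defined in the table caption:
    # 1 - single-task error / multi-task error.
    return 1.0 - single_task_error / multi_task_error

# SDA-digits row: 2.6% (single-task setting) and 3.56% (multi-task setting).
print(round(relative_improvement(0.026, 0.0356), 2))   # prints 0.27, i.e. 27%
\end{verbatim}
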

\section{Conclusions}

\bibliography{strings,ml,aigaion,specials}
\bibliographystyle{mlapa}

\end{document}