writeup/nips2010_submission.tex @ 555:b6dfba0a110c

Improve the visual appearance (Myriam)
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Thu, 03 Jun 2010 08:09:35 -0400
\documentclass{article} % For LaTeX2e
\usepackage{nips10submit_e,times}
\usepackage{wrapfig}
\usepackage{amsthm,amsmath,bbm}
\usepackage[psamsfonts]{amssymb}
\usepackage{algorithm,algorithmic}
\usepackage[utf8]{inputenc}
\usepackage{graphicx,subfigure}

\vspace*{-2mm}
\begin{abstract}
Recent theoretical and empirical work in statistical machine learning has demonstrated the importance of learning algorithms for deep architectures, i.e., function classes obtained by composing multiple non-linear transformations. Self-taught learning (exploiting unlabeled examples or examples from other distributions) has already been applied to deep learners, but mostly to show the advantage of unlabeled examples. Here we explore the advantage brought by {\em out-of-distribution examples} and show that {\em deep learners benefit more from them than a corresponding shallow learner}, in the area of handwritten character recognition. In fact, we show that they reach human-level performance on both handwritten digit classification and 62-class handwritten character recognition. For this purpose we developed a powerful generator of stochastic variations and noise processes for character images, including not only affine transformations but also slant, local elastic deformations, changes in thickness, background images, grey level changes, contrast, occlusion, and various types of noise. The out-of-distribution examples are obtained from these highly distorted images or by including examples of object classes different from those in the target test set.
\end{abstract}
\vspace*{-3mm}

\section{Introduction}
\vspace*{-1mm}

Deep Learning has emerged as a promising new area of research in
but more needs to be done to explore the impact
of {\em out-of-distribution} examples and of the multi-task setting
(one exception is~\citep{CollobertR2008}, which uses very different kinds
of learning algorithms). In particular, the {\em relative
advantage} of deep learning for these settings has not been evaluated.
The hypothesis discussed in the conclusion is that a deep hierarchy of features
may be better able to provide sharing of statistical strength
between different regions in input space or different tasks.
%
In this paper we ask the following questions:

%\begin{enumerate}
$\bullet$ %\item
Do the good results previously obtained with deep architectures on the
\vspace*{-1mm}
\section{Perturbation and Transformation of Character Images}
\label{s:perturbations}
\vspace*{-1mm}

\begin{wrapfigure}[8]{l}{0.15\textwidth}
\vspace*{-5mm}
\begin{center}
\includegraphics[scale=.4]{images/Original.PNG}\\
{\bf Original}
\end{center}
\end{wrapfigure}
This section describes the different transformations we used to stochastically
transform source images such as the one on the left
in order to obtain data from a larger distribution which
covers a domain substantially larger than the clean characters distribution from
which we start.
Although character transformations have been used before to
improve character recognizers, this effort is on a large scale both
in number of classes and in the complexity of the transformations, hence
in the complexity of the learning task.
More details can
be found in this technical report~\citep{ift6266-tr-anonymous}.
A global control parameter ($0 \le complexity \le 1$) allows one to modulate the
amount of deformation or noise introduced.
There are two main parts in the pipeline. The first one,
from slant to pinch below, performs transformations. The second
part, from blur to contrast, adds different kinds of noise.
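As an illustration only (this is not the generator code itself, and the module
names are placeholders), the overall structure of such a pipeline could be
sketched as follows:
{\small
\begin{verbatim}
# Illustrative sketch: a two-stage pipeline in which a single "complexity"
# parameter in [0, 1] modulates every transformation and noise module.
import numpy as np

def transform_pipeline(image, complexity, transformations, noise_modules,
                       rng=np.random):
    # image: 2D float array in [0, 1]; each module maps
    # (image, complexity, rng) -> image.
    for module in transformations:   # e.g. slant, thickness, affine, pinch
        image = module(image, complexity, rng)
    for module in noise_modules:     # e.g. blur, occlusion, noise, contrast
        image = module(image, complexity, rng)
    return np.clip(image, 0.0, 1.0)
\end{verbatim}
}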

\vspace*{1mm}
%\subsection{Transformations}
{\large\bf 2.1 Transformations}
\vspace*{1mm}

\begin{wrapfigure}[7]{l}{0.15\textwidth}
\begin{center}
\vspace*{-5mm}
\includegraphics[scale=.4]{images/Thick_only.PNG}\\
{\bf Thickness}
\end{center}
\end{wrapfigure}
Morphological operators of dilation and erosion~\citep{Haralick87,Serra82}
are applied. The neighborhood of each pixel is multiplied
element-wise with a {\em structuring element} matrix.
The pixel value is replaced by the maximum or the minimum of the resulting
matrix, respectively for dilation or erosion. Ten different structural elements with
increasing dimensions were used. For each image, we
randomly sample the operator type (dilation or erosion) with equal probability and one structural
element from a subset of the $n=round(m \times complexity)$ smallest structuring elements
where $m=10$ for dilation and $m=6$ for erosion (to avoid completely erasing thin characters).
A neutral element (no transformation)
is always present in the set.

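A minimal sketch of this thickness perturbation, assuming square structuring
elements (the actual element shapes may differ), could look like:
{\small
\begin{verbatim}
import numpy as np
from scipy import ndimage

# ten structuring elements of increasing size; index 0 is the neutral element
FOOTPRINTS = [np.ones((k, k), dtype=bool) for k in range(1, 11)]

def thickness(image, complexity, rng=np.random):
    dilate = rng.rand() < 0.5           # dilation or erosion, equal probability
    m = 10 if dilate else 6             # erosion uses fewer elements
    n = int(round(m * complexity))
    if n == 0:
        return image                    # neutral element: no transformation
    fp = FOOTPRINTS[rng.randint(n)]     # one of the n smallest elements
    op = ndimage.grey_dilation if dilate else ndimage.grey_erosion
    return op(image, footprint=fp)
\end{verbatim}
}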
\begin{minipage}[b]{0.14\linewidth}
\centering
\includegraphics[scale=.4]{images/Slant_only.PNG}\\
{\bf Slant}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.83\linewidth}
Each row of the image is shifted
proportionally to its height: $shift = round(slant \times height)$.
$slant \sim U[-complexity,complexity]$.
\vspace{1.5cm}
\end{minipage}

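For illustration, a NumPy sketch of the slant (the wrap-around at the image
border is an assumption made only to keep the sketch short):
{\small
\begin{verbatim}
import numpy as np

def slant(image, complexity, rng=np.random):
    s = rng.uniform(-complexity, complexity)
    out = np.empty_like(image)
    for y in range(image.shape[0]):
        shift = int(round(s * y))        # shift = round(slant * height)
        out[y] = np.roll(image[y], shift)
    return out
\end{verbatim}
}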
\begin{wrapfigure}[8]{l}{0.15\textwidth}
\vspace*{-6mm}
\begin{center}
\includegraphics[scale=.4]{images/Affine_only.PNG}\\
{\bf Affine}
\end{center}
\end{wrapfigure}
A $2 \times 3$ affine transform matrix (with
parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$ level.
Output pixel $(x,y)$ takes the value of input pixel
nearest to $(ax+by+c,dx+ey+f)$,
producing scaling, translation, rotation and shearing.
Marginal distributions of $(a,b,c,d,e,f)$ have been tuned to
forbid large rotations (not to confuse classes) but to give good
variability of the transformation: $a$ and $d$ $\sim U[1-3\,complexity,1+3\,complexity]$,
$b$ and $e$ $\sim U[-3\,complexity,3\,complexity]$, and $c$ and $f$ $\sim U[-4\,complexity,4\,complexity]$.\\

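A hedged sketch of this affine perturbation (here $x$ indexes columns and $y$
indexes rows, and out-of-bounds sources are filled with the background value 0):
{\small
\begin{verbatim}
import numpy as np

def affine(image, cx, rng=np.random):    # cx is the complexity level
    a, d = rng.uniform(1 - 3 * cx, 1 + 3 * cx, size=2)
    b, e = rng.uniform(-3 * cx, 3 * cx, size=2)
    c, f = rng.uniform(-4 * cx, 4 * cx, size=2)
    h, w = image.shape
    out = np.zeros_like(image)
    for y in range(h):
        for x in range(w):
            sx = int(round(a * x + b * y + c))   # nearest source column
            sy = int(round(d * x + e * y + f))   # nearest source row
            if 0 <= sx < w and 0 <= sy < h:
                out[y, x] = image[sy, sx]
    return out
\end{verbatim}
}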
\vspace*{-4.5mm}

\begin{minipage}[t]{\linewidth}
\begin{wrapfigure}[7]{l}{0.15\textwidth}
\begin{center}
\vspace*{-4mm}
\includegraphics[scale=.4]{images/Localelasticdistorsions_only.PNG}\\
{\bf Local Elastic}
\end{center}
\end{wrapfigure}
This local elastic deformation
filter induces a ``wiggly'' effect in the image, following~\citet{SimardSP03-short},
which provides more details.
The intensity of the displacement fields is given by
$\alpha = \sqrt[3]{complexity} \times 10.0$; the fields are
convolved with a Gaussian 2D kernel (resulting in a blur) of
standard deviation $\sigma = 10 - 7 \times\sqrt[3]{complexity}$.
\end{minipage}

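A sketch in the style of~\citet{SimardSP03-short}; the uniform random fields and
the bilinear resampling below are implementation choices made for illustration,
not necessarily those used here:
{\small
\begin{verbatim}
import numpy as np
from scipy import ndimage

def elastic(image, complexity, rng=np.random):
    alpha = 10.0 * complexity ** (1.0 / 3.0)
    sigma = 10.0 - 7.0 * complexity ** (1.0 / 3.0)
    h, w = image.shape
    dx = ndimage.gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    dy = ndimage.gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.array([ys + dy, xs + dx])
    return ndimage.map_coordinates(image, coords, order=1, mode="nearest")
\end{verbatim}
}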
\vspace*{5mm}

\begin{wrapfigure}[7]{l}{0.15\textwidth}
\vspace*{-5mm}
\begin{center}
\includegraphics[scale=.4]{images/Pinch_only.PNG}\\
{\bf Pinch}
\end{center}
\end{wrapfigure}
This is the ``Whirl and pinch'' GIMP filter with whirl set to 0.
A pinch is ``similar to projecting the image onto an elastic
surface and pressing or pulling on the center of the surface'' (GIMP documentation manual).
For a square input image, draw a radius-$r$ disk
around its center $C$. Any pixel $P$ belonging to
that disk has its value replaced by
the value of a ``source'' pixel in the original image,
on the line that goes through $C$ and $P$, but
at some other distance $d_2$. With $d_1 = distance(P,C)$, the source distance is
$d_2 = sin(\frac{\pi{}d_1}{2r})^{-pinch} \times d_1$,
where $pinch$ is a parameter to the filter.
The actual value is given by bilinear interpolation considering the pixels
around the (non-integer) source position thus found.
Here $pinch \sim U[-complexity, 0.7 \times complexity]$.

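For illustration, the geometry of the pinch can be written directly from the
formula above; this is a simplified stand-in for the GIMP filter, with the disk
assumed to be centred on the image:
{\small
\begin{verbatim}
import numpy as np
from scipy import ndimage

def pinch(image, complexity, rng=np.random):
    p = rng.uniform(-complexity, 0.7 * complexity)
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r = min(cy, cx)
    ys, xs = np.meshgrid(np.arange(h, dtype=float),
                         np.arange(w, dtype=float), indexing="ij")
    d1 = np.hypot(ys - cy, xs - cx)
    scale = np.ones_like(d1)                  # scale = d2 / d1
    inside = (d1 > 0) & (d1 < r)
    scale[inside] = np.sin(np.pi * d1[inside] / (2 * r)) ** (-p)
    src = np.array([cy + (ys - cy) * scale, cx + (xs - cx) * scale])
    return ndimage.map_coordinates(image, src, order=1)   # bilinear lookup
\end{verbatim}
}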
\vspace{2mm}

{\large\bf 2.2 Injecting Noise}
%\subsection{Injecting Noise}
\vspace{2mm}

\begin{minipage}[t]{0.14\linewidth}
\centering
\vspace*{-2mm}
\includegraphics[scale=.4]{images/Motionblur_only.PNG}\\
{\bf Motion Blur}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth}
This is GIMP's ``linear motion blur''
with parameters $length$ and $angle$. The value of
a pixel in the final image is approximately the mean of the first $length$ pixels
found by moving in the $angle$ direction, where
$angle \sim U[0,360]$ degrees and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$.
\vspace{5mm}
\end{minipage}

\vspace*{1mm}

\begin{minipage}[t]{0.14\linewidth}
\centering
\includegraphics[scale=.4]{images/occlusion_only.PNG}\\
{\bf Occlusion}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth}
\vspace*{-18mm}
Selects a random rectangle from an {\em occluder} character
image and places it over the original {\em occluded}
image. Pixels are combined by taking $\max(occluder,occluded)$,
i.e., the value closer to black. The rectangle corners
are sampled so that larger complexity gives larger rectangles.
The destination position in the occluded image is also sampled
according to a normal distribution (more details in~\citet{ift6266-tr-anonymous}).
This filter is skipped with probability 60\%.
\end{minipage}

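A simplified sketch (for brevity the same corners are used for source and
destination, and the placement is uniform rather than the normal sampling
described above):
{\small
\begin{verbatim}
import numpy as np

def occlude(occluded, occluder, complexity, rng=np.random):
    if rng.rand() < 0.6:                       # skipped 60% of the time
        return occluded
    h, w = occluded.shape
    size = max(1, int(round(complexity * h)))  # larger complexity, larger patch
    top, left = rng.randint(h - size + 1), rng.randint(w - size + 1)
    out = occluded.copy()
    region = (slice(top, top + size), slice(left, left + size))
    out[region] = np.maximum(out[region], occluder[region])  # keep darker pixel
    return out
\end{verbatim}
}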
\vspace*{1mm}

\begin{wrapfigure}[8]{l}{0.15\textwidth}
\vspace*{-6mm}
\begin{center}
\includegraphics[scale=.4]{images/Bruitgauss_only.PNG}\\
{\bf Gaussian Smoothing}
\end{center}
\end{wrapfigure}
Different regions of the image are spatially smoothed by convolving
the image with a symmetric Gaussian kernel of
size and variance chosen uniformly in the ranges $[12,12 + 20 \times
complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized
between $0$ and $1$. We also create a symmetric weighted averaging window, of the
same size as the kernel, whose value is maximal at its center. A few pixels are
selected at random in the image; we
initialize to zero a mask matrix of the image size. For each selected pixel
we add to the mask the averaging window centered to it. The final image is
computed from the following element-wise operation: $\frac{image + filtered
image \times mask}{mask+1}$.
This filter is skipped with probability 75\%.

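The following sketch conveys the idea; the window shape, its half-width, and the
number of selected pixels are assumptions made only for illustration:
{\small
\begin{verbatim}
import numpy as np
from scipy import ndimage

def local_smoothing(image, complexity, rng=np.random):
    if rng.rand() < 0.75:                               # skipped 75% of the time
        return image
    sigma = np.sqrt(rng.uniform(2, 2 + 6 * complexity)) # variance -> std dev
    filtered = ndimage.gaussian_filter(image, sigma)
    lo, hi = filtered.min(), filtered.max()
    filtered = (filtered - lo) / (hi - lo + 1e-8)       # normalize to [0, 1]
    h, w = image.shape
    mask = np.zeros_like(image)
    half = 8                                            # assumed window half-width
    for _ in range(rng.randint(3, 7)):                  # a few centres (assumption)
        y, x = rng.randint(h), rng.randint(w)
        yy, xx = np.meshgrid(np.arange(h) - y, np.arange(w) - x, indexing="ij")
        mask += np.clip(1.0 - np.hypot(yy, xx) / half, 0.0, 1.0)
    return (image + filtered * mask) / (mask + 1.0)     # element-wise blend
\end{verbatim}
}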
\newpage

\vspace*{-9mm}

\begin{minipage}[t]{\linewidth}
\begin{wrapfigure}[7]{l}{0.15\textwidth}
\vspace*{-5mm}
\begin{center}
\includegraphics[scale=.4]{images/Permutpixel_only.PNG}\\
{\small\bf Permute Pixels}
\end{center}
\end{wrapfigure}
This filter permutes neighbouring pixels. It first selects a
fraction $\frac{complexity}{3}$ of the pixels randomly in the image. Each of them is then
sequentially exchanged with another pixel in its $V4$ neighbourhood.
This filter is skipped with probability 80\%.\\
\vspace*{1mm}
\end{minipage}

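A direct sketch of this permutation noise:
{\small
\begin{verbatim}
import numpy as np

def permute_pixels(image, complexity, rng=np.random):
    if rng.rand() < 0.8:                       # skipped 80% of the time
        return image
    out = image.copy()
    h, w = out.shape
    n = int(round(complexity / 3.0 * h * w))   # fraction complexity/3 of pixels
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # the V4 neighbourhood
    for _ in range(n):
        y, x = rng.randint(h), rng.randint(w)
        dy, dx = offsets[rng.randint(4)]
        y2 = min(max(y + dy, 0), h - 1)
        x2 = min(max(x + dx, 0), w - 1)
        out[y, x], out[y2, x2] = out[y2, x2], out[y, x]
    return out
\end{verbatim}
}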
\vspace{-1mm}

\begin{minipage}[t]{\linewidth}
\begin{wrapfigure}[7]{l}{0.15\textwidth}
\begin{center}
\vspace*{-5mm}
\includegraphics[scale=.4]{images/Distorsiongauss_only.PNG}\\
{\small \bf Gauss. Noise}
\end{center}
\end{wrapfigure}
\vspace*{12mm}
This filter simply adds, to each pixel of the image independently, noise
$\sim Normal(0,(\frac{complexity}{10})^2)$.
This filter is skipped with probability 70\%.
\end{minipage}

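In code this is a one-line perturbation (the final clipping to $[0,1]$ is an
assumption):
{\small
\begin{verbatim}
import numpy as np

def gaussian_noise(image, complexity, rng=np.random):
    if rng.rand() < 0.7:                       # skipped 70% of the time
        return image
    noise = rng.normal(0.0, complexity / 10.0, image.shape)
    return np.clip(image + noise, 0.0, 1.0)
\end{verbatim}
}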
\vspace*{1.5cm}

\begin{minipage}[t]{\linewidth}
\begin{minipage}[t]{0.14\linewidth}
\centering
\includegraphics[scale=.4]{images/background_other_only.png}\\
{\small \bf Bg Image}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth}
\vspace*{-18mm}
Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random
background image behind the letter, from a randomly chosen natural image,
with contrast adjustments depending on $complexity$, to preserve
more or less of the original character image.
\end{minipage}
\end{minipage}

\begin{minipage}[t]{0.14\linewidth}
\centering
\includegraphics[scale=.4]{images/Poivresel_only.PNG}\\
{\small \bf Salt \& Pepper}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth}
\vspace*{-18mm}
This filter adds noise $\sim U[0,1]$ to random subsets of pixels.
The fraction of selected pixels is $0.2 \times complexity$.
This filter is skipped with probability 75\%.
\end{minipage}

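A sketch of the salt-and-pepper noise as described:
{\small
\begin{verbatim}
import numpy as np

def salt_and_pepper(image, complexity, rng=np.random):
    if rng.rand() < 0.75:                      # skipped 75% of the time
        return image
    out = image.copy()
    mask = rng.rand(*image.shape) < 0.2 * complexity   # selected pixels
    out[mask] = rng.rand(int(mask.sum()))               # replaced by U[0,1]
    return out
\end{verbatim}
}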
\vspace{1mm}

\begin{minipage}[t]{\linewidth}
\begin{wrapfigure}[7]{l}{0.14\textwidth}
\begin{center}
\vspace*{-4mm}
\hspace*{-1mm}\includegraphics[scale=.4]{images/Rature_only.PNG}\\
{\bf Scratches}
\end{center}
\end{wrapfigure}
The scratches module places line-like white patches on the image. The
lines are heavily transformed images of the digit ``1'' (one), chosen
at random among 500 such 1 images,
randomly cropped and rotated by an angle $\sim Normal(0,(100 \times
complexity)^2)$ (in degrees), using bi-cubic interpolation.
Further operations are applied, reducing the width of the line
by an amount controlled by $complexity$.
This filter is skipped with probability 85\%. The probabilities
of applying 1, 2, or 3 patches are (50\%,30\%,20\%).
\end{minipage}

\vspace*{2mm}

\begin{minipage}[t]{0.20\linewidth}
\centering
\hspace*{-7mm}\includegraphics[scale=.4]{images/Contrast_only.PNG}\\
{\bf Grey \& Contrast}
\end{minipage}%
\hspace{-4mm}\begin{minipage}[t]{0.82\linewidth}
\vspace*{-18mm}
This filter changes the contrast by changing grey levels, and may invert the image polarity (white
to black and black to white). The contrast is $C \sim U[1-0.85 \times complexity,1]$
so the image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The
polarity is inverted with probability 50\%.
\end{minipage}
\vspace{2mm}
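A sketch of the grey-level and contrast change, assuming the input image
already lies in $[0,1]$:
{\small
\begin{verbatim}
import numpy as np

def contrast_change(image, complexity, rng=np.random):
    C = rng.uniform(1 - 0.85 * complexity, 1.0)
    lo, hi = (1 - C) / 2.0, 1 - (1 - C) / 2.0
    out = lo + image * (hi - lo)          # normalize into [(1-C)/2, 1-(1-C)/2]
    if rng.rand() < 0.5:
        out = 1.0 - out                   # invert polarity half the time
    return out
\end{verbatim}
}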


\iffalse
\begin{figure}[ht]
\centerline{\resizebox{.9\textwidth}{!}{\includegraphics{images/example_t.png}}}\\