comparison writeup/nips2010_submission.tex @ 555:b6dfba0a110c
improve the visual appearance, Myriam
author | Yoshua Bengio <bengioy@iro.umontreal.ca> |
---|---|
date | Thu, 03 Jun 2010 08:09:35 -0400 |
parents | e95395f51d72 |
children | 17d16700e0c8 143a1467f157 |
554:e95395f51d72 | 555:b6dfba0a110c |
---|---|
1 \documentclass{article} % For LaTeX2e | 1 \documentclass{article} % For LaTeX2e |
2 \usepackage{nips10submit_e,times} | 2 \usepackage{nips10submit_e,times} |
3 | 3 \usepackage{wrapfig} |
4 \usepackage{amsthm,amsmath,bbm} | 4 \usepackage{amsthm,amsmath,bbm} |
5 \usepackage[psamsfonts]{amssymb} | 5 \usepackage[psamsfonts]{amssymb} |
6 \usepackage{algorithm,algorithmic} | 6 \usepackage{algorithm,algorithmic} |
7 \usepackage[utf8]{inputenc} | 7 \usepackage[utf8]{inputenc} |
8 \usepackage{graphicx,subfigure} | 8 \usepackage{graphicx,subfigure} |
20 | 20 |
21 \vspace*{-2mm} | 21 \vspace*{-2mm} |
22 \begin{abstract} | 22 \begin{abstract} |
23 Recent theoretical and empirical work in statistical machine learning has demonstrated the importance of learning algorithms for deep architectures, i.e., function classes obtained by composing multiple non-linear transformations. Self-taught learning (exploiting unlabeled examples or examples from other distributions) has already been applied to deep learners, but mostly to show the advantage of unlabeled examples. Here we explore the advantage brought by {\em out-of-distribution examples} and show that {\em deep learners benefit more from them than a corresponding shallow learner}, in the area of handwritten character recognition. In fact, we show that they reach human-level performance on both handwritten digit classification and 62-class handwritten character recognition. For this purpose we developed a powerful generator of stochastic variations and noise processes for character images, including not only affine transformations but also slant, local elastic deformations, changes in thickness, background images, grey level changes, contrast, occlusion, and various types of noise. The out-of-distribution examples are obtained from these highly distorted images or by including examples of object classes different from those in the target test set. | 23 Recent theoretical and empirical work in statistical machine learning has demonstrated the importance of learning algorithms for deep architectures, i.e., function classes obtained by composing multiple non-linear transformations. Self-taught learning (exploiting unlabeled examples or examples from other distributions) has already been applied to deep learners, but mostly to show the advantage of unlabeled examples. Here we explore the advantage brought by {\em out-of-distribution examples} and show that {\em deep learners benefit more from them than a corresponding shallow learner}, in the area of handwritten character recognition. 
In fact, we show that they reach human-level performance on both handwritten digit classification and 62-class handwritten character recognition. For this purpose we developed a powerful generator of stochastic variations and noise processes for character images, including not only affine transformations but also slant, local elastic deformations, changes in thickness, background images, grey level changes, contrast, occlusion, and various types of noise. The out-of-distribution examples are obtained from these highly distorted images or by including examples of object classes different from those in the target test set. |
24 \end{abstract} | 24 \end{abstract} |
25 \vspace*{-2mm} | 25 \vspace*{-3mm} |
26 | 26 |
27 \section{Introduction} | 27 \section{Introduction} |
28 \vspace*{-1mm} | 28 \vspace*{-1mm} |
29 | 29 |
30 Deep Learning has emerged as a promising new area of research in | 30 Deep Learning has emerged as a promising new area of research in |
75 but more needs to be done to explore the impact | 75 but more needs to be done to explore the impact |
76 of {\em out-of-distribution} examples and of the multi-task setting | 76 of {\em out-of-distribution} examples and of the multi-task setting |
77 (one exception is~\citep{CollobertR2008}, which uses very different kinds | 77 (one exception is~\citep{CollobertR2008}, which uses very different kinds |
78 of learning algorithms). In particular the {\em relative | 78 of learning algorithms). In particular the {\em relative |
79 advantage} of deep learning for these settings has not been evaluated. | 79 advantage} of deep learning for these settings has not been evaluated. |
80 The hypothesis explored here is that a deep hierarchy of features | 80 The hypothesis discussed in the conclusion is that a deep hierarchy of features |
81 may be better able to provide sharing of statistical strength | 81 may be better able to provide sharing of statistical strength |
82 between different regions in input space or different tasks, | 82 between different regions in input space or different tasks. |
83 as discussed in the conclusion. | 83 % |
84 | |
85 In this paper we ask the following questions: | 84 In this paper we ask the following questions: |
86 | 85 |
87 %\begin{enumerate} | 86 %\begin{enumerate} |
88 $\bullet$ %\item | 87 $\bullet$ %\item |
89 Do the good results previously obtained with deep architectures on the | 88 Do the good results previously obtained with deep architectures on the |
115 \vspace*{-1mm} | 114 \vspace*{-1mm} |
116 \section{Perturbation and Transformation of Character Images} | 115 \section{Perturbation and Transformation of Character Images} |
117 \label{s:perturbations} | 116 \label{s:perturbations} |
118 \vspace*{-1mm} | 117 \vspace*{-1mm} |
119 | 118 |
120 \begin{minipage}[b]{0.14\linewidth} | 119 \begin{wrapfigure}[8]{l}{0.15\textwidth} |
121 \centering | 120 %\begin{minipage}[b]{0.14\linewidth} |
122 \includegraphics[scale=.45]{images/Original.PNG} | 121 \vspace*{-5mm} |
123 \label{fig:Original} | 122 \begin{center} |
124 \vspace{1.2cm} | 123 \includegraphics[scale=.4]{images/Original.PNG}\\ |
125 \end{minipage}% | 124 {\bf Original} |
126 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | 125 \end{center} |
127 {\bf Original.} | 126 \end{wrapfigure} |
127 %\vspace{0.7cm} | |
128 %\end{minipage}% | |
129 %\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | |
128 This section describes the different transformations we used to stochastically | 130 This section describes the different transformations we used to stochastically |
129 transform source images such as the one on the left | 131 transform source images such as the one on the left |
130 in order to obtain data from a larger distribution which | 132 in order to obtain data from a larger distribution which |
131 covers a domain substantially larger than the clean characters distribution from | 133 covers a domain substantially larger than the clean characters distribution from |
132 which we start. Although character transformations have been used before to | 134 which we start. |
135 Although character transformations have been used before to | |
133 improve character recognizers, this effort is on a large scale both | 136 improve character recognizers, this effort is on a large scale both |
134 in number of classes and in the complexity of the transformations, hence | 137 in number of classes and in the complexity of the transformations, hence |
135 in the complexity of the learning task. | 138 in the complexity of the learning task. |
136 More details can | 139 More details can |
137 be found in this technical report~\citep{ift6266-tr-anonymous}. | 140 be found in this technical report~\citep{ift6266-tr-anonymous}. |
140 a global control parameter ($0 \le complexity \le 1$) that allows one to modulate the | 143 a global control parameter ($0 \le complexity \le 1$) that allows one to modulate the |
141 amount of deformation or noise introduced. | 144 amount of deformation or noise introduced. |
142 There are two main parts in the pipeline. The first one, | 145 There are two main parts in the pipeline. The first one, |
143 from slant to pinch below, performs transformations. The second | 146 from slant to pinch below, performs transformations. The second |
144 part, from blur to contrast, adds different kinds of noise. | 147 part, from blur to contrast, adds different kinds of noise. |
145 \end{minipage} | 148 %\end{minipage} |
146 | 149 |
147 {\large\bf Transformations} | 150 \vspace*{1mm} |
148 | 151 %\subsection{Transformations} |
149 | 152 {\large\bf 2.1 Transformations} |
150 \begin{minipage}[b]{0.14\linewidth} | 153 \vspace*{1mm} |
151 \centering | 154 |
152 \includegraphics[scale=.45]{images/Slant_only.PNG} | 155 |
153 \label{fig:Slant} | 156 \begin{wrapfigure}[7]{l}{0.15\textwidth} |
154 \end{minipage}% | 157 %\begin{minipage}[b]{0.14\linewidth} |
155 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | |
156 %\centering | 158 %\centering |
157 {\bf Slant.} | 159 \begin{center} |
158 Each row of the image is shifted | 160 \vspace*{-5mm} |
159 proportionally to its height: $shift = round(slant \times height)$. | 161 \includegraphics[scale=.4]{images/Thick_only.PNG}\\ |
160 $slant \sim U[-complexity,complexity]$. | 162 {\bf Thickness} |
161 \vspace{1.2cm} | 163 \end{center} |
162 \end{minipage} | 164 %\vspace{.6cm} |
163 | 165 %\end{minipage}% |
164 | 166 %\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} |
165 \begin{minipage}[b]{0.14\linewidth} | 167 \end{wrapfigure} |
166 \centering | |
167 \includegraphics[scale=.45]{images/Thick_only.PNG} | |
168 \label{fig:Thick} | |
169 \vspace{.9cm} | |
170 \end{minipage}% | |
171 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | |
172 {\bf Thickness.} | |
173 Morphological operators of dilation and erosion~\citep{Haralick87,Serra82} | 168 Morphological operators of dilation and erosion~\citep{Haralick87,Serra82} |
174 are applied. The neighborhood of each pixel is multiplied | 169 are applied. The neighborhood of each pixel is multiplied |
175 element-wise with a {\em structuring element} matrix. | 170 element-wise with a {\em structuring element} matrix. |
176 The pixel value is replaced by the maximum or the minimum of the resulting | 171 The pixel value is replaced by the maximum or the minimum of the resulting |
177 matrix, respectively for dilation or erosion. Ten different structural elements with | 172 matrix, respectively for dilation or erosion. Ten different structural elements with |
179 randomly sample the operator type (dilation or erosion) with equal probability and one structural | 174 randomly sample the operator type (dilation or erosion) with equal probability and one structural |
180 element from a subset of the $n=round(m \times complexity)$ smallest structuring elements | 175 element from a subset of the $n=round(m \times complexity)$ smallest structuring elements |
181 where $m=10$ for dilation and $m=6$ for erosion (to avoid completely erasing thin characters). | 176 where $m=10$ for dilation and $m=6$ for erosion (to avoid completely erasing thin characters). |
182 A neutral element (no transformation) | 177 A neutral element (no transformation) |
183 is always present in the set. | 178 is always present in the set. |
184 \vspace{.4cm} | 179 %\vspace{.4cm} |
185 \end{minipage} | 180 %\end{minipage} |
186 \vspace{-.7cm} | 181 %\vspace{-.7cm} |
187 | |
188 | 182 |
189 \begin{minipage}[b]{0.14\linewidth} | 183 \begin{minipage}[b]{0.14\linewidth} |
190 \centering | 184 \centering |
191 \includegraphics[scale=.45]{images/Affine_only.PNG} | 185 \includegraphics[scale=.4]{images/Slant_only.PNG}\\ |
192 \label{fig:Affine} | 186 {\bf Slant} |
193 \end{minipage}% | 187 \end{minipage}% |
194 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | 188 \hspace{0.3cm}\begin{minipage}[b]{0.83\linewidth} |
195 {\bf Affine Transformations.} | 189 %\centering |
190 %\vspace*{-15mm} | |
191 Each row of the image is shifted | |
192 proportionally to its height: $shift = round(slant \times height)$. | |
193 $slant \sim U[-complexity,complexity]$. | |
194 \vspace{1.5cm} | |
195 \end{minipage} | |
196 %\vspace*{-4mm} | |
197 | |
198 %\begin{minipage}[b]{0.14\linewidth} | |
199 %\centering | |
200 \begin{wrapfigure}[8]{l}{0.15\textwidth} | |
201 \vspace*{-6mm} | |
202 \begin{center} | |
203 \includegraphics[scale=.4]{images/Affine_only.PNG}\\ | |
204 {\bf Affine} | |
205 \end{center} | |
206 \end{wrapfigure} | |
207 %\end{minipage}% | |
208 %\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | |
196 A $2 \times 3$ affine transform matrix (with | 209 A $2 \times 3$ affine transform matrix (with |
197 6 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$ level. | 210 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$. |
198 Output pixel $(x,y)$ takes the value of input pixel | 211 Output pixel $(x,y)$ takes the value of input pixel |
199 nearest to $(ax+by+c,dx+ey+f)$, | 212 nearest to $(ax+by+c,dx+ey+f)$, |
200 producing scaling, translation, rotation and shearing. | 213 producing scaling, translation, rotation and shearing. |
201 The marginal distributions of $(a,b,c,d,e,f)$ have been tuned by hand to | 214 Marginal distributions of $(a,b,c,d,e,f)$ have been tuned to |
202 forbid large rotations (not to confuse classes) but to give good | 215 forbid large rotations (not to confuse classes) but to give good |
203 variability of the transformation: $a$ and $d$ $\sim U[1-3 \times | 216 variability of the transformation: $a$ and $d$ $\sim U[1-3 |
204 complexity,1+3 \times complexity]$, $b$ and $e$ $\sim U[-3 \times complexity,3 | 217 complexity,1+3\,complexity]$, $b$ and $e$ $\sim U[-3 \,complexity,3\, |
205 \times complexity]$ and $c$ and $f$ $\sim U[-4 \times complexity, 4 \times | 218 complexity]$ and $c$ and $f$ $\sim U[-4 \,complexity, 4 \, |
206 complexity]$. | 219 complexity]$.\\ |
207 \end{minipage} | 220 %\end{minipage} |
208 | 221 |
209 \begin{minipage}[b]{0.14\linewidth} | 222 \vspace*{-4.5mm} |
210 \centering | 223 |
211 \includegraphics[scale=.45]{images/Localelasticdistorsions_only.PNG} | 224 \begin{minipage}[t]{\linewidth} |
212 \label{fig:Elastic} | 225 \begin{wrapfigure}[7]{l}{0.15\textwidth} |
213 \end{minipage}% | 226 %\hspace*{-8mm}\begin{minipage}[b]{0.25\linewidth} |
214 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | 227 %\centering |
215 {\bf Local Elastic Deformations.} | 228 \begin{center} |
216 This filter induces a ``wiggly'' effect in the image, following~\citet{SimardSP03-short}, | 229 \vspace*{-4mm} |
230 \includegraphics[scale=.4]{images/Localelasticdistorsions_only.PNG}\\ | |
231 {\bf Local Elastic} | |
232 \end{center} | |
233 \end{wrapfigure} | |
234 %\end{minipage}% | |
235 %\hspace{-3mm}\begin{minipage}[b]{0.85\linewidth} | |
236 %\vspace*{-20mm} | |
237 This local elastic deformation | |
238 filter induces a ``wiggly'' effect in the image, following~\citet{SimardSP03-short}, | |
217 which provides more details. | 239 which provides more details. |
218 The intensity of the displacement fields is given by | 240 The intensity of the displacement fields is given by |
219 $\alpha = \sqrt[3]{complexity} \times 10.0$, which are | 241 $\alpha = \sqrt[3]{complexity} \times 10.0$, which are |
220 convolved with a Gaussian 2D kernel (resulting in a blur) of | 242 convolved with a Gaussian 2D kernel (resulting in a blur) of |
221 standard deviation $\sigma = 10 - 7 \times\sqrt[3]{complexity}$. | 243 standard deviation $\sigma = 10 - 7 \times\sqrt[3]{complexity}$. |
222 \vspace{.4cm} | 244 %\vspace{.9cm} |
223 \end{minipage} | 245 \end{minipage} |
224 \vspace{-.7cm} | 246 |
225 | 247 \vspace*{5mm} |
226 \begin{minipage}[b]{0.14\linewidth} | 248 |
227 \centering | 249 %\begin{minipage}[b]{0.14\linewidth} |
228 \includegraphics[scale=.45]{images/Pinch_only.PNG} | 250 %\centering |
229 \label{fig:Pinch} | 251 \begin{wrapfigure}[7]{l}{0.15\textwidth} |
230 \vspace{.6cm} | 252 \vspace*{-5mm} |
231 \end{minipage}% | 253 \begin{center} |
232 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | 254 \includegraphics[scale=.4]{images/Pinch_only.PNG}\\ |
233 {\bf Pinch.} | 255 {\bf Pinch} |
234 This is the ``Whirl and pinch'' GIMP filter but with whirl was set to 0. | 256 \end{center} |
257 \end{wrapfigure} | |
258 %\vspace{.6cm} | |
259 %\end{minipage}% | |
260 %\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | |
261 This is the ``Whirl and pinch'' GIMP filter with whirl set to 0. | |
235 A pinch is ``similar to projecting the image onto an elastic | 262 A pinch is ``similar to projecting the image onto an elastic |
236 surface and pressing or pulling on the center of the surface'' (GIMP documentation manual). | 263 surface and pressing or pulling on the center of the surface'' (GIMP documentation manual). |
237 For a square input image, this is akin to drawing a circle of | 264 For a square input image, draw a radius-$r$ disk |
238 radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to | 265 around $C$. Any pixel $P$ belonging to |
239 that disk (region inside circle) will have its value recalculated by taking | 266 that disk has its value replaced by |
240 the value of another ``source'' pixel in the original image. The position of | 267 the value of a ``source'' pixel in the original image, |
241 that source pixel is found on the line that goes through $C$ and $P$, but | 268 on the line that goes through $C$ and $P$, but |
242 at some other distance $d_2$. Define $d_1$ to be the distance between $P$ | 269 at some other distance $d_2$. Define $d_1=distance(P,C)$; then $d_2 = sin(\frac{\pi{}d_1}{2r})^{-pinch} \times |
243 and $C$. $d_2$ is given by $d_2 = sin(\frac{\pi{}d_1}{2r})^{-pinch} \times | |
244 d_1$, where $pinch$ is a parameter to the filter. | 270 d_1$, where $pinch$ is a parameter to the filter. |
245 The actual value is given by bilinear interpolation considering the pixels | 271 The actual value is given by bilinear interpolation considering the pixels |
246 around the (non-integer) source position thus found. | 272 around the (non-integer) source position thus found. |
247 Here $pinch \sim U[-complexity, 0.7 \times complexity]$. | 273 Here $pinch \sim U[-complexity, 0.7 \times complexity]$. |
248 %\vspace{1.5cm} | 274 %\vspace{1.5cm} |
249 \end{minipage} | 275 %\end{minipage} |
250 | 276 |
251 \vspace{.1cm} | 277 \vspace{2mm} |
252 | 278 |
253 {\large\bf Injecting Noise} | 279 {\large\bf 2.2 Injecting Noise} |
254 | 280 %\subsection{Injecting Noise} |
255 \vspace*{-.2cm} | 281 \vspace{2mm} |
256 \begin{minipage}[b]{0.14\linewidth} | 282 |
283 %\vspace*{-.2cm} | |
284 \begin{minipage}[t]{0.14\linewidth} | |
257 \centering | 285 \centering |
258 \includegraphics[scale=.45]{images/Motionblur_only.PNG} | 286 \vspace*{-2mm} |
259 \label{fig:Original} | 287 \includegraphics[scale=.4]{images/Motionblur_only.PNG}\\ |
288 {\bf Motion Blur} | |
260 \end{minipage}% | 289 \end{minipage}% |
261 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | 290 \hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth} |
262 {\bf Motion Blur.} | 291 %\vspace*{.5mm} |
263 This is GIMP's ``linear motion blur'' | 292 This is GIMP's ``linear motion blur'' |
264 with parameters $length$ and $angle$. The value of | 293 with parameters $length$ and $angle$. The value of |
265 a pixel in the final image is approximately the mean value of the first $length$ pixels | 294 a pixel in the final image is approximately the mean of the first $length$ pixels |
266 found by moving in the $angle$ direction. | 295 found by moving in the $angle$ direction, |
267 Here $angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$. | 296 $angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$. |
268 \vspace{.7cm} | 297 \vspace{5mm} |
269 \end{minipage} | 298 \end{minipage} |
270 | 299 |
271 \vspace*{-5mm} | 300 \vspace*{1mm} |
272 | 301 |
273 \begin{minipage}[b]{0.14\linewidth} | 302 \begin{minipage}[t]{0.14\linewidth} |
274 \centering | 303 \centering |
275 \includegraphics[scale=.45]{images/occlusion_only.PNG} | 304 \includegraphics[scale=.4]{images/occlusion_only.PNG}\\ |
276 \label{fig:Original} | 305 {\bf Occlusion} |
306 %\vspace{.5cm} | |
277 \end{minipage}% | 307 \end{minipage}% |
278 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | 308 \hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth} |
279 {\bf Occlusion.} | 309 \vspace*{-18mm} |
280 Selects a random rectangle from an {\em occluder} character | 310 Selects a random rectangle from an {\em occluder} character |
281 image and places it over the original {\em occluded} | 311 image and places it over the original {\em occluded} |
282 image. Pixels are combined by taking the max(occluder,occluded), | 312 image. Pixels are combined by taking the max(occluder,occluded), |
283 closer to black. The rectangle corners | 313 closer to black. The rectangle corners |
284 are sampled so that larger complexity gives larger rectangles. | 314 are sampled so that larger complexity gives larger rectangles. |
285 The destination position in the occluded image is also sampled | 315 The destination position in the occluded image is also sampled |
286 according to a normal distribution (more details in~\citet{ift6266-tr-anonymous}). | 316 according to a normal distribution (more details in~\citet{ift6266-tr-anonymous}). |
287 This filter is skipped with probability 60\%. | 317 This filter is skipped with probability 60\%. |
288 \vspace{.4cm} | 318 %\vspace{7mm} |
289 \end{minipage} | 319 \end{minipage} |
290 | 320 |
291 \vspace*{-5mm} | 321 \vspace*{1mm} |
292 \begin{minipage}[b]{0.14\linewidth} | 322 |
293 \centering | 323 \begin{wrapfigure}[8]{l}{0.15\textwidth} |
294 \includegraphics[scale=.45]{images/Permutpixel_only.PNG} | 324 \vspace*{-6mm} |
295 \label{fig:Original} | 325 \begin{center} |
296 \end{minipage}% | 326 %\begin{minipage}[t]{0.14\linewidth} |
297 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | 327 %\centering |
298 {\bf Pixel Permutation.} | 328 \includegraphics[scale=.4]{images/Bruitgauss_only.PNG}\\ |
299 This filter permutes neighbouring pixels. It first selects | 329 {\bf Gaussian Smoothing} |
300 fraction $\frac{complexity}{3}$ of pixels randomly in the image. Each of them are then | 330 \end{center} |
301 sequentially exchanged with one other in as $V4$ neighbourhood. | 331 \end{wrapfigure} |
302 This filter is skipped with probability 80\%. | 332 %\vspace{.5cm} |
303 \vspace{.8cm} | 333 %\end{minipage}% |
304 \end{minipage} | 334 %\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth} |
305 | |
306 | |
307 \begin{minipage}[b]{0.14\linewidth} | |
308 \centering | |
309 \includegraphics[scale=.45]{images/Distorsiongauss_only.PNG} | |
310 \label{fig:Original} | |
311 \end{minipage}% | |
312 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | |
313 {\bf Gaussian Noise.} | |
314 This filter simply adds, to each pixel of the image independently, a | |
315 noise $\sim Normal(0,(\frac{complexity}{10})^2)$. | |
316 This filter is skipped with probability 70\%. | |
317 \vspace{1.1cm} | |
318 \end{minipage} | |
319 \vspace{-.7cm} | |
320 | |
321 \begin{minipage}[b]{0.14\linewidth} | |
322 \centering | |
323 \includegraphics[scale=.45]{images/background_other_only.png} | |
324 \label{fig:Original} | |
325 \end{minipage}% | |
326 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | |
327 {\bf Background Images.} | |
328 Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random | |
329 background behind the letter, from a randomly chosen natural image, | |
330 with contrast adjustments depending on $complexity$, to preserve | |
331 more or less of the original character image. | |
332 \vspace{.8cm} | |
333 \end{minipage} | |
334 \vspace{-.7cm} | |
335 | |
336 \begin{minipage}[b]{0.14\linewidth} | |
337 \centering | |
338 \includegraphics[scale=.45]{images/Poivresel_only.PNG} | |
339 \label{fig:Original} | |
340 \end{minipage}% | |
341 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | |
342 {\bf Salt and Pepper Noise.} | |
343 This filter adds noise $\sim U[0,1]$ to random subsets of pixels. | |
344 The number of selected pixels is $0.2 \times complexity$. | |
345 This filter is skipped with probability 75\%. | |
346 \vspace{.9cm} | |
347 \end{minipage} | |
348 \vspace{-.7cm} | |
349 | |
350 \begin{minipage}[b]{0.14\linewidth} | |
351 \centering | |
352 \includegraphics[scale=.45]{images/Bruitgauss_only.PNG} | |
353 \label{fig:Original} | |
354 \vspace{.5cm} | |
355 \end{minipage}% | |
356 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | |
357 {\bf Spatially Gaussian Smoothing.} | |
358 Different regions of the image are spatially smoothed by convolving | 335 Different regions of the image are spatially smoothed by convolving |
359 the image with a symmetric Gaussian kernel of | 336 the image with a symmetric Gaussian kernel of |
360 size and variance chosen uniformly in the ranges $[12,12 + 20 \times | 337 size and variance chosen uniformly in the ranges $[12,12 + 20 \times |
361 complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized | 338 complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized |
362 between $0$ and $1$. We also create a symmetric weighted averaging window, of the | 339 between $0$ and $1$. We also create a symmetric weighted averaging window, of the |
366 initialize to zero a mask matrix of the image size. For each selected pixel | 343 initialize to zero a mask matrix of the image size. For each selected pixel |
367 we add to the mask the averaging window centered on it. The final image is | 344 we add to the mask the averaging window centered on it. The final image is |
368 computed from the following element-wise operation: $\frac{image + filtered | 345 computed from the following element-wise operation: $\frac{image + filtered |
369 image \times mask}{mask+1}$. | 346 image \times mask}{mask+1}$. |
370 This filter is skipped with probability 75\%. | 347 This filter is skipped with probability 75\%. |
371 \end{minipage} | 348 %\end{minipage} |
372 \vspace{-.7cm} | 349 |
373 | 350 \newpage |
374 \begin{minipage}[b]{0.14\linewidth} | 351 |
352 \vspace*{-9mm} | |
353 | |
354 %\hspace*{-3mm}\begin{minipage}[t]{0.18\linewidth} | |
355 %\centering | |
356 \begin{minipage}[t]{\linewidth} | |
357 \begin{wrapfigure}[7]{l}{0.15\textwidth} | |
358 \vspace*{-5mm} | |
359 \begin{center} | |
360 \includegraphics[scale=.4]{images/Permutpixel_only.PNG}\\ | |
361 {\small\bf Permute Pixels} | |
362 \end{center} | |
363 \end{wrapfigure} | |
364 %\end{minipage}% | |
365 %\hspace{-0cm}\begin{minipage}[t]{0.86\linewidth} | |
366 %\vspace*{-20mm} | |
367 This filter permutes neighbouring pixels. It first selects a | |
368 fraction $\frac{complexity}{3}$ of pixels randomly in the image. Each of them is then | |
369 sequentially exchanged with another pixel in its $V4$ neighbourhood. | |
370 This filter is skipped with probability 80\%.\\ | |
371 \vspace*{1mm} | |
372 \end{minipage} | |
373 | |
374 \vspace{-1mm} | |
375 | |
376 \begin{minipage}[t]{\linewidth} | |
377 \begin{wrapfigure}[7]{l}{0.15\textwidth} | |
378 %\vspace*{-3mm} | |
379 \begin{center} | |
380 %\hspace*{-3mm}\begin{minipage}[t]{0.18\linewidth} | |
381 %\centering | |
382 \vspace*{-5mm} | |
383 \includegraphics[scale=.4]{images/Distorsiongauss_only.PNG}\\ | |
384 {\small \bf Gauss. Noise} | |
385 \end{center} | |
386 \end{wrapfigure} | |
387 %\end{minipage}% | |
388 %\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth} | |
389 \vspace*{12mm} | |
390 This filter simply adds, to each pixel of the image independently, a | |
391 noise $\sim Normal(0,(\frac{complexity}{10})^2)$. | |
392 This filter is skipped with probability 70\%. | |
393 %\vspace{1.1cm} | |
394 \end{minipage} | |
395 | |
396 \vspace*{1.5cm} | |
397 | |
398 \begin{minipage}[t]{\linewidth} | |
399 \begin{minipage}[t]{0.14\linewidth} | |
375 \centering | 400 \centering |
376 \includegraphics[scale=.45]{images/Rature_only.PNG} | 401 \includegraphics[scale=.4]{images/background_other_only.png}\\ |
377 \label{fig:Original} | 402 {\small \bf Bg Image} |
378 \end{minipage}% | 403 \end{minipage}% |
379 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | 404 \hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth} |
380 \vspace{.4cm} | 405 \vspace*{-18mm} |
381 {\bf Scratches.} | 406 Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random |
407 background image behind the letter, from a randomly chosen natural image, | |
408 with contrast adjustments depending on $complexity$, to preserve | |
409 more or less of the original character image. | |
410 %\vspace{.8cm} | |
411 \end{minipage} | |
412 \end{minipage} | |
413 %\vspace{-.7cm} | |
414 | |
415 \begin{minipage}[t]{0.14\linewidth} | |
416 \centering | |
417 \includegraphics[scale=.4]{images/Poivresel_only.PNG}\\ | |
418 {\small \bf Salt \& Pepper} | |
419 \end{minipage}% | |
420 \hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth} | |
421 \vspace*{-18mm} | |
422 This filter adds noise $\sim U[0,1]$ to random subsets of pixels. | |
423 The number of selected pixels is $0.2 \times complexity$. | |
424 This filter is skipped with probability 75\%. | |
425 %\vspace{.9cm} | |
426 \end{minipage} | |
427 %\vspace{-.7cm} | |
428 | |
429 \vspace{1mm} | |
430 | |
431 \begin{minipage}[t]{\linewidth} | |
432 \begin{wrapfigure}[7]{l}{0.14\textwidth} | |
433 %\begin{minipage}[t]{0.14\linewidth} | |
434 %\centering | |
435 \begin{center} | |
436 \vspace*{-4mm} | |
437 \hspace*{-1mm}\includegraphics[scale=.4]{images/Rature_only.PNG}\\ | |
438 {\bf Scratches} | |
439 %\end{minipage}% | |
440 \end{center} | |
441 \end{wrapfigure} | |
442 %\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth} | |
443 %\vspace{.4cm} | |
382 The scratches module places line-like white patches on the image. The | 444 The scratches module places line-like white patches on the image. The |
383 lines are heavily transformed images of the digit ``1'' (one), chosen | 445 lines are heavily transformed images of the digit ``1'' (one), chosen |
384 at random among 500 such 1 images, | 446 at random among 500 such 1 images, |
385 randomly cropped and rotated by an angle $\sim Normal(0,(100 \times | 447 randomly cropped and rotated by an angle $\sim Normal(0,(100 \times |
386 complexity)^2)$ (in degrees), using bi-cubic interpolation. | 448 complexity)^2)$ (in degrees), using bi-cubic interpolation. |
388 are applied, reducing the width of the line | 450 are applied, reducing the width of the line |
389 by an amount controlled by $complexity$. | 451 by an amount controlled by $complexity$. |
390 This filter is skipped with probability 85\%. The probabilities | 452 This filter is skipped with probability 85\%. The probabilities |
391 of applying 1, 2, or 3 patches are (50\%,30\%,20\%). | 453 of applying 1, 2, or 3 patches are (50\%,30\%,20\%). |
392 \end{minipage} | 454 \end{minipage} |
393 \vspace{-.7cm} | 455 |
394 | 456 \vspace*{2mm} |
395 \begin{minipage}[b]{0.14\linewidth} | 457 |
458 \begin{minipage}[t]{0.20\linewidth} | |
396 \centering | 459 \centering |
397 \includegraphics[scale=.45]{images/Contrast_only.PNG} | 460 \hspace*{-7mm}\includegraphics[scale=.4]{images/Contrast_only.PNG}\\ |
398 \label{fig:Original} | 461 {\bf Grey \& Contrast} |
399 \end{minipage}% | 462 \end{minipage}% |
400 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | 463 \hspace{-4mm}\begin{minipage}[t]{0.82\linewidth} |
401 {\bf Grey Level and Contrast Changes.} | 464 \vspace*{-18mm} |
402 This filter changes the contrast and may invert the image polarity (white | 465 This filter changes the contrast by changing grey levels, and may invert the image polarity (white |
403 to black and black to white). The contrast is $C \sim U[1-0.85 \times complexity,1]$ | 466 to black and black to white). The contrast is $C \sim U[1-0.85 \times complexity,1]$ |
404 so the image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The | 467 so the image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The |
405 polarity is inverted with probability 50\%. | 468 polarity is inverted with probability 50\%. |
406 \vspace{.7cm} | 469 %\vspace{.7cm} |
407 \end{minipage} | 470 \end{minipage} |
408 \vspace{-.7cm} | 471 \vspace{2mm} |
409 | 472 |
410 | 473 |
411 \iffalse | 474 \iffalse |
412 \begin{figure}[ht] | 475 \begin{figure}[ht] |
413 \centerline{\resizebox{.9\textwidth}{!}{\includegraphics{images/example_t.png}}}\\ | 476 \centerline{\resizebox{.9\textwidth}{!}{\includegraphics{images/example_t.png}}}\\ |