comparison writeup/nips2010_submission.tex @ 551:8f365abf171d

separete the transmo image
author Frederic Bastien <nouiz@nouiz.org>
date Wed, 02 Jun 2010 17:00:11 -0400
parents 662299f265ab
children 35c611363291
550:662299f265ab 551:8f365abf171d
the self-taught learning framework.

\vspace*{-1mm}
\section{Perturbation and Transformation of Character Images}
\label{s:perturbations}
{\large\bf Transformations}

\vspace*{-1mm}

\begin{minipage}[b]{0.14\linewidth}
\centering
\includegraphics[scale=.45]{images/Original.PNG}
\label{fig:Original}
\vspace{1.2cm}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Original:}
This section describes the different transformations we used to stochastically
transform source images in order to obtain data from a distribution which
covers a domain substantially larger than the clean characters distribution from
which we start. Although character transformations have been used before to
improve character recognizers, this effort is on a large scale both
amount of deformation or noise introduced.

There are two main parts in the pipeline. The first one,
from slant to pinch below, performs transformations. The second
part, from motion blur to grey level and contrast changes, adds different kinds of noise.
\end{minipage}
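The two-part pipeline, together with the skip probabilities given for individual modules below, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the module names, the \texttt{run\_pipeline} helper and the convention that each module is a callable \texttt{f(image, complexity)} are hypothetical; modules without a stated skip probability are assumed never to be skipped; and the modules are assumed to run in the order in which they are described in this section.

```python
import random

# Order in which the modules are described in this section
# (assumed here to be the order in which they are applied).
PIPELINE = [
    # first part: transformations
    "slant", "thickness", "affine", "local_elastic", "pinch",
    # second part: noise
    "motion_blur", "occlusion", "pixel_permutation", "gaussian_noise",
    "background", "salt_and_pepper", "spatially_gaussian_noise",
    "scratches", "grey_level_contrast",
]

# Skip probabilities stated in the text; other modules are assumed never skipped.
SKIP_PROB = {
    "occlusion": 0.60,
    "pixel_permutation": 0.80,
    "gaussian_noise": 0.70,
    "salt_and_pepper": 0.75,
    "spatially_gaussian_noise": 0.75,
    "scratches": 0.85,
}


def run_pipeline(image, complexity, modules, rand=random.random):
    """Apply the modules in PIPELINE order, skipping each one with its
    probability. `modules` maps a name to a hypothetical callable
    f(image, complexity); names absent from `modules` are left out.
    Returns the final image and the names of the modules actually applied."""
    applied = []
    for name in PIPELINE:
        if rand() < SKIP_PROB.get(name, 0.0):
            continue  # module skipped for this sample
        module = modules.get(name)
        if module is not None:
            image = module(image, complexity)
            applied.append(name)
    return image, applied
```

Passing \texttt{rand} explicitly makes the skipping decisions reproducible, which is convenient when checking that a given module really fires with the intended frequency.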


\begin{minipage}[b]{0.14\linewidth}
\centering
\includegraphics[scale=.45]{images/Slant_only.PNG}
\label{fig:Slant}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
%\centering
{\bf Slant:}
Each row of the image is shifted horizontally
by an amount proportional to its height: $shift = round(slant \times height)$,
with $slant \sim U[-complexity,complexity]$.
\vspace{1.2cm}
\end{minipage}
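A minimal Python sketch of the slant formula, for a greyscale image stored as a list of equal-length rows. The text does not fix the origin of $height$ or what fills the vacated pixels, so the sketch assumes that heights are counted from the bottom row, that positive shifts move pixels to the right, and that vacated pixels are set to 0; the function names are illustrative.

```python
import random


def shift_rows(image, slant):
    """Shift each row horizontally by round(slant * height), where height is
    the row's distance from the bottom row (assumption). Pixels shifted past
    the border are dropped; vacated pixels are set to 0.0."""
    h = len(image)
    out = []
    for y, row in enumerate(image):
        shift = round(slant * (h - 1 - y))
        w = len(row)
        new_row = [0.0] * w
        for x, v in enumerate(row):
            if 0 <= x + shift < w:
                new_row[x + shift] = v
        out.append(new_row)
    return out


def random_slant(image, complexity, rng=random):
    """Sample slant ~ U[-complexity, complexity] and apply it."""
    return shift_rows(image, rng.uniform(-complexity, complexity))
```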


\begin{minipage}[b]{0.14\linewidth}
\centering
\includegraphics[scale=.45]{images/Thick_only.PNG}
\label{fig:Think}
\vspace{.9cm}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Thickness:}
Morphological operators of dilation and erosion~\citep{Haralick87,Serra82}
are applied. The neighborhood of each pixel is multiplied
element-wise with a {\em structuring element} matrix.
The pixel value is replaced by the maximum or the minimum of the resulting
matrix, respectively for dilation or erosion. Ten different structuring elements with
randomly sample the operator type (dilation or erosion) with equal probability and one structuring
element from a subset of the $n=round(m \times complexity)$ smallest structuring elements
where $m=10$ for dilation and $m=6$ for erosion (to avoid completely erasing thin characters).
A neutral element (no transformation)
is always present in the set.
\vspace{.4cm}
\end{minipage}
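The Python sketch below illustrates grey-level dilation and erosion together with the sampling rule above, on images stored as lists of rows; it is not the authors' implementation. Two points are assumptions: the ten structuring elements are not listed in this section, so the caller passes a hypothetical list \texttt{elements} sorted from smallest to largest; and the max or min is taken over the pixels covered by the nonzero entries of a binary structuring element. For dilation of a non-negative image this equals the maximum of the element-wise product described above; for erosion it keeps the zeros of the product from dominating the minimum.

```python
import random

NEUTRAL = [[1]]  # 1x1 element: dilation or erosion by it leaves the image unchanged


def morph(image, se, op):
    """Grey-level dilation (op=max) or erosion (op=min) of `image` by the
    binary structuring element `se`, an odd-sized list of rows centred on
    the pixel. Neighbours outside the image are ignored."""
    h, w = len(image), len(image[0])
    kh, kw = len(se), len(se[0])
    cy, cx = kh // 2, kw // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [image[y + i - cy][x + j - cx]
                    for i in range(kh) for j in range(kw)
                    if se[i][j] and 0 <= y + i - cy < h and 0 <= x + j - cx < w]
            row.append(op(vals) if vals else image[y][x])
        out.append(row)
    return out


def thickness(image, complexity, elements, rng=random):
    """Pick dilation or erosion with equal probability, then one element among
    the neutral element and the n = round(m * complexity) smallest ones."""
    if rng.random() < 0.5:
        op, m = max, 10  # dilation
    else:
        op, m = min, 6   # erosion: fewer elements, so thin strokes survive
    n = round(m * complexity)
    se = rng.choice([NEUTRAL] + elements[:n])
    return morph(image, se, op)
```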
\vspace{-.7cm}


\begin{minipage}[b]{0.14\linewidth}
\centering
\includegraphics[scale=.45]{images/Affine_only.png}
\label{fig:Affine}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Affine Transformations:}
A $2 \times 3$ affine transform matrix (with
6 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$ level.
Output pixel $(x,y)$ takes the value of input pixel
nearest to $(ax+by+c,dx+ey+f)$,
producing scaling, translation, rotation and shearing.
forbid large rotations (so as not to confuse classes) but to give good
variability of the transformation: $a$ and $e$ $\sim U[1-3 \times
complexity,1+3 \times complexity]$, $b$ and $d$ $\sim U[-3 \times complexity,3
\times complexity]$ and $c$ and $f$ $\sim U[-4 \times complexity, 4 \times
complexity]$.
\end{minipage}
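A Python sketch of the nearest-neighbour affine mapping, assuming pixel coordinates with the origin at the top-left corner and a fill value of 0 for source positions outside the image (neither is specified above). For the sampling, the sketch draws $a$ and $e$, the diagonal terms of the mapping $(ax+by+c, dx+ey+f)$, around 1 and $b$ and $d$ around 0, so that $complexity=0$ yields the identity. Function names are illustrative.

```python
import random


def sample_affine(complexity, rng=random):
    """Sample (a, b, c, d, e, f): a, e ~ U[1-3k, 1+3k], b, d ~ U[-3k, 3k],
    c, f ~ U[-4k, 4k] with k = complexity."""
    k = complexity
    a, e = rng.uniform(1 - 3 * k, 1 + 3 * k), rng.uniform(1 - 3 * k, 1 + 3 * k)
    b, d = rng.uniform(-3 * k, 3 * k), rng.uniform(-3 * k, 3 * k)
    c, f = rng.uniform(-4 * k, 4 * k), rng.uniform(-4 * k, 4 * k)
    return a, b, c, d, e, f


def affine(image, params, fill=0.0):
    """Output pixel (x, y) takes the input pixel nearest to
    (a*x + b*y + c, d*x + e*y + f); sources outside the image give `fill`."""
    a, b, c, d, e, f = params
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx = round(a * x + b * y + c)
            sy = round(d * x + e * y + f)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out
```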

\begin{minipage}[b]{0.14\linewidth}
\centering
\includegraphics[scale=.45]{images/Localelasticdistorsions_only.PNG}
\label{fig:Elastic}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Local Elastic Deformations:}
This filter induces a ``wiggly'' effect in the image, following~\citet{SimardSP03-short},
which provides more details.
The displacement fields, whose intensity is given by
$\alpha = \sqrt[3]{complexity} \times 10.0$, are
convolved with a 2D Gaussian kernel (resulting in a blur) of
standard deviation $\sigma = 10 - 7 \times\sqrt[3]{complexity}$.
\vspace{.4cm}
\end{minipage}
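The Python sketch below follows the recipe of~\citet{SimardSP03-short} as summarized above: two displacement fields with entries drawn from $U[-1,1]$ are blurred by a Gaussian of standard deviation $\sigma$, scaled by $\alpha$, and used to resample the image with bilinear interpolation. The uniform initialization, the zero padding at the borders, the kernel radius of $3\sigma$ and all helper names are assumptions; the original implementation may differ.

```python
import math
import random


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at 3 standard deviations."""
    r = max(1, math.ceil(3 * sigma))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-r, r + 1)]
    total = sum(k)
    return [v / total for v in k]


def blur(field, sigma):
    """Separable Gaussian blur of a 2D field, treating outside values as 0."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(field), len(field[0])
    rows = [[sum(k[j + r] * field[y][x + j]
                 for j in range(-r, r + 1) if 0 <= x + j < w)
             for x in range(w)] for y in range(h)]
    return [[sum(k[i + r] * rows[y + i][x]
                 for i in range(-r, r + 1) if 0 <= y + i < h)
             for x in range(w)] for y in range(h)]


def bilinear(image, x, y):
    """Bilinear interpolation at a non-integer (x, y); outside pixels count as 0."""
    h, w = len(image), len(image[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def px(xx, yy):
        return image[yy][xx] if 0 <= xx < w and 0 <= yy < h else 0.0

    return ((1 - fx) * (1 - fy) * px(x0, y0) + fx * (1 - fy) * px(x0 + 1, y0)
            + (1 - fx) * fy * px(x0, y0 + 1) + fx * fy * px(x0 + 1, y0 + 1))


def elastic(image, complexity, rng=random):
    """Local elastic deformation with alpha and sigma derived from complexity."""
    h, w = len(image), len(image[0])
    alpha = complexity ** (1 / 3) * 10.0
    sigma = 10 - 7 * complexity ** (1 / 3)
    dx = blur([[rng.uniform(-1, 1) for _ in range(w)] for _ in range(h)], sigma)
    dy = blur([[rng.uniform(-1, 1) for _ in range(w)] for _ in range(h)], sigma)
    return [[bilinear(image, x + alpha * dx[y][x], y + alpha * dy[y][x])
             for x in range(w)] for y in range(h)]
```

Note that $\sigma$ decreases from 10 to 3 as $complexity$ grows from 0 to 1, so stronger deformations are also less smooth.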
\vspace{-.7cm}

\begin{minipage}[b]{0.14\linewidth}
\centering
\includegraphics[scale=.45]{images/Pinch_only.PNG}
\label{fig:Pinch}
\vspace{.6cm}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Pinch:}
This is the ``Whirl and pinch'' GIMP filter with whirl set to 0.
A pinch is ``similar to projecting the image onto an elastic
surface and pressing or pulling on the center of the surface'' (GIMP documentation manual).
For a square input image, this is akin to drawing a circle of
radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to
and $C$. $d_2$ is given by $d_2 = \sin(\frac{\pi{}d_1}{2r})^{-pinch} \times
d_1$, where $pinch$ is a parameter of the filter.
The actual value is given by bilinear interpolation considering the pixels
around the (non-integer) source position thus found.
Here $pinch \sim U[-complexity, 0.7 \times complexity]$.
%\vspace{1.5cm}
\end{minipage}
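A Python sketch of the pinch mapping. Part of the geometric description is not reproduced above, so the sketch assumes that a pixel $P$ inside the circle, at distance $d_1$ from $C$, takes the bilinearly interpolated value found at distance $d_2$ from $C$ on the ray from $C$ through $P$, with $C$ the image centre and $r$ half the image side; pixels outside the circle are left unchanged. These choices and the helper names are assumptions, not a description of GIMP's code.

```python
import math
import random


def pinch_source(px, py, cx, cy, r, pinch):
    """Source position for output pixel (px, py): the point at distance
    d2 = sin(pi * d1 / (2 r)) ** (-pinch) * d1 from the centre (cx, cy),
    on the ray through (px, py). Points outside the circle map to themselves."""
    vx, vy = px - cx, py - cy
    d1 = math.hypot(vx, vy)
    if d1 == 0 or d1 >= r:
        return px, py
    d2 = math.sin(math.pi * d1 / (2 * r)) ** (-pinch) * d1
    return cx + vx * d2 / d1, cy + vy * d2 / d1


def bilinear(image, x, y):
    """Bilinear interpolation at (x, y); pixels outside the image count as 0."""
    h, w = len(image), len(image[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def px(xx, yy):
        return image[yy][xx] if 0 <= xx < w and 0 <= yy < h else 0.0

    return ((1 - fx) * (1 - fy) * px(x0, y0) + fx * (1 - fy) * px(x0 + 1, y0)
            + (1 - fx) * fy * px(x0, y0 + 1) + fx * fy * px(x0 + 1, y0 + 1))


def pinch(image, complexity, rng=random):
    """Sample pinch ~ U[-complexity, 0.7 * complexity] and apply the mapping."""
    h, w = len(image), len(image[0])
    cx, cy, r = (w - 1) / 2, (h - 1) / 2, min(w, h) / 2
    p = rng.uniform(-complexity, 0.7 * complexity)
    return [[bilinear(image, *pinch_source(x, y, cx, cy, r, p))
             for x in range(w)] for y in range(h)]
```

Since $\sin(\frac{\pi d_1}{2r}) \in (0,1)$ inside the circle, a positive $pinch$ gives $d_2 > d_1$ and a negative one $d_2 < d_1$, while $d_2 = d_1$ on the circle itself, so the mapping is continuous at its boundary.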

\vspace{.1cm}

{\large\bf Injecting Noise}

\vspace*{-.2cm}
\begin{minipage}[b]{0.14\linewidth}
\centering
\includegraphics[scale=.45]{images/Original.PNG}
\label{fig:Original}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Motion Blur:}
This is GIMP's ``linear motion blur''
with parameters $length$ and $angle$. The value of
a pixel in the final image is approximately the mean value of the first $length$ pixels
found by moving in the $angle$ direction.
Here $angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$.
\vspace{.7cm}
\end{minipage}
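A Python sketch of the motion-blur averaging; it is a plain reimplementation of the description, not GIMP's code. The text leaves open how negative or fractional values of $length$ are handled, so the sketch uses the rounded absolute value (at least 1), counts the pixel itself as the first of the $length$ pixels, samples along the direction with nearest-neighbour rounding, and ignores samples falling outside the image. Angles are measured in image coordinates, with $y$ pointing down.

```python
import math
import random


def motion_blur(image, length, angle_deg):
    """Replace each pixel by the mean of the pixels met when stepping from it
    in direction angle_deg, for max(1, round(|length|)) steps (step 0 is the
    pixel itself). Samples outside the image are ignored."""
    h, w = len(image), len(image[0])
    n = max(1, round(abs(length)))
    ux = math.cos(math.radians(angle_deg))
    uy = math.sin(math.radians(angle_deg))
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = []
            for t in range(n):
                sx, sy = round(x + t * ux), round(y + t * uy)
                if 0 <= sx < w and 0 <= sy < h:
                    vals.append(image[sy][sx])
            row.append(sum(vals) / len(vals))  # never empty: t = 0 is (x, y)
        out.append(row)
    return out


def random_motion_blur(image, complexity, rng=random):
    """angle ~ U[0, 360] degrees, length ~ Normal(0, (3 * complexity)^2)."""
    angle = rng.uniform(0, 360)
    length = rng.gauss(0, 3 * complexity)  # gauss takes the standard deviation
    return motion_blur(image, length, angle)
```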

\vspace*{-5mm}

\begin{minipage}[b]{0.14\linewidth}
\centering
\includegraphics[scale=.45]{images/Original.PNG}
\label{fig:Original}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Occlusion:}
This filter selects a random rectangle from an {\em occluder} character
image and places it over the original {\em occluded}
image. Pixels are combined by taking $\max(occluder, occluded)$, i.e.\ the value
closer to black. The rectangle corners
are sampled so that larger complexity gives larger rectangles.
The destination position in the occluded image is also sampled
according to a normal distribution (more details in~\citet{ift6266-tr-anonymous}).
This filter is skipped with probability 60\%.
\vspace{.4cm}
\end{minipage}
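The pixel-combination step can be sketched in Python as below. The rectangle and the destination are passed in explicitly because their sampling distributions are only given in~\citet{ift6266-tr-anonymous}; the rectangle convention (inclusive top-left, exclusive bottom-right corner), the clipping at the borders and the function name are assumptions.

```python
def occlude(occluded, occluder, rect, dest):
    """Paste the rectangle rect = (x0, y0, x1, y1) of `occluder` (x0, y0
    inclusive, x1, y1 exclusive) onto a copy of `occluded`, with its top-left
    corner at dest = (dx, dy). Overlapping pixels keep max(occluder, occluded);
    parts falling outside the occluded image are clipped."""
    x0, y0, x1, y1 = rect
    dx, dy = dest
    out = [row[:] for row in occluded]
    h, w = len(out), len(out[0])
    for y in range(y0, y1):
        for x in range(x0, x1):
            ty, tx = dy + (y - y0), dx + (x - x0)
            if 0 <= ty < h and 0 <= tx < w:
                out[ty][tx] = max(out[ty][tx], occluder[y][x])
    return out
```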

\vspace*{-5mm}
\begin{minipage}[b]{0.14\linewidth}
\centering
\includegraphics[scale=.45]{images/Original.PNG}
\label{fig:Original}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Pixel Permutation:}
This filter permutes neighbouring pixels. It first randomly selects a
fraction $\frac{complexity}{3}$ of the pixels in the image. Each of them is then
sequentially exchanged with another pixel in its $V4$ neighbourhood.
This filter is skipped with probability 80\%.
\vspace{.8cm}
\end{minipage}
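A Python sketch of the permutation step, under stated assumptions: the number of selected pixels is $\frac{complexity}{3}$ times the number of pixels, rounded to the nearest integer; the selected pixels are processed in random order; and each is swapped with a neighbour chosen uniformly among its $V4$ (up, down, left, right) neighbours inside the image.

```python
import random

V4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def permute_pixels(image, complexity, rng=random):
    """Swap a fraction complexity/3 of the pixels, one after the other, with
    a randomly chosen V4 neighbour."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    n = round(complexity / 3 * h * w)
    coords = [(y, x) for y in range(h) for x in range(w)]
    for y, x in rng.sample(coords, n):
        neighbours = [(y + dy, x + dx) for dy, dx in V4
                      if 0 <= y + dy < h and 0 <= x + dx < w]
        if not neighbours:  # 1x1 image: nothing to swap with
            continue
        ny, nx = rng.choice(neighbours)
        out[y][x], out[ny][nx] = out[ny][nx], out[y][x]
    return out
```

Because every operation is a swap, the multiset of pixel values is preserved; only their positions change locally.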


\begin{minipage}[b]{0.14\linewidth}
\centering
\includegraphics[scale=.45]{images/Original.PNG}
\label{fig:Original}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Gaussian Noise:}
This filter simply adds, to each pixel of the image independently, a
noise $\sim {\rm Normal}(0,(\frac{complexity}{10})^2)$.
This filter is skipped with probability 70\%.
\vspace{1.1cm}
\end{minipage}
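In Python, with the image stored as a list of rows, this noise model can be sketched as below. Whether the result is clipped back to $[0,1]$ is not stated, so this sketch does not clip. Note that \texttt{random.gauss} takes the standard deviation $\frac{complexity}{10}$, not the variance.

```python
import random


def gaussian_noise(image, complexity, rng=random):
    """Add independent Normal(0, (complexity / 10)^2) noise to every pixel."""
    sd = complexity / 10
    return [[v + rng.gauss(0.0, sd) for v in row] for row in image]
```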
\vspace{-.7cm}

\begin{minipage}[b]{0.14\linewidth}
\centering
\includegraphics[scale=.45]{images/Original.PNG}
\label{fig:Original}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Background Images:}
Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random
background behind the letter, from a randomly chosen natural image,
with contrast adjustments depending on $complexity$, to preserve
more or less of the original character image.
\vspace{.8cm}
\end{minipage}
\vspace{-.7cm}

\begin{minipage}[b]{0.14\linewidth}
\centering
\includegraphics[scale=.45]{images/Original.PNG}
\label{fig:Original}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Salt and Pepper Noise:}
This filter adds noise $\sim U[0,1]$ to random subsets of pixels.
The number of selected pixels is $0.2 \times complexity$.
This filter is skipped with probability 75\%.
\vspace{.9cm}
\end{minipage}
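A Python sketch of this filter. Two readings are assumptions rather than statements of the text: $0.2 \times complexity$ is taken as the fraction of pixels selected (rounded to a whole number of pixels), and each selected pixel is replaced by its $U[0,1]$ draw, the usual salt-and-pepper behaviour, rather than incremented by it.

```python
import random


def salt_and_pepper(image, complexity, rng=random):
    """Replace a random subset of round(0.2 * complexity * N) pixels
    (N = number of pixels) by values drawn from U[0, 1]."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    n = round(0.2 * complexity * h * w)
    coords = [(y, x) for y in range(h) for x in range(w)]
    for y, x in rng.sample(coords, n):
        out[y][x] = rng.uniform(0.0, 1.0)
    return out
```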
\vspace{-.7cm}

\begin{minipage}[b]{0.14\linewidth}
\centering
\includegraphics[scale=.45]{images/Original.PNG}
\label{fig:Original}
\vspace{.5cm}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Spatially Gaussian Noise:}
Different regions of the image are spatially smoothed. The image is
convolved with a symmetric Gaussian kernel whose
size and variance are chosen uniformly in the ranges $[12,12 + 20 \times
complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized
between $0$ and $1$. We also create a symmetric averaging window, of the
initialize to zero a mask matrix of the image size. For each selected pixel
we add to the mask the averaging window centered on it. The final image is
computed from the following element-wise operation:
$\frac{image + filtered\ image \times mask}{mask+1}$.
This filter is skipped with probability 75\%.
\end{minipage}
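The final combination is a per-pixel weighted average: where the mask is 0 the original pixel is kept, and as the mask grows the output approaches the filtered image (a mask value of 1 gives the plain mean of the two). The Python sketch below shows the mask accumulation and this formula. The Gaussian-filtered image, the averaging window and the selected centres are passed in as inputs because their construction is only partly described here; the function names are hypothetical.

```python
def add_window(mask, window, cy, cx):
    """Add `window` (odd-sized list of rows) to `mask`, centred on (cy, cx)
    and clipped at the borders."""
    kh, kw = len(window), len(window[0])
    h, w = len(mask), len(mask[0])
    for i in range(kh):
        for j in range(kw):
            y, x = cy + i - kh // 2, cx + j - kw // 2
            if 0 <= y < h and 0 <= x < w:
                mask[y][x] += window[i][j]


def spatially_smooth(image, filtered, window, centres):
    """Element-wise (image + filtered * mask) / (mask + 1), where the mask is
    the sum of `window` centred on each selected pixel in `centres`."""
    h, w = len(image), len(image[0])
    mask = [[0.0] * w for _ in range(h)]
    for cy, cx in centres:
        add_window(mask, window, cy, cx)
    return [[(image[y][x] + filtered[y][x] * mask[y][x]) / (mask[y][x] + 1)
             for x in range(w)] for y in range(h)]
```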
\vspace{-.7cm}

\begin{minipage}[b]{0.14\linewidth}
\centering
\includegraphics[scale=.45]{images/Original.PNG}
\label{fig:Original}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
\vspace{.4cm}
{\bf Scratches:}
The scratches module places line-like white patches on the image. The
lines are heavily transformed images of the digit ``1'' (one), chosen
at random among 500 such 1 images,
randomly cropped and rotated by an angle $\sim {\rm Normal}(0,(100 \times
complexity)^2)$, using bi-cubic interpolation.
Two passes of a grey-scale morphological erosion filter
are applied, reducing the width of the line
by an amount controlled by $complexity$.
This filter is skipped with probability 85\%. The probabilities
of applying 1, 2, or 3 patches are (50\%,30\%,20\%).
\end{minipage}
\vspace{-.7cm}

\begin{minipage}[b]{0.14\linewidth}
\centering
\includegraphics[scale=.45]{images/Original.PNG}
\label{fig:Original}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Grey Level and Contrast Changes:}
This filter changes the contrast and may invert the image polarity (white
to black and black to white). The contrast $C \sim U[1-0.85 \times complexity,1]$
is sampled and the image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The
polarity is inverted with probability 50\%.
\vspace{.7cm}
\end{minipage}
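A Python sketch of this filter, assuming that ``normalized into'' means a linear min-max rescaling of the pixel values. Since the target interval $[\frac{1-C}{2},1-\frac{1-C}{2}]$ is centred on $0.5$, inverting the polarity with $v \mapsto 1-v$ keeps the values inside it, and the width of the interval is exactly $C$.

```python
import random


def contrast(image, complexity, rng=random):
    """Rescale `image` linearly into [(1-C)/2, 1-(1-C)/2] with
    C ~ U[1 - 0.85 * complexity, 1], then invert polarity with probability 1/2."""
    c = rng.uniform(1 - 0.85 * complexity, 1)
    lo, hi = (1 - c) / 2, 1 - (1 - c) / 2
    vmin = min(min(row) for row in image)
    vmax = max(max(row) for row in image)
    span = (vmax - vmin) or 1.0  # guard against a constant image
    out = [[lo + (v - vmin) / span * (hi - lo) for v in row] for row in image]
    if rng.random() < 0.5:
        out = [[1 - v for v in row] for row in out]  # polarity inversion
    return out
```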
\vspace{-.7cm}


\iffalse
\begin{figure}[ht]
\centerline{\resizebox{.9\textwidth}{!}{\includegraphics{images/example_t.png}}}\\
\caption{Illustration of the pipeline of stochastic