Mercurial > ift6266
comparison writeup/nips2010_submission.tex @ 551:8f365abf171d
separete the transmo image
author | Frederic Bastien <nouiz@nouiz.org> |
---|---|
date | Wed, 02 Jun 2010 17:00:11 -0400 |
parents | 662299f265ab |
children | 35c611363291 |
comparison
550:662299f265ab | 551:8f365abf171d |
---|---|
131 the self-taught learning framework. | 131 the self-taught learning framework. |
132 | 132 |
133 \vspace*{-1mm} | 133 \vspace*{-1mm} |
134 \section{Perturbation and Transformation of Character Images} | 134 \section{Perturbation and Transformation of Character Images} |
135 \label{s:perturbations} | 135 \label{s:perturbations} |
136 \vspace*{-1mm} | 136 {\large\bf Transformations} |
137 | 137 |
138 \vspace*{-1mm} | |
139 | |
140 \begin{minipage}[b]{0.14\linewidth} | |
141 \centering | |
142 \includegraphics[scale=.45]{images/Original.PNG} | |
143 \label{fig:Original} | |
144 \vspace{1.2cm} | |
145 \end{minipage}% | |
146 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | |
147 {\bf Original:} | |
138 This section describes the different transformations we used to stochastically | 148 This section describes the different transformations we used to stochastically |
139 transform source images in order to obtain data from a larger distribution which | 149 transform source images in order to obtain data from a larger distribution which |
140 covers a domain substantially larger than the clean characters distribution from | 150 covers a domain substantially larger than the clean characters distribution from |
141 which we start. Although character transformations have been used before to | 151 which we start. Although character transformations have been used before to |
142 improve character recognizers, this effort is on a large scale both | 152 improve character recognizers, this effort is on a large scale both |
150 amount of deformation or noise introduced. | 160 amount of deformation or noise introduced. |
151 | 161 |
152 There are two main parts in the pipeline. The first one, | 162 There are two main parts in the pipeline. The first one, |
153 from slant to pinch below, performs transformations. The second | 163 from slant to pinch below, performs transformations. The second |
154 part, from blur to contrast, adds different kinds of noise. | 164 part, from blur to contrast, adds different kinds of noise. |
155 | 165 \end{minipage} |
156 \begin{figure}[ht] | 166 |
157 \vspace*{-2mm} | 167 |
158 \centerline{\resizebox{.9\textwidth}{!}{\includegraphics{images/transfo.png}}} | 168 \begin{minipage}[b]{0.14\linewidth} |
159 % TODO: PUT THE NAME OF THE TRANSFORMATION NEXT TO EACH IMAGE | 169 \centering |
160 \caption{Illustration of each transformation applied alone to the same image | 170 \includegraphics[scale=.45]{images/Slant_only.PNG} |
161 of an upper-case h (top left). First row (from left to right) : original image, slant, | 171 \label{fig:Slant} |
162 thickness, affine transformation (translation, rotation, shear), | 172 \end{minipage}% |
163 local elastic deformation; second row (from left to right) : | 173 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} |
164 pinch, motion blur, occlusion, pixel permutation, Gaussian noise; third row (from left to right) : | 174 %\centering |
165 background image, salt and pepper noise, spatially Gaussian noise, scratches, | 175 {\bf Slant:} |
166 grey level and contrast changes.} | |
167 \label{fig:transfo} | |
168 \vspace*{-2mm} | |
169 \end{figure} | |
170 | |
171 {\large\bf Transformations} | |
172 | |
173 \vspace*{0.5mm} | |
174 | |
175 {\bf Slant.} | |
176 Each row of the image is shifted | 176 Each row of the image is shifted |
177 proportionally to its height: $shift = round(slant \times height)$. | 177 proportionally to its height: $shift = round(slant \times height)$. |
178 $slant \sim U[-complexity,complexity]$. | 178 $slant \sim U[-complexity,complexity]$. |
179 \vspace*{-1mm} | 179 \vspace{1.2cm} |
180 | 180 \end{minipage} |
181 {\bf Thickness.} | 181 |
182 | |
183 \begin{minipage}[b]{0.14\linewidth} | |
184 \centering | |
185 \includegraphics[scale=.45]{images/Thick_only.PNG} | |
186 \label{fig:Think} | |
187 \vspace{.9cm} | |
188 \end{minipage}% | |
189 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | |
190 {\bf Thickness:} | |
182 Morphological operators of dilation and erosion~\citep{Haralick87,Serra82} | 191 Morphological operators of dilation and erosion~\citep{Haralick87,Serra82} |
183 are applied. The neighborhood of each pixel is multiplied | 192 are applied. The neighborhood of each pixel is multiplied |
184 element-wise with a {\em structuring element} matrix. | 193 element-wise with a {\em structuring element} matrix. |
185 The pixel value is replaced by the maximum or the minimum of the resulting | 194 The pixel value is replaced by the maximum or the minimum of the resulting |
186 matrix, respectively for dilation or erosion. Ten different structural elements with | 195 matrix, respectively for dilation or erosion. Ten different structural elements with |
188 randomly sample the operator type (dilation or erosion) with equal probability and one structural | 197 randomly sample the operator type (dilation or erosion) with equal probability and one structural |
189 element from a subset of the $n=round(m \times complexity)$ smallest structuring elements | 198 element from a subset of the $n=round(m \times complexity)$ smallest structuring elements |
190 where $m=10$ for dilation and $m=6$ for erosion (to avoid completely erasing thin characters). | 199 where $m=10$ for dilation and $m=6$ for erosion (to avoid completely erasing thin characters). |
191 A neutral element (no transformation) | 200 A neutral element (no transformation) |
192 is always present in the set. | 201 is always present in the set. |
193 \vspace*{-1mm} | 202 \vspace{.4cm} |
194 | 203 \end{minipage} |
195 {\bf Affine Transformations.} | 204 \vspace{-.7cm} |
205 | |
206 | |
207 \begin{minipage}[b]{0.14\linewidth} | |
208 \centering | |
209 \includegraphics[scale=.45]{images/Affine_only.png} | |
210 \label{fig:Affine} | |
211 \end{minipage}% | |
212 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | |
213 {\bf Affine Transformations:} | |
196 A $2 \times 3$ affine transform matrix (with | 214 A $2 \times 3$ affine transform matrix (with |
197 6 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$ level. | 215 6 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$ level. |
198 Output pixel $(x,y)$ takes the value of input pixel | 216 Output pixel $(x,y)$ takes the value of input pixel |
199 nearest to $(ax+by+c,dx+ey+f)$, | 217 nearest to $(ax+by+c,dx+ey+f)$, |
200 producing scaling, translation, rotation and shearing. | 218 producing scaling, translation, rotation and shearing. |
202 forbid large rotations (not to confuse classes) but to give good | 220 forbid large rotations (not to confuse classes) but to give good |
203 variability of the transformation: $a$ and $d$ $\sim U[1-3 \times | 221 variability of the transformation: $a$ and $d$ $\sim U[1-3 \times |
204 complexity,1+3 \times complexity]$, $b$ and $e$ $\sim U[-3 \times complexity,3 | 222 complexity,1+3 \times complexity]$, $b$ and $e$ $\sim U[-3 \times complexity,3 |
205 \times complexity]$ and $c$ and $f$ $\sim U[-4 \times complexity, 4 \times | 223 \times complexity]$ and $c$ and $f$ $\sim U[-4 \times complexity, 4 \times |
206 complexity]$. | 224 complexity]$. |
207 \vspace*{-1mm} | 225 \end{minipage} |
208 | 226 |
209 {\bf Local Elastic Deformations.} | 227 \begin{minipage}[b]{0.14\linewidth} |
228 \centering | |
229 \includegraphics[scale=.45]{images/Localelasticdistorsions_only.PNG} | |
230 \label{fig:Elastic} | |
231 \end{minipage}% | |
232 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | |
233 {\bf Local Elastic Deformations:} | |
210 This filter induces a ``wiggly'' effect in the image, following~\citet{SimardSP03-short}, | 234 This filter induces a ``wiggly'' effect in the image, following~\citet{SimardSP03-short}, |
211 which provides more details. | 235 which provides more details. |
212 The intensity of the displacement fields is given by | 236 The intensity of the displacement fields is given by |
213 $\alpha = \sqrt[3]{complexity} \times 10.0$, which are | 237 $\alpha = \sqrt[3]{complexity} \times 10.0$, which are |
214 convolved with a Gaussian 2D kernel (resulting in a blur) of | 238 convolved with a Gaussian 2D kernel (resulting in a blur) of |
215 standard deviation $\sigma = 10 - 7 \times\sqrt[3]{complexity}$. | 239 standard deviation $\sigma = 10 - 7 \times\sqrt[3]{complexity}$. |
216 \vspace*{-1mm} | 240 \vspace{.4cm} |
217 | 241 \end{minipage} |
218 {\bf Pinch.} | 242 \vspace{-.7cm} |
243 | |
244 \begin{minipage}[b]{0.14\linewidth} | |
245 \centering | |
246 \includegraphics[scale=.45]{images/Pinch_only.PNG} | |
247 \label{fig:Pinch} | |
248 \vspace{.6cm} | |
249 \end{minipage}% | |
250 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | |
251 {\bf Pinch:} | |
219 This is the ``Whirl and pinch'' GIMP filter but with whirl set to 0. | 252 This is the ``Whirl and pinch'' GIMP filter but with whirl set to 0. |
220 A pinch is ``similar to projecting the image onto an elastic | 253 A pinch is ``similar to projecting the image onto an elastic |
221 surface and pressing or pulling on the center of the surface'' (GIMP documentation manual). | 254 surface and pressing or pulling on the center of the surface'' (GIMP documentation manual). |
222 For a square input image, this is akin to drawing a circle of | 255 For a square input image, this is akin to drawing a circle of |
223 radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to | 256 radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to |
228 and $C$. $d_2$ is given by $d_2 = \sin(\frac{\pi{}d_1}{2r})^{-pinch} \times | 261 and $C$. $d_2$ is given by $d_2 = \sin(\frac{\pi{}d_1}{2r})^{-pinch} \times |
229 d_1$, where $pinch$ is a parameter to the filter. | 262 d_1$, where $pinch$ is a parameter to the filter. |
230 The actual value is given by bilinear interpolation considering the pixels | 263 The actual value is given by bilinear interpolation considering the pixels |
231 around the (non-integer) source position thus found. | 264 around the (non-integer) source position thus found. |
232 Here $pinch \sim U[-complexity, 0.7 \times complexity]$. | 265 Here $pinch \sim U[-complexity, 0.7 \times complexity]$. |
233 | 266 %\vspace{1.5cm} |
234 \vspace*{0.5mm} | 267 \end{minipage} |
268 | |
269 \vspace{.1cm} | |
235 | 270 |
236 {\large\bf Injecting Noise} | 271 {\large\bf Injecting Noise} |
237 | 272 |
238 \vspace*{0.5mm} | 273 \vspace*{-.2cm} |
239 | 274 \begin{minipage}[b]{0.14\linewidth} |
240 {\bf Motion Blur.} | 275 \centering |
276 \includegraphics[scale=.45]{images/Original.PNG} | |
277 \label{fig:Original} | |
278 \end{minipage}% | |
279 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | |
280 {\bf Motion Blur:} | |
241 This is GIMP's ``linear motion blur'' | 281 This is GIMP's ``linear motion blur'' |
242 with parameters $length$ and $angle$. The value of | 282 with parameters $length$ and $angle$. The value of |
243 a pixel in the final image is approximately the mean value of the first $length$ pixels | 283 a pixel in the final image is approximately the mean value of the first $length$ pixels |
244 found by moving in the $angle$ direction. | 284 found by moving in the $angle$ direction. |
245 Here $angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$. | 285 Here $angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$. |
246 \vspace*{-1mm} | 286 \vspace{.7cm} |
247 | 287 \end{minipage} |
248 {\bf Occlusion.} | 288 |
289 \vspace*{-5mm} | |
290 | |
291 \begin{minipage}[b]{0.14\linewidth} | |
292 \centering | |
293 \includegraphics[scale=.45]{images/Original.PNG} | |
294 \label{fig:Original} | |
295 \end{minipage}% | |
296 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | |
297 {\bf Occlusion:} | |
249 Selects a random rectangle from an {\em occluder} character | 298 Selects a random rectangle from an {\em occluder} character |
250 image and places it over the original {\em occluded} | 299 image and places it over the original {\em occluded} |
251 image. Pixels are combined by taking the max(occluder,occluded), | 300 image. Pixels are combined by taking the max(occluder,occluded), |
252 closer to black. The rectangle corners | 301 closer to black. The rectangle corners |
253 are sampled so that larger complexity gives larger rectangles. | 302 are sampled so that larger complexity gives larger rectangles. |
254 The destination position in the occluded image is also sampled | 303 The destination position in the occluded image is also sampled |
255 according to a normal distribution (more details in~\citet{ift6266-tr-anonymous}). | 304 according to a normal distribution (more details in~\citet{ift6266-tr-anonymous}). |
256 This filter is skipped with probability 60\%. | 305 This filter is skipped with probability 60\%. |
257 \vspace*{-1mm} | 306 \vspace{.4cm} |
258 | 307 \end{minipage} |
259 {\bf Pixel Permutation.} | 308 |
260 This filter permutes neighbouring pixels. It first selects | 309 \vspace*{-5mm} |
310 \begin{minipage}[b]{0.14\linewidth} | |
311 \centering | |
312 \includegraphics[scale=.45]{images/Original.PNG} | |
313 \label{fig:Original} | |
314 \end{minipage}% | |
315 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | |
316 {\bf Pixel Permutation:} | |
317 This filter permutes neighbouring pixels. It first selects | |
261 a fraction $\frac{complexity}{3}$ of pixels randomly in the image. Each of them is then | 318 a fraction $\frac{complexity}{3}$ of pixels randomly in the image. Each of them is then |
262 sequentially exchanged with another pixel in its $V4$ neighbourhood. | 319 sequentially exchanged with another pixel in its $V4$ neighbourhood. |
263 This filter is skipped with probability 80\%. | 320 This filter is skipped with probability 80\%. |
264 \vspace*{-1mm} | 321 \vspace{.8cm} |
265 | 322 \end{minipage} |
266 {\bf Gaussian Noise.} | 323 |
324 | |
325 \begin{minipage}[b]{0.14\linewidth} | |
326 \centering | |
327 \includegraphics[scale=.45]{images/Original.PNG} | |
328 \label{fig:Original} | |
329 \end{minipage}% | |
330 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | |
331 {\bf Gaussian Noise:} | |
267 This filter simply adds, to each pixel of the image independently, a | 332 This filter simply adds, to each pixel of the image independently, a |
268 noise $\sim Normal(0,(\frac{complexity}{10})^2)$. | 333 noise $\sim Normal(0,(\frac{complexity}{10})^2)$. |
269 This filter is skipped with probability 70\%. | 334 This filter is skipped with probability 70\%. |
270 \vspace*{-1mm} | 335 \vspace{1.1cm} |
271 | 336 \end{minipage} |
272 {\bf Background Images.} | 337 \vspace{-.7cm} |
338 | |
339 \begin{minipage}[b]{0.14\linewidth} | |
340 \centering | |
341 \includegraphics[scale=.45]{images/Original.PNG} | |
342 \label{fig:Original} | |
343 \end{minipage}% | |
344 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | |
345 {\bf Background Images:} | |
273 Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random | 346 Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random |
274 background behind the letter, from a randomly chosen natural image, | 347 background behind the letter, from a randomly chosen natural image, |
275 with contrast adjustments depending on $complexity$, to preserve | 348 with contrast adjustments depending on $complexity$, to preserve |
276 more or less of the original character image. | 349 more or less of the original character image. |
277 \vspace*{-1mm} | 350 \vspace{.8cm} |
278 | 351 \end{minipage} |
279 {\bf Salt and Pepper Noise.} | 352 \vspace{-.7cm} |
353 | |
354 \begin{minipage}[b]{0.14\linewidth} | |
355 \centering | |
356 \includegraphics[scale=.45]{images/Original.PNG} | |
357 \label{fig:Original} | |
358 \end{minipage}% | |
359 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | |
360 {\bf Salt and Pepper Noise:} | |
280 This filter adds noise $\sim U[0,1]$ to random subsets of pixels. | 361 This filter adds noise $\sim U[0,1]$ to random subsets of pixels. |
281 The fraction of selected pixels is $0.2 \times complexity$. | 362 The fraction of selected pixels is $0.2 \times complexity$. |
282 This filter is skipped with probability 75\%. | 363 This filter is skipped with probability 75\%. |
283 \vspace*{-1mm} | 364 \vspace{.9cm} |
284 | 365 \end{minipage} |
285 {\bf Spatially Gaussian Noise.} | 366 \vspace{-.7cm} |
367 | |
368 \begin{minipage}[b]{0.14\linewidth} | |
369 \centering | |
370 \includegraphics[scale=.45]{images/Original.PNG} | |
371 \label{fig:Original} | |
372 \vspace{.5cm} | |
373 \end{minipage}% | |
374 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | |
375 {\bf Spatially Gaussian Noise:} | |
286 Different regions of the image are spatially smoothed by convolving | 376 Different regions of the image are spatially smoothed by convolving |
287 the image with a symmetric Gaussian kernel of | 377 the image with a symmetric Gaussian kernel of |
288 size and variance chosen uniformly in the ranges $[12,12 + 20 \times | 378 size and variance chosen uniformly in the ranges $[12,12 + 20 \times |
289 complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized | 379 complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized |
290 between $0$ and $1$. We also create a symmetric averaging window, of the | 380 between $0$ and $1$. We also create a symmetric averaging window, of the |
294 initialize to zero a mask matrix of the image size. For each selected pixel | 384 initialize to zero a mask matrix of the image size. For each selected pixel |
295 we add to the mask the averaging window centered on it. The final image is | 385 we add to the mask the averaging window centered on it. The final image is |
296 computed from the following element-wise operation: $\frac{image + filtered | 386 computed from the following element-wise operation: $\frac{image + filtered |
297 image \times mask}{mask+1}$. | 387 image \times mask}{mask+1}$. |
298 This filter is skipped with probability 75\%. | 388 This filter is skipped with probability 75\%. |
299 \vspace*{-1mm} | 389 \end{minipage} |
300 | 390 \vspace{-.7cm} |
301 {\bf Scratches.} | 391 |
392 \begin{minipage}[b]{0.14\linewidth} | |
393 \centering | |
394 \includegraphics[scale=.45]{images/Original.PNG} | |
395 \label{fig:Original} | |
396 \end{minipage}% | |
397 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | |
398 \vspace{.4cm} | |
399 {\bf Scratches:} | |
302 The scratches module places line-like white patches on the image. The | 400 The scratches module places line-like white patches on the image. The |
303 lines are heavily transformed images of the digit ``1'' (one), chosen | 401 lines are heavily transformed images of the digit ``1'' (one), chosen |
304 at random among 500 such 1 images, | 402 at random among 500 such 1 images, |
305 randomly cropped and rotated by an angle $\sim Normal(0,(100 \times | 403 randomly cropped and rotated by an angle $\sim Normal(0,(100 \times |
306 complexity)^2)$, using bi-cubic interpolation. | 404 complexity)^2)$, using bi-cubic interpolation. |
307 Two passes of a grey-scale morphological erosion filter | 405 Two passes of a grey-scale morphological erosion filter |
308 are applied, reducing the width of the line | 406 are applied, reducing the width of the line |
309 by an amount controlled by $complexity$. | 407 by an amount controlled by $complexity$. |
310 This filter is skipped with probability 85\%. The probabilities | 408 This filter is skipped with probability 85\%. The probabilities |
311 of applying 1, 2, or 3 patches are (50\%,30\%,20\%). | 409 of applying 1, 2, or 3 patches are (50\%,30\%,20\%). |
312 \vspace*{-1mm} | 410 \end{minipage} |
313 | 411 \vspace{-.7cm} |
314 {\bf Grey Level and Contrast Changes.} | 412 |
413 \begin{minipage}[b]{0.14\linewidth} | |
414 \centering | |
415 \includegraphics[scale=.45]{images/Original.PNG} | |
416 \label{fig:Original} | |
417 \end{minipage}% | |
418 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | |
419 {\bf Grey Level and Contrast Changes:} | |
315 This filter changes the contrast and may invert the image polarity (white | 420 This filter changes the contrast and may invert the image polarity (white |
316 to black and black to white). The contrast is $C \sim U[1-0.85 \times complexity,1]$ | 421 to black and black to white). The contrast is $C \sim U[1-0.85 \times complexity,1]$ |
317 so the image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The | 422 so the image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The |
318 polarity is inverted with probability 50\%. | 423 polarity is inverted with probability 50\%. |
424 \vspace{.7cm} | |
425 \end{minipage} | |
426 \vspace{-.7cm} | |
427 | |
319 | 428 |
320 \iffalse | 429 \iffalse |
321 \begin{figure}[ht] | 430 \begin{figure}[ht] |
322 \centerline{\resizebox{.9\textwidth}{!}{\includegraphics{images/example_t.png}}}\\ | 431 \centerline{\resizebox{.9\textwidth}{!}{\includegraphics{images/example_t.png}}}\\ |
323 \caption{Illustration of the pipeline of stochastic | 432 \caption{Illustration of the pipeline of stochastic |