comparison writeup/techreport.tex @ 431:bfa349f567e8

correction in the transformation description
author Xavier Glorot <glorotxa@iro.umontreal.ca>
date Mon, 03 May 2010 01:07:21 -0400
parents 9fcd0215b8d5
children 1272dc84a30c d5b2b6397a5a
430:5777b5041ac9 431:bfa349f567e8
79 In order to mimic a slant effect, we simply shift each row of the image proportionally to its height: $shift = round(slant \times height)$. 79 In order to mimic a slant effect, we simply shift each row of the image proportionally to its height: $shift = round(slant \times height)$.
80 We round the shift in order to have a discrete displacement. We do not use a filter to smooth the result in order to save computing time 80 We round the shift in order to have a discrete displacement. We do not use a filter to smooth the result in order to save computing time
81 and also because later transformations have similar effects. 81 and also because later transformations have similar effects.
82 82
83 The $slant$ coefficient can be negative or positive with equal probability and its value is randomly sampled according to the complexity level. 83 The $slant$ coefficient can be negative or positive with equal probability and its value is randomly sampled according to the complexity level.
84 In our case we take uniformly a number in the range $[0,complexity]$, that means, in our case, that the maximum displacement for the lowest 84 In our case we take uniformly a number in the range $[0,complexity]$, so the maximum displacement for the lowest
85 or highest pixel line is $round(complexity \times 32)$. 85 or highest pixel line is $round(complexity \times 32)$.
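The slant step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the report's implementation: the function names `sample_slant` and `slant_image` are hypothetical, the image is a nested list to keep the sketch dependency-free, `height` is taken as the row's distance from the bottom row, vacated pixels are filled with 0, and pixels shifted outside the image are dropped.

```python
import random


def sample_slant(complexity, rng=random):
    """Draw a magnitude uniformly in [0, complexity], then pick a sign with equal probability."""
    magnitude = rng.uniform(0.0, complexity)
    return magnitude if rng.random() < 0.5 else -magnitude


def slant_image(image, slant):
    """Shift each row horizontally by round(slant * height).

    Assumptions (not stated in the report): height is the row's distance
    from the bottom row, vacated pixels become 0, and pixels pushed
    outside the image are dropped.
    """
    n_rows, n_cols = len(image), len(image[0])
    out = []
    for r, row in enumerate(image):
        height = n_rows - 1 - r          # bottom row has height 0
        shift = round(slant * height)    # discrete displacement, no smoothing
        new_row = [0] * n_cols
        for c, value in enumerate(row):
            target = c + shift
            if 0 <= target < n_cols:
                new_row[target] = value
        out.append(new_row)
    return out
```

With these conventions, a vertical stroke slanted with `slant = 1.0` leans to the right towards the top of the image, and `slant = 0.0` leaves the image unchanged.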
86 86
87 87
88 \subsection{Changing Thickness} 88 \subsection{Changing Thickness}
89 To change the thickness of the characters we used morphological operators: dilation and erosion~\cite{Haralick87,Serra82}.i 89 To change the thickness of the characters we used morphological operators: dilation and erosion~\cite{Haralick87,Serra82}.
90 90
91 The basic idea of such a transform is, for each pixel, to multiply its neighbourhood element-wise with a matrix called the structuring element. 91 The basic idea of such a transform is, for each pixel, to multiply its neighbourhood element-wise with a matrix called the structuring element.
92 Then for dilation we replace the pixel value by the maximum of the result, or the minimum for erosion. 92 Then for dilation we replace the pixel value by the maximum of the result, or the minimum for erosion.
93 This will dilate or erode objects in the image and strength of the transform only depends on the structuring element. 93 This will dilate or erode objects in the image and the strength of the transform only depends on the structuring element.
94 94
95 We used ten different structuring elements with increasing dimensions (the biggest is $5\times5$). 95 We used ten different structuring elements with increasing dimensions (the biggest is $5\times5$).
96 For each image, we randomly sample the operator type (dilation or erosion) with equal probability and one structuring element 96 For each image, we randomly sample the operator type (dilation or erosion) with equal probability and one structuring element
97 from a subset of the $n$ smallest structuring elements where $n$ is $round(10 \times complexity)$ for dilation and $round(6 \times complexity)$ for erosion. 97 from a subset of the $n$ smallest structuring elements where $n$ is $round(10 \times complexity)$ for dilation and $round(6 \times complexity)$ for erosion.
98 A neutral element is always present in the set; if it is chosen, the transformation is not applied. 98 A neutral element is always present in the set; if it is chosen, the transformation is not applied.
99 Erosion allows only the six smallest structuring elements because it may erase a character completely when the character is too thin. 99 Erosion allows only the six smallest structuring elements because it may erase a character completely when the character is too thin.
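As a rough illustration of the thickness step, the sketch below uses the usual flat-element formulation of grayscale morphology: it takes the maximum (dilation) or minimum (erosion) over the neighbours selected by a binary structuring element, rather than a literal element-wise product. The names `morph` and `sample_thickness_transform` are hypothetical. Several points are assumptions rather than facts from the report: neighbours outside the image are ignored, the centre of each structuring element is set, the list of elements is supplied by the caller sorted from smallest to biggest, and the neutral element is modelled as an extra `None` entry.

```python
import random


def morph(image, element, op):
    """Flat grayscale dilation (op=max) or erosion (op=min).

    Assumption: neighbours falling outside the image are ignored.
    """
    n_rows, n_cols = len(image), len(image[0])
    e_rows, e_cols = len(element), len(element[0])
    r_off, c_off = e_rows // 2, e_cols // 2
    out = []
    for r in range(n_rows):
        out_row = []
        for c in range(n_cols):
            values = [
                image[r + i - r_off][c + j - c_off]
                for i in range(e_rows)
                for j in range(e_cols)
                if element[i][j]
                and 0 <= r + i - r_off < n_rows
                and 0 <= c + j - c_off < n_cols
            ]
            # Keep the pixel unchanged if no neighbour is selected.
            out_row.append(op(values) if values else image[r][c])
        out.append(out_row)
    return out


def sample_thickness_transform(complexity, elements, rng=random):
    """Pick (operator, element), or None when the neutral element is drawn.

    `elements` is assumed sorted from smallest to biggest; the neutral
    element is modelled here as an extra None entry (an assumption).
    """
    if rng.random() < 0.5:
        op, n = max, round(10 * complexity)   # dilation
    else:
        op, n = min, round(6 * complexity)    # erosion
    choice = rng.choice([None] + elements[:n])
    return None if choice is None else (op, choice)
```

A caller would apply `morph(image, element, op)` only when the sampler returns a pair. At `complexity = 0` the subset is empty, so only the neutral element can be drawn and the image is left untouched.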
100 100
101 \subsection{Affine Transformations} 101 \subsection{Affine Transformations}
102 We generate an affine transform matrix according to the complexity level, then we apply it directly to the image. 102 We generate an affine transform matrix according to the complexity level, then we apply it directly to the image.
103 This allows to produce scaling, translation, rotation and shearing variances. We took care that the maximum rotation applied 103 The matrix is of size $2 \times 3$, so we can represent it by six parameters $(a,b,c,d,e,f)$.
104 to the image is low enough not to confuse classes. 104 Formally, for each pixel $(x,y)$ of the output image,
105 we give the value of the pixel nearest to $(ax+by+c,dx+ey+f)$ in the input image.
106 This allows to produce scaling, translation, rotation and shearing variances.
107
108 The sampling of the parameters $(a,b,c,d,e,f)$ has been tuned by hand to forbid large rotations (so as not to confuse classes) while still giving good variability of the transformation. For each image we sample the parameters uniformly in the following ranges:
109 $a$ and $d$ in $[1-3 \times complexity,1+3 \times complexity]$, $b$ and $e$ in $[-3 \times complexity,3 \times complexity]$ and $c$ and $f$ in $[-4 \times complexity, 4 \times complexity]$.
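The pixel mapping for the affine transform can be sketched as follows. This is a minimal illustration, not the report's code: the function name `affine_transform` and the `fill` parameter are hypothetical, $x$ is assumed to index columns and $y$ rows, and output pixels whose source falls outside the input image are assumed to take the value `fill`. Parameter sampling is left out; the six parameters are passed in directly.

```python
def affine_transform(image, a, b, c, d, e, f, fill=0):
    """Nearest-neighbour affine warp.

    Each output pixel (x, y) takes the value of the input pixel nearest
    to (a*x + b*y + c, d*x + e*y + f).
    Assumptions: x indexes columns, y indexes rows, and sources outside
    the input image yield `fill`.
    """
    n_rows, n_cols = len(image), len(image[0])
    out = [[fill] * n_cols for _ in range(n_rows)]
    for y in range(n_rows):
        for x in range(n_cols):
            src_x = round(a * x + b * y + c)
            src_y = round(d * x + e * y + f)
            if 0 <= src_x < n_cols and 0 <= src_y < n_rows:
                out[y][x] = image[src_y][src_x]
    return out
```

Because the mapping goes from output coordinates to input coordinates, every output pixel is assigned exactly once and no holes appear. For instance, `(a, b, c, d, e, f) = (1, 0, 1, 0, 1, 0)` reads each pixel from one column to its right, so the image content moves one column to the left.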
110
105 111
106 \subsection{Local Elastic Deformations} 112 \subsection{Local Elastic Deformations}
107 This filter induces a ``wiggly'' effect in the image. The description here will be brief, as the algorithm follows precisely what is described in \cite{SimardSP03}. 113 This filter induces a ``wiggly'' effect in the image. The description here will be brief, as the algorithm follows precisely what is described in \cite{SimardSP03}.
108 114
109 The general idea is to generate two ``displacement'' fields, for horizontal and vertical displacements of pixels. Each of these fields has the same size as the original image. 115 The general idea is to generate two ``displacement'' fields, for horizontal and vertical displacements of pixels. Each of these fields has the same size as the original image.