comparison writeup/techreport.tex @ 441:1272dc84a30c

merge
author Arnaud Bergeron <abergeron@gmail.com>
date Mon, 03 May 2010 13:55:03 -0400
parents 89258bb41e4c bfa349f567e8
children 89a49dae6cf3
440:89258bb41e4c 441:1272dc84a30c
82 In order to mimic a slant effect, we simply shift each row of the image proportionally to its height: $shift = round(slant \times height)$. 82 In order to mimic a slant effect, we simply shift each row of the image proportionally to its height: $shift = round(slant \times height)$.
83 We round the shift in order to have a discrete displacement. We do not use a filter to smooth the result in order to save computing time 83 We round the shift in order to have a discrete displacement. We do not use a filter to smooth the result in order to save computing time
84 and also because later transformations have similar effects. 84 and also because later transformations have similar effects.
85 85
86 The $slant$ coefficient can be negative or positive with equal probability and its value is randomly sampled according to the complexity level. 86 The $slant$ coefficient can be negative or positive with equal probability and its value is randomly sampled according to the complexity level.
87 In our case we take uniformly a number in the range $[0,complexity]$, that means, in our case, that the maximum displacement for the lowest 87 In our case we take uniformly a number in the range $[0,complexity]$, so the maximum displacement for the lowest
88 or highest pixel line is $round(complexity \times 32)$. 88 or highest pixel line is $round(complexity \times 32)$.
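
The slant step above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the experiments: the names \texttt{sample\_slant} and \texttt{slant\_image} are hypothetical, the row index counted from the top is assumed to play the role of $height$, and pixels shifted out of the frame are assumed to be dropped and replaced by a background value of 0.

```python
import random


def sample_slant(complexity, rng=random):
    """Draw a slant: magnitude uniform in [0, complexity], sign chosen with equal probability."""
    magnitude = rng.uniform(0.0, complexity)
    return magnitude if rng.random() < 0.5 else -magnitude


def slant_image(image, slant, fill=0):
    """Shift row r horizontally by round(slant * r); no smoothing filter is applied.

    Assumptions: `image` is a list of equal-length rows, the row index is used
    as the height, and pixels leaving the frame are discarded.
    Note that Python's round() rounds exact halves to the nearest even integer.
    """
    width = len(image[0])
    out = []
    for r, row in enumerate(image):
        shift = round(slant * r)
        new_row = [fill] * width
        for x, value in enumerate(row):
            nx = x + shift
            if 0 <= nx < width:
                new_row[nx] = value
        out.append(new_row)
    return out
```

For example, \texttt{slant\_image(image, sample\_slant(0.5))} would slant an image at complexity $0.5$ under these assumptions.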
89 89
90 90
91 \subsection{Changing Thickness} 91 \subsection{Changing Thickness}
92 To change the thickness of the characters we used morphological operators: dilation and erosion~\cite{Haralick87,Serra82}.i 92 To change the thickness of the characters we used morphological operators: dilation and erosion~\cite{Haralick87,Serra82}.
93 93
94 The basic idea of such a transform is, for each pixel, to multiply its neighbourhood element-wise with a matrix called the structuring element. 94 The basic idea of such a transform is, for each pixel, to multiply its neighbourhood element-wise with a matrix called the structuring element.
95 Then for dilation we replace the pixel value by the maximum of the result, or the minimum for erosion. 95 Then for dilation we replace the pixel value by the maximum of the result, or the minimum for erosion.
96 This will dilate or erode objects in the image and strength of the transform only depends on the structuring element. 96 This will dilate or erode objects in the image and the strength of the transform only depends on the structuring element.
97 97
98 We used ten different structuring elements with increasing dimensions (the biggest is $5\times5$). 98 We used ten different structuring elements with increasing dimensions (the biggest is $5\times5$).
99 For each image, we randomly sample the operator type (dilation or erosion) with equal probability and one structuring element 99 For each image, we randomly sample the operator type (dilation or erosion) with equal probability and one structuring element
100 from a subset of the $n$ smallest structuring elements where $n$ is $round(10 \times complexity)$ for dilation and $round(6 \times complexity)$ for erosion. 100 from a subset of the $n$ smallest structuring elements where $n$ is $round(10 \times complexity)$ for dilation and $round(6 \times complexity)$ for erosion.
101 A neutral element is always present in the set; if it is chosen, the transformation is not applied. 101 A neutral element is always present in the set; if it is chosen, the transformation is not applied.
102 Erosion is restricted to the six smallest structuring elements because it may completely erase a character that is too thin. 102 Erosion is restricted to the six smallest structuring elements because it may completely erase a character that is too thin.
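
The thickness step can be illustrated with the following Python sketch. It is not the implementation used here and rests on several assumptions: the function names are hypothetical, the structuring element is taken to be binary, the maximum or minimum is taken over the neighbours selected by its nonzero entries (the usual flat grayscale definition, rather than a literal element-wise product), neighbours falling outside the image are ignored, and the neutral element is assumed to be index 0 in addition to the $n$ smallest elements. The ten structuring elements themselves are not reproduced.

```python
import random


def morph(image, selem, op):
    """Flat grayscale morphology: op=max gives dilation, op=min gives erosion."""
    h, w = len(image), len(image[0])
    sh, sw = len(selem), len(selem[0])
    cy, cx = sh // 2, sw // 2  # centre of the structuring element
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Collect the in-bounds neighbours selected by nonzero entries.
            values = [
                image[y + j - cy][x + i - cx]
                for j in range(sh)
                for i in range(sw)
                if selem[j][i]
                and 0 <= y + j - cy < h
                and 0 <= x + i - cx < w
            ]
            row.append(op(values) if values else image[y][x])
        out.append(row)
    return out


def dilate(image, selem):
    return morph(image, selem, max)


def erode(image, selem):
    return morph(image, selem, min)


def sample_operator(complexity, rng=random):
    """Pick dilation or erosion with equal probability, then an element index in 0..n.

    Index 0 stands for the neutral element (assumption): no transformation.
    """
    op = rng.choice(["dilation", "erosion"])
    n = round((10 if op == "dilation" else 6) * complexity)
    return op, rng.randint(0, n)
```

A caller would look up the structuring element for the sampled index, skip the transformation when the index is 0, and otherwise call \texttt{dilate} or \texttt{erode}.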
103 103
104 \subsection{Affine Transformations} 104 \subsection{Affine Transformations}
105 We generate an affine transform matrix according to the complexity level, then we apply it directly to the image. 105 We generate an affine transform matrix according to the complexity level, then we apply it directly to the image.
106 This allows to produce scaling, translation, rotation and shearing variances. We took care that the maximum rotation applied 106 The matrix is of size $2 \times 3$, so we can represent it by six parameters $(a,b,c,d,e,f)$.
107 to the image is low enough not to confuse classes. 107 Formally, for each pixel $(x,y)$ of the output image,
108 we assign the value of the input-image pixel nearest to $(ax+by+c,dx+ey+f)$.
109 This makes it possible to produce scaling, translation, rotation and shearing variations.
110
111 The sampling of the parameters $(a,b,c,d,e,f)$ has been tuned by hand to forbid large rotations (so as not to confuse classes) while still giving good variability of the transformation. For each image we sample the parameters uniformly in the following ranges:
112 $a$ and $d$ in $[1-3 \times complexity,1+3 \times complexity]$, $b$ and $e$ in $[-3 \times complexity,3 \times complexity]$ and $c$ and $f$ in $[-4 \times complexity, 4 \times complexity]$.
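
The pixel lookup defined above can be sketched as follows. This is a minimal illustration rather than the actual implementation: the function name \texttt{apply\_affine} is hypothetical, coordinates are assumed to start at $(0,0)$ in the top-left corner, and output pixels whose source position falls outside the input image are assumed to receive a background value of 0. The sampling of the six parameters is left out; only the nearest-pixel lookup is shown.

```python
def apply_affine(image, params, fill=0):
    """For each output pixel (x, y), copy the input pixel nearest to
    (a*x + b*y + c, d*x + e*y + f).

    Assumptions: `image` is a list of equal-length rows indexed as image[y][x],
    and source positions outside the image yield `fill`.
    """
    a, b, c, d, e, f = params
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx = round(a * x + b * y + c)  # nearest source column
            sy = round(d * x + e * y + f)  # nearest source row
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out
```

With this convention, the parameters $(1,0,0,0,1,0)$ leave the image unchanged, and $(1,0,1,0,1,0)$ moves its content one pixel to the left.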
113
108 114
109 \subsection{Local Elastic Deformations} 115 \subsection{Local Elastic Deformations}
110 This filter induces a ``wiggly'' effect in the image. The description here will be brief, as the algorithm follows precisely what is described in \cite{SimardSP03}. 116 This filter induces a ``wiggly'' effect in the image. The description here will be brief, as the algorithm follows precisely what is described in \cite{SimardSP03}.
111 117
112 The general idea is to generate two ``displacement'' fields, for horizontal and vertical displacements of pixels. Each of these fields has the same size as the original image. 118 The general idea is to generate two ``displacement'' fields, for horizontal and vertical displacements of pixels. Each of these fields has the same size as the original image.