comparison writeup/techreport.tex @ 423:e4eb3ee7a0cf

merge
author Arnaud Bergeron <abergeron@gmail.com>
date Fri, 30 Apr 2010 16:25:37 -0400
parents e7790db265b1 a3a4a9c6476d
children 6e5f0f50ddab
It is also only recently that successful algorithms were proposed to overcome
some of these difficulties.

\section{Perturbation and Transformation of Character Images}
This section describes the different transformations we used to generate data,
in the order in which they are applied. The pipeline has two important parts:
the first, from slant to pinch, transforms the character itself; the second,
from blur to contrast, adds noise to the image.

\subsection{Adding Slant}
To mimic a slant effect, we simply shift each row of the image horizontally by
an amount proportional to its height: $shift = round(slant \times height)$.
We round the shift in order to obtain a discrete displacement. We do not
smooth the result with a filter, both to save computing time and because later
transformations have similar effects.

The $slant$ coefficient is negative or positive with equal probability, and
its magnitude is sampled according to the complexity level: we draw it
uniformly from the range $[0, complexity]$. In our case this means that the
maximum displacement of the lowest or highest pixel row is
$round(complexity \times 32)$.

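The row-shifting step above can be sketched as follows; this is a minimal
NumPy version, with `add_slant` an illustrative name (the sign and magnitude
of $slant$ are assumed to be sampled as described in the text):

```python
import numpy as np

def add_slant(img, slant):
    """Shift row y horizontally by round(slant * y); vacated pixels
    take the background value 0, as in the rest of the pipeline."""
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        s = int(round(slant * y))
        if abs(s) >= w:
            continue  # the whole row is shifted out of the image
        if s >= 0:
            out[y, s:] = img[y, :w - s]
        else:
            out[y, :w + s] = img[y, -s:]
    return out
```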
\subsection{Changing Thickness}
To change the thickness of the characters we use morphological operators:
dilation and erosion~\cite{Haralick87,Serra82}.

The basic idea of such transforms is, for each pixel, to multiply its
neighbourhood element-wise with a matrix called the structuring element.
For dilation we then replace the pixel value by the maximum of the result;
for erosion, by the minimum. This dilates or erodes objects in the image, and
the strength of the transform depends only on the structuring element.

We use ten structuring elements of increasing size (the biggest is
$5\times5$). For each image, we randomly sample the operator type (dilation
or erosion) with equal probability, and one structuring element from the
subset of the $n$ smallest ones, where $n = round(10 \times complexity)$ for
dilation and $n = round(6 \times complexity)$ for erosion. A neutral element
is always present in the set; if it is chosen, the transformation is not
applied. Erosion is restricted to the six smallest structuring elements
because a character that is too thin may be erased completely.

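A sketch of this sampling scheme using SciPy's grey-scale morphology; the
square footprints below stand in for the paper's ten structuring elements,
whose exact shapes are not specified in this section:

```python
import numpy as np
from scipy import ndimage

def change_thickness(img, complexity, rng=np.random.default_rng()):
    """Dilate or erode `img` with a structuring element whose maximum
    size grows with complexity; a 1x1 footprint acts as the neutral
    element (it leaves the image unchanged)."""
    sizes = [1, 2, 2, 3, 3, 4, 4, 5, 5, 5]  # ten elements, biggest 5x5
    dilate = rng.random() < 0.5
    n = int(round((10 if dilate else 6) * complexity))
    if n < 1:
        return img
    k = sizes[rng.integers(0, n)]  # pick among the n smallest elements
    footprint = np.ones((k, k), dtype=bool)
    op = ndimage.grey_dilation if dilate else ndimage.grey_erosion
    return op(img, footprint=footprint)
```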
\subsection{Affine Transformations}
We generate an affine transform matrix according to the complexity level,
then apply it directly to the image. This produces scaling, translation,
rotation and shearing variations. We took care that the maximum rotation
applied to the image is low enough not to confuse classes.

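The step above can be sketched with SciPy's affine resampling; the parameter
ranges below are illustrative guesses scaled by complexity, not the paper's
exact bounds:

```python
import numpy as np
from scipy import ndimage

def random_affine(img, complexity, rng=np.random.default_rng()):
    """Random small rotation, scaling and shear about the image centre;
    out-of-bounds pixels take the background value 0."""
    angle = rng.uniform(-0.3, 0.3) * complexity   # radians, kept small
    scale = 1.0 + rng.uniform(-0.2, 0.2) * complexity
    shear = rng.uniform(-0.3, 0.3) * complexity
    c, s = np.cos(angle), np.sin(angle)
    matrix = scale * np.array([[c, -s + shear], [s, c]])
    centre = (np.array(img.shape) - 1) / 2.0
    offset = centre - matrix @ centre             # keep the centre fixed
    return ndimage.affine_transform(img, matrix, offset=offset, cval=0.0)
```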
\subsection{Local Elastic Deformations}

This filter induces a ``wiggly'' effect in the image. The description here is
brief, as the algorithm follows precisely what is described
in \cite{SimardSP03}.

The general idea is to generate two displacement fields, for the horizontal
and vertical displacements of pixels. Each of these fields has the same size
as the original image.

When generating the transformed image, we loop over the $x$ and $y$ positions
in the fields and select, as a value, the value of the pixel in the original
image at the (relative) position given by the displacement fields for this
$x$ and $y$. If that position falls outside the borders of the image, we use
a value of 0 instead.
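A compact version of this scheme, in the style of Simard et al.: two random
fields are smoothed with a Gaussian and used to resample the image. The
strength `alpha` and smoothness `sigma` are free parameters here, not the
paper's sampled values:

```python
import numpy as np
from scipy import ndimage

def elastic_deform(img, alpha, sigma, rng=np.random.default_rng()):
    """Smooth two uniform random displacement fields with a Gaussian of
    width `sigma`, scale them by `alpha`, and resample the image through
    them with bilinear interpolation."""
    h, w = img.shape
    dx = ndimage.gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    dy = ndimage.gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    y, x = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Positions falling outside the image borders are filled with 0.
    return ndimage.map_coordinates(img, [y + dy, x + dx],
                                   order=1, mode="constant", cval=0.0)
```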
The actual value is given by bilinear interpolation, considering the pixels
around the (non-integer) source position thus found.

The value of $pinch$ in our case is sampled from a uniform distribution over
the range $[-complexity, 0.7 \times complexity]$.

\subsection{Gaussian Distortion}
This filter simply adds, to each pixel of the image independently, Gaussian
noise of mean $0$ and standard deviation $\frac{complexity}{10}$.

It has a probability of 70\% of not being applied at all.

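This step is a one-liner in practice; the clipping back to $[0,1]$ is our
assumption, since the pipeline keeps pixel values in that range:

```python
import numpy as np

def gauss_distort(img, complexity, rng=np.random.default_rng()):
    """Add i.i.d. Gaussian noise (mean 0, std complexity/10) to each
    pixel; the filter is skipped entirely with probability 0.7."""
    if rng.random() < 0.7:
        return img
    noisy = img + rng.normal(0.0, complexity / 10.0, img.shape)
    return np.clip(noisy, 0.0, 1.0)
```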
\subsection{Occlusion}

This filter selects random parts of other (hereafter ``occlusive'') letter
images and places them over the original (hereafter ``occluded'') letter
image. To be more precise, having selected a subregion of the occlusive image
and a destination position in the occluded image, the final value of a given
overlapping pixel is whichever of the two candidate pixels is the lightest.
As a reminder, the background value is 0 (black), so the value nearest to 1
is selected.

To select a subpart of the occlusive image, four numbers are generated. For
compatibility with the code, we call them ``haut'', ``bas'', ``gauche'' and
``droite'' (respectively meaning top, bottom, left and right). Each of these
numbers is sampled from a Gaussian distribution of mean $8 \times complexity$
and standard deviation $2$, so the larger the complexity, the bigger the
occlusion. The absolute value is taken, as the numbers must be positive, and
the maximum value is capped at 15.

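A sketch of the pixel-wise combination; the crop bounds follow the text, but
cropping around the occlusive image's centre and pasting at the top-left
corner are simplifying assumptions (the actual destination position is chosen
elsewhere in the pipeline):

```python
import numpy as np

def occlude(occluded, occlusive, complexity, rng=np.random.default_rng()):
    """Paste a crop of `occlusive` over `occluded`, keeping the lightest
    pixel (background is 0); crop extents are |N(8*complexity, 2)|,
    capped at 15."""
    haut, bas, gauche, droite = (
        min(int(abs(rng.normal(8 * complexity, 2.0))), 15) for _ in range(4)
    )
    cy, cx = occlusive.shape[0] // 2, occlusive.shape[1] // 2
    patch = occlusive[max(cy - haut, 0):cy + bas,
                      max(cx - gauche, 0):cx + droite]
    out = occluded.copy()
    ph, pw = patch.shape
    out[:ph, :pw] = np.maximum(out[:ph, :pw], patch)  # keep lightest pixel
    return out
```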
\subsection{Salt and Pepper Noise}

This filter adds noise to the image by randomly selecting a certain number of
pixels and, for each selected pixel, assigning a random value drawn from a
uniform distribution over the range $[0,1]$. This distribution does not
change with complexity; instead, the number of selected pixels does: the
proportion of changed pixels is $complexity / 5$, which means at most 20\% of
the pixels will be randomized, while at the lowest extreme no pixel is
changed.

This filter also has a probability of 75\% of not being applied at all.

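A minimal sketch; drawing the selection mask pixel-wise (Bernoulli with rate
$complexity/5$) approximates the stated proportion rather than fixing the
exact count:

```python
import numpy as np

def salt_and_pepper(img, complexity, rng=np.random.default_rng()):
    """Replace a complexity/5 fraction of pixels with U[0, 1] values;
    the filter is skipped entirely with probability 0.75."""
    if rng.random() < 0.75:
        return img
    out = img.copy()
    mask = rng.random(img.shape) < complexity / 5.0
    out[mask] = rng.random(img.shape)[mask]
    return out
```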
\subsection{Spatially Gaussian Noise}
The aim of this transformation is to filter different regions of the image
with a Gaussian kernel. To save computing time, we convolve the whole image
only once with a symmetric Gaussian kernel whose size and variance are chosen
uniformly in the ranges $[12, 12 + 20 \times complexity]$ and
$[2, 2 + 6 \times complexity]$ respectively. The result is normalized between
$0$ and $1$. We also create a symmetric averaging window of the kernel size,
with its maximum value at the center. For each image we sample uniformly from
$3$ to $3 + 10 \times complexity$ pixels that will serve as averaging centers
between the original image and the filtered one. We initialize a mask matrix
of the image size to zero; for each selected pixel we add to the mask the
averaging window centered on it. The final image is computed with the
element-wise operation
$\frac{image + filtered\_image \times mask}{mask + 1}$.

This filter has a probability of 75\% of not being applied at all.

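The masked blend can be sketched as below; the triangular averaging window is
an assumption, since the text only says the window is symmetric and maximal
at its centre:

```python
import numpy as np
from scipy import ndimage

def spatially_gaussian_noise(img, complexity, rng=np.random.default_rng()):
    """Blend the image with a blurred, normalized copy around a few
    random centres, weighted by a mask of averaging windows."""
    if rng.random() < 0.75:
        return img
    size = int(rng.uniform(12, 12 + 20 * complexity))
    sigma = np.sqrt(rng.uniform(2, 2 + 6 * complexity))  # variance -> std
    blurred = ndimage.gaussian_filter(img, sigma)
    blurred = (blurred - blurred.min()) / (blurred.max() - blurred.min() + 1e-8)
    ramp = 1.0 - np.abs(np.linspace(-1, 1, size))        # peaks at centre
    window = np.outer(ramp, ramp)
    mask = np.zeros_like(img)
    h, w = img.shape
    for _ in range(int(rng.uniform(3, 3 + 10 * complexity))):
        cy, cx = int(rng.integers(0, h)), int(rng.integers(0, w))
        y0, y1 = max(cy - size // 2, 0), min(cy - size // 2 + size, h)
        x0, x1 = max(cx - size // 2, 0), min(cx - size // 2 + size, w)
        mask[y0:y1, x0:x1] += window[y0 - cy + size // 2:y1 - cy + size // 2,
                                     x0 - cx + size // 2:x1 - cx + size // 2]
    return (img + blurred * mask) / (mask + 1.0)
```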
\subsection{Color and Contrast Changes}
This filter changes the contrast and may invert the image polarity (white on
black to black on white). The contrast $C$ is defined here as the difference
between the maximum and the minimum pixel value of the image. A contrast
value is sampled uniformly between $1$ and $1 - 0.85 \times complexity$ (this
ensures a minimum contrast of $0.15$). We then simply normalize the image to
the range $[\frac{1-C}{2}, 1-\frac{1-C}{2}]$. The polarity is inverted with
probability $0.5$.

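The normalization and polarity flip amount to the following sketch (the small
epsilon guarding against a constant image is our addition):

```python
import numpy as np

def contrast_change(img, complexity, rng=np.random.default_rng()):
    """Normalize the image into [(1-C)/2, 1-(1-C)/2] for a contrast C
    drawn uniformly in [1 - 0.85*complexity, 1], and invert polarity
    with probability 0.5."""
    c = rng.uniform(1.0 - 0.85 * complexity, 1.0)
    lo = (1.0 - c) / 2.0
    span = img.max() - img.min()
    out = lo + (img - img.min()) / (span + 1e-8) * c  # new span is exactly c
    if rng.random() < 0.5:
        out = 1.0 - out  # polarity inversion
    return out
```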
\begin{figure}[h]
\resizebox{.99\textwidth}{!}{\includegraphics{images/example_t.png}}\\
\caption{Illustration of the pipeline of stochastic
transformations applied to the image of a lower-case t

\subsubsection{Data Sets}
\begin{itemize}
\item {\bf NIST}
\item {\bf P07}
The P07 dataset is sampled with our transformation pipeline, with a
complexity parameter of $0.7$. For each new example to generate, we choose
one source with the following probabilities: $0.1$ for the fonts, $0.25$ for
the captchas, $0.25$ for the OCR data and $0.4$ for NIST. We apply all the
transformations in order, and for each of them we sample a complexity
uniformly in the range $[0, 0.7]$.
\item {\bf NISTP}
NISTP is equivalent to P07, except that we only apply the transformations
from slant to pinch. The character is therefore transformed, but no
additional noise is added to the image, which gives images closer to the
NIST dataset.
\end{itemize}

\subsection{Models and their Hyperparameters}

\subsubsection{Multi-Layer Perceptrons (MLP)}