writeup/techreport.tex @ 420:a3a4a9c6476d
added transformations description and began dataset descriptions
author: Xavier Glorot <glorotxa@iro.umontreal.ca>
date: Fri, 30 Apr 2010 16:23:15 -0400
parents: 0282882aa91f
children: e4eb3ee7a0cf c06a3d9b5664
It is also only recently that successful algorithms were proposed to
overcome some of these difficulties.

\section{Perturbation and Transformation of Character Images}
This section describes the different transformations we used to generate
data, in the order in which they are applied. The pipeline has two
important parts: the first, from slant to pinch, performs transformations
of the character; the second, from blur to contrast, adds noise to the
image.

\subsection{Adding Slant}
In order to mimic a slant effect, we simply shift each row of the image
proportionally to its height: $shift = round(slant \times height)$.
We round the shift in order to obtain a discrete displacement. We do not
use a filter to smooth the result, both to save computing time and
because later transformations have similar effects.

The $slant$ coefficient can be negative or positive with equal
probability, and its value is randomly sampled according to the
complexity level: we draw it uniformly from the range $[0,complexity]$,
so the maximum displacement for the lowest or highest pixel line is
$round(complexity \times 32)$.

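A minimal sketch of this row-shifting step (assuming the image is a NumPy array of shape $height \times width$ with values in $[0,1]$; `add_slant` is a hypothetical helper, not the project's actual code):

```python
import numpy as np

def add_slant(image, slant):
    """Shift each row by round(slant * row): a discrete displacement, no smoothing."""
    height, width = image.shape
    out = np.zeros_like(image)
    for row in range(height):
        shift = int(round(slant * row))
        shift = max(-width, min(width, shift))   # stay within the row
        if shift >= 0:
            out[row, shift:] = image[row, :width - shift]
        else:
            out[row, :width + shift] = image[row, -shift:]
    return out
```

Pixels shifted past the border are simply dropped, and the vacated positions keep the background value 0.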
\subsection{Changing Thickness}
To change the thickness of the characters we used morphological
operators: dilation and erosion~\cite{Haralick87,Serra82}.

The basic idea of such transforms is, for each pixel, to multiply its
neighbourhood element-wise with a matrix called the structuring element.
For dilation we then replace the pixel value by the maximum of the
result, and for erosion by the minimum. This dilates or erodes objects
in the image, and the strength of the transform depends only on the
structuring element.

We used ten different structuring elements with increasing dimensions
(the biggest is $5\times5$). For each image, we randomly sample the
operator type (dilation or erosion) with equal probability, and one
structuring element from the subset of the $n$ smallest ones, where $n$
is $round(10 \times complexity)$ for dilation and
$round(6 \times complexity)$ for erosion. A neutral element is always
present in the set; if it is chosen, the transformation is not applied.
Erosion is restricted to the six smallest structuring elements because,
when the character is too thin, it may erase it completely.

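A minimal flat grey-scale dilation/erosion sketch (with a 0/1 structuring element; this is an illustrative implementation, not the one used in the project, which more likely calls a morphology library):

```python
import numpy as np

def morph(image, struct, dilate=True):
    """Flat grey-scale dilation (max) or erosion (min) over a 0/1 structuring element."""
    h, w = image.shape
    sh, sw = struct.shape
    ph, pw = sh // 2, sw // 2
    # pad with background (0) for dilation, foreground (1) for erosion
    fill = 0.0 if dilate else 1.0
    padded = np.pad(image, ((ph, ph), (pw, pw)), constant_values=fill)
    out = np.empty_like(image)
    for i in range(h):
        for j in range(w):
            window = padded[i:i + sh, j:j + sw][struct > 0]
            out[i, j] = window.max() if dilate else window.min()
    return out
```

An all-zero structuring element except its centre acts as the neutral element mentioned above: the max/min over that single pixel returns the image unchanged.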
\subsection{Affine Transformations}
We generate an affine transform matrix according to the complexity
level, then we apply it directly to the image. This allows us to produce
scaling, translation, rotation and shearing variations. We took care
that the maximum rotation applied to the image is low enough not to
confuse classes.

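As a sketch, applying a $2\times3$ affine matrix $[A|b]$ by inverse mapping with nearest-neighbour lookup (an assumed implementation for illustration; `apply_affine` is not the project's code):

```python
import numpy as np

def apply_affine(image, matrix):
    """Apply a 2x3 affine matrix [A|b] by inverse mapping (nearest neighbour)."""
    h, w = image.shape
    A, b = matrix[:, :2], matrix[:, 2]
    A_inv = np.linalg.inv(A)
    out = np.zeros_like(image)
    for y in range(h):
        for x in range(w):
            # source position that maps to (x, y) under the transform
            sx, sy = A_inv @ (np.array([x, y], dtype=float) - b)
            si, sj = int(round(sy)), int(round(sx))
            if 0 <= si < h and 0 <= sj < w:
                out[y, x] = image[si, sj]
    return out
```

Inverse mapping (looping over destination pixels) avoids the holes that forward mapping leaves in the output.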
\subsection{Local Elastic Deformations}
This filter induces a ``wiggly'' effect in the image. The description
here will be brief, as the algorithm follows precisely what is described
in \cite{SimardSP03}.

The general idea is to generate two displacement fields, for the
horizontal and vertical displacements of pixels. Each of these fields
has the same size as the original image.

When generating the transformed image, we loop over the x and y
positions in the fields and select, as a value, the value of the pixel
in the original image at the (relative) position given by the
displacement fields for this x and y. If the position we would retrieve
is outside the borders of the image, we use a value of 0 instead.
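The lookup loop above can be sketched as follows (using nearest-neighbour lookup for brevity; `dx` and `dy` are the precomputed displacement fields, and `elastic` is a hypothetical helper):

```python
import numpy as np

def elastic(image, dx, dy):
    """Look up each pixel at the source position given by the displacement fields."""
    h, w = image.shape
    out = np.zeros_like(image)
    for y in range(h):
        for x in range(w):
            sx = x + int(round(dx[y, x]))
            sy = y + int(round(dy[y, x]))
            if 0 <= sx < w and 0 <= sy < h:   # outside the borders -> 0
                out[y, x] = image[sy, sx]
    return out
```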
The actual value is given by bilinear interpolation, considering the
pixels around the (non-integer) source position thus found.

The value for $pinch$ in our case was sampled from a uniform
distribution over the range $[-complexity, 0.7 \times complexity]$.

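The bilinear interpolation step can be sketched as (assuming an in-range, non-integer position; `bilinear` is an illustrative helper, not the project's code):

```python
import numpy as np

def bilinear(image, x, y):
    """Bilinearly interpolate image at a non-integer in-range (x, y) position."""
    h, w = image.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # weight the four surrounding pixels by their distance to (x, y)
    top = (1 - fx) * image[y0, x0] + fx * image[y0, x1]
    bot = (1 - fx) * image[y1, x0] + fx * image[y1, x1]
    return (1 - fy) * top + fy * bot
```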
\subsection{Gaussian Distortion}
This filter simply adds, to each pixel of the image independently,
Gaussian noise of mean $0$ and standard deviation
$\frac{complexity}{10}$.

It has a probability of 70\% of not being applied at all.

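A minimal sketch of this filter (clipping the result back to $[0,1]$ is our assumption, not stated in the text; `gaussian_distortion` is a hypothetical helper):

```python
import numpy as np

def gaussian_distortion(image, complexity, rng):
    """With probability 0.3, add N(0, complexity/10) noise to every pixel."""
    if rng.uniform() < 0.7:               # 70% chance of not being applied at all
        return image
    noise = rng.normal(0.0, complexity / 10.0, size=image.shape)
    return np.clip(image + noise, 0.0, 1.0)   # clipping to [0,1] is assumed
```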
\subsection{Occlusion}
This filter selects random parts of other (hereafter ``occlusive'')
letter images and places them over the original letter (hereafter
``occluded'') image. To be more precise, having selected a subregion of
the occlusive image and a destination position in the occluded image,
the final value of a given overlapping pixel is whichever pixel is the
lightest. As a reminder, the background value is 0 (black), so the value
nearest to 1 is selected.

To select a subpart of the occlusive image, four numbers are generated.
For compatibility with the code, we call them ``haut'', ``bas'',
``gauche'' and ``droite'' (respectively meaning top, bottom, left and
right). Each of these numbers is selected according to a Gaussian
distribution of mean $8 \times complexity$ and standard deviation $2$,
so the larger the complexity, the bigger the occlusion. The absolute
value is taken, as the numbers must be positive, and the maximum value
is capped at 15.

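The lightest-pixel compositing can be sketched as below (interpreting the four numbers as slice bounds of the occlusive subregion is our assumption for illustration; `occlude` is not the project's code):

```python
import numpy as np

def occlude(occluded, occlusive, top, bottom, left, right, dy, dx):
    """Paste a subregion of the occlusive image, keeping the lightest pixel."""
    patch = occlusive[top:bottom, left:right]
    h, w = patch.shape
    region = occluded[dy:dy + h, dx:dx + w]
    # background is 0 (black), so np.maximum keeps the value nearest to 1
    occluded[dy:dy + h, dx:dx + w] = np.maximum(region, patch)
    return occluded
```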
\subsection{Salt and Pepper Noise}
This filter adds noise to the image by randomly selecting a certain
number of pixels and, for those selected pixels, assigning a random
value according to a uniform distribution over the range $[0,1]$. This
distribution does not change with complexity; instead, the number of
selected pixels does: the proportion of changed pixels corresponds to
$complexity / 5$, which means that at most 20\% of the pixels will be
randomized. At the lowest extreme, no pixel is changed.

This filter also has a probability of 75\% of not being applied at all.

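A minimal sketch of this filter (`salt_and_pepper` is a hypothetical helper, not the project's code):

```python
import numpy as np

def salt_and_pepper(image, complexity, rng):
    """With probability 0.25, randomize a complexity/5 fraction of the pixels."""
    if rng.uniform() < 0.75:                  # 75% chance of not being applied
        return image
    out = image.copy()
    n = int(round(image.size * complexity / 5.0))
    idx = rng.choice(image.size, size=n, replace=False)
    # replacement values are U[0,1], independent of complexity
    out.reshape(-1)[idx] = rng.uniform(0.0, 1.0, size=n)
    return out
```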
\subsection{Spatially Gaussian Noise}
The aim of this transformation is to filter different regions of the
image with a Gaussian kernel. In order to save computing time we
convolve the whole image only once with a symmetric Gaussian kernel,
whose size and variance are chosen uniformly in the ranges
$[12, 12 + 20 \times complexity]$ and $[2, 2 + 6 \times complexity]$.
The result is normalized between $0$ and $1$. We also create a symmetric
averaging window of the kernel size, with its maximum value at the
center. For each image we sample uniformly from $3$ to
$3 + 10 \times complexity$ pixels that will serve as averaging centers
between the original image and the filtered one. We initialize a mask
matrix of the image size to zero, and for each selected pixel we add to
the mask the averaging window centered on it. The final image is
computed by the element-wise operation
$\frac{image + filtered\_image \times mask}{mask + 1}$.

This filter has a probability of 75\% of not being applied at all.

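Given a precomputed filtered image, the mask-and-blend step can be sketched as (an illustrative helper, with the Gaussian convolution itself left out; `local_blend` is not the project's code):

```python
import numpy as np

def local_blend(image, filtered, centers, window):
    """Blend the filtered image back in around the selected averaging centers."""
    h, w = image.shape
    kh, kw = window.shape
    mask = np.zeros((h, w))
    for cy, cx in centers:
        # add the averaging window centred on (cy, cx), cropped at the borders
        for i in range(kh):
            for j in range(kw):
                y, x = cy - kh // 2 + i, cx - kw // 2 + j
                if 0 <= y < h and 0 <= x < w:
                    mask[y, x] += window[i, j]
    # element-wise: (image + filtered * mask) / (mask + 1)
    return (image + filtered * mask) / (mask + 1.0)
```

Far from any center the mask is zero and the original image passes through unchanged.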
\subsection{Color and Contrast Changes}
This filter changes the contrast and may invert the image polarity
(white on black to black on white). The contrast $C$ is defined here as
the difference between the maximum and the minimum pixel value of the
image. A contrast value is sampled uniformly between
$1 - 0.85 \times complexity$ and $1$ (this ensures a minimum contrast of
$0.15$). We then simply normalize the image to the range
$[\frac{1-C}{2}, 1-\frac{1-C}{2}]$. The polarity is inverted with
probability $0.5$.

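A minimal sketch of this filter (`change_contrast` is a hypothetical helper, not the project's code):

```python
import numpy as np

def change_contrast(image, complexity, rng):
    """Rescale to a sampled contrast C, inverting polarity half the time."""
    C = rng.uniform(1.0 - 0.85 * complexity, 1.0)   # minimum contrast is 0.15
    lo, hi = image.min(), image.max()
    norm = (image - lo) / (hi - lo) if hi > lo else np.zeros_like(image)
    out = norm * C + (1.0 - C) / 2.0                # range [(1-C)/2, 1-(1-C)/2]
    if rng.uniform() < 0.5:                         # invert polarity
        out = 1.0 - out
    return out
```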
\begin{figure}[h]
\resizebox{.99\textwidth}{!}{\includegraphics{images/example_t.png}}\\
\caption{Illustration of the pipeline of stochastic
transformations applied to the image of a lower-case t
\subsubsection{Data Sets}
\begin{itemize}
\item {\bf NIST}
\item {\bf P07}
The P07 dataset is sampled with our transformation pipeline with a
complexity parameter of $0.7$. For each new example to generate, we
choose one source with the following probabilities: $0.1$ for the fonts,
$0.25$ for the captchas, $0.25$ for the OCR data and $0.4$ for NIST. We
apply all the transformations in their order, and for each of them we
sample a complexity uniformly in the range $[0,0.7]$.
\item {\bf NISTP} {\em do not use PNIST but NISTP, to remain politically
correct...}
NISTP is equivalent to P07, except that we only apply the
transformations from slant to pinch. The character is therefore
transformed but no additional noise is added to the image, which gives
images closer to the NIST dataset.
\end{itemize}

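The P07 source mixture can be sketched as follows (the source names are illustrative labels, not the project's identifiers; a per-transformation complexity would additionally be drawn as `rng.uniform(0.0, 0.7)`):

```python
import numpy as np

def sample_p07_source(rng):
    """Pick the source of the next P07 example with the stated mixture weights."""
    sources = ["fonts", "captchas", "ocr", "nist"]
    return rng.choice(sources, p=[0.1, 0.25, 0.25, 0.4])
```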
\subsection{Models and their Hyperparameters}

\subsubsection{Multi-Layer Perceptrons (MLP)}