diff writeup/techreport.tex @ 420:a3a4a9c6476d (Mercurial repository ift6266)
added transformations description and began dataset descriptions
author | Xavier Glorot <glorotxa@iro.umontreal.ca> |
---|---|
date | Fri, 30 Apr 2010 16:23:15 -0400 |
parents | 0282882aa91f |
children | e4eb3ee7a0cf c06a3d9b5664 |
--- a/writeup/techreport.tex Fri Apr 30 09:25:20 2010 -0400
+++ b/writeup/techreport.tex Fri Apr 30 16:23:15 2010 -0400
@@ -71,20 +71,32 @@
 difficulties.
 
 \section{Perturbation and Transformation of Character Images}
+This section describes the transformations we used to generate data, in the order in which they are applied.
+The pipeline has two main parts. The first one, from slant to pinch, performs transformations
+of the character. The second part, from blur to contrast, adds noise to the image.
 
 \subsection{Adding Slant}
-In order to mimic a slant effect, we simply shift each row of the image proportionnaly to its height.
-The coefficient is randomly sampled according to the complexity level and can be negatif or positif with equal probability.
+In order to mimic a slant effect, we simply shift each row of the image proportionally to its height: $shift = round(slant \times height)$.
+We round the shift in order to obtain a discrete displacement. We do not smooth the result with a filter, to save computing time
+and because later transformations have similar effects.
+
+The $slant$ coefficient can be negative or positive with equal probability and its value is randomly sampled according to the complexity level.
+In our case its magnitude is sampled uniformly in the range $[0,complexity]$, so the maximum displacement for the lowest
+or highest pixel line is $round(complexity \times 32)$ pixels.
+
 
 \subsection{Changing Thickness}
-To change the thickness of the characters we used morpholigical operators: dilation and erosion~\cite{Haralick87,Serra82}.
+To change the thickness of the characters we used morphological operators: dilation and erosion~\cite{Haralick87,Serra82}.
+
 The basic idea of such transform is, for each pixel, to multiply in the element-wise manner its neighbourhood with a matrix called the structuring element.
 Then for dilation we remplace the pixel value by the maximum of the result, or the minimum for erosion.
-This will dilate or erode objects in the image, the strength of the transform only depends on the structuring element.
-We used ten different structural elements with various shapes (the biggest is $5\times5$).
-for each image, we radomly sample the operator type (dilation or erosion) and one structural element
-from a subset depending of the complexity (the higher the complexity, the biggest the structural element can be).
-Erosion allows only the five smallest structural elements because when the character is too thin it may erase it completly.
+This dilates or erodes objects in the image; the strength of the transform depends only on the structuring element.
+
+We used ten different structuring elements of increasing size (the biggest is $5\times5$).
+For each image, we randomly sample the operator type (dilation or erosion) with equal probability and one structuring element
+from a subset of the $n$ smallest structuring elements, where $n$ is $round(10 \times complexity)$ for dilation and $round(6 \times complexity)$ for erosion.
+A neutral element is always present in the set; if it is chosen, the transformation is not applied.
+Erosion allows only the six smallest structuring elements because it may erase the character completely when the character is too thin.
 
 \subsection{Affine Transformations}
 We generate an affine transform matrix according to the complexity level, then we apply it directly to the image.
@@ -92,7 +104,6 @@
 to the image is low enough not to confuse classes.
 
 \subsection{Local Elastic Deformations}
-
 This filter induces a "wiggly" effect in the image. The description here will be brief, as the algorithm follows precisely what is described in \cite{SimardSP03}.
 The general idea is to generate two "displacements" fields, for horizontal and vertical displacements of pixels.
 Each of these fields has the same size as the original image.
@@ -128,6 +139,12 @@
 The value for $pinch$ in our case was given by sampling from an uniform distribution over the range $[-complexity, 0.7 \times complexity]$.
 
 
+\subsection{Gaussian Distortion}
+This filter simply adds, to each pixel of the image independently, a Gaussian noise of mean $0$ and standard deviation $\frac{complexity}{10}$.
+
+It has a probability of 70\% of not being applied at all.
+
+
 \subsection{Occlusion}
 This filter selects random parts of other (hereafter "occlusive") letter images and places them over the original letter (hereafter "occluded") image.
 To be more precise, having selected a subregion of the occlusive image and a desination position in the occluded image, to determine the final value for a given overlapping pixel, it selects whichever pixel is the lightest. As a reminder, the background value is 0, black, so the value nearest to 1 is selected.
@@ -158,10 +175,27 @@
 This filter adds noise to the image by randomly selecting a certain number of them and, for those selected pixels, assign a random value according to a uniform distribution over the $[0,1]$ ranges.
 This last distribution does not change according to complexity. Instead, the number of selected pixels does: the proportion of changed pixels corresponds to $complexity / 5$, which means, as a maximum, 20\% of the pixels will be randomized.
 On the lowest extreme, no pixel is changed.
-This filter also has a probability of not being applied, at all, of 25\%.
+This filter also has a probability of 75\% of not being applied at all.
 
 \subsection{Spatially Gaussian Noise}
+The aim of this transformation is to filter different regions of the image with a Gaussian kernel. In order to save computing time
+we convolve the whole image only once with a symmetric Gaussian kernel whose size and variance are chosen uniformly in the ranges
+$[12,12 + 20 \times complexity]$ and $[2,2 + 6 \times complexity]$, respectively. The result is normalized between $0$ and $1$.
+We also create a symmetric averaging window of the kernel size, with its maximum value at the center.
+For each image we sample uniformly between $3$ and $3 + 10 \times complexity$ pixels that will serve as averaging centers
+between the original image and the filtered one.
+We initialize a mask matrix of the image size to zero. For each selected pixel we add to the mask the averaging window centered on it.
+The final image is computed with the following element-wise operation: $\frac{image + filtered\_image \times mask}{mask+1}$.
+
+This filter has a probability of 75\% of not being applied at all.
+
+
 \subsection{Color and Contrast Changes}
+This filter changes the contrast and may invert the image polarity (white on black to black on white). The contrast $C$ is defined here as the difference
+between the maximum and the minimum pixel value of the image. A contrast value is sampled uniformly between $1$ and $1-0.85 \times complexity$
+(this ensures a minimum contrast of $0.15$). We then simply normalize the image to the range $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The polarity
+is inverted with probability $0.5$.
+
 
 \begin{figure}[h]
 \resizebox{.99\textwidth}{!}{\includegraphics{images/example_t.png}}\\
@@ -193,7 +227,13 @@
 \begin{itemize}
 \item {\bf NIST}
 \item {\bf P07}
+The dataset P07 is sampled with our transformation pipeline with a complexity parameter of $0.7$.
+For each new example to generate, we choose one source with the following probabilities: $0.1$ for the fonts,
+$0.25$ for the captchas, $0.25$ for OCR data and $0.4$ for NIST. We apply all the transformations in order
+and for each of them we sample a complexity uniformly in the range $[0,0.7]$.
 \item {\bf NISTP} {\em do not use PNIST but NISTP, to stay politically correct...}
+NISTP is equivalent to P07 except that we only apply the transformations from slant to pinch. Therefore, the character is transformed
+but no additional noise is added to the image, which gives images closer to those of the NIST dataset.
 \end{itemize}
 
 \subsection{Models and their Hyperparameters}
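The slant step described in the diff above (shift each row by round(slant × height), no smoothing, slant magnitude uniform in [0, complexity] with a random sign) can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's code. Images are assumed to be lists of rows with values in [0, 1] and a black (0) background. `shift_row`, `apply_slant` and `random_slant` are hypothetical names. Measuring `height` from the bottom (bottom row = 1, top row = number of rows) is also an assumption, chosen so that a 32-row image reaches the stated maximum displacement of round(complexity × 32).

```python
import random

def shift_row(row, shift):
    """Return a copy of `row` shifted right by `shift` pixels (left if negative).
    Vacated pixels are filled with 0.0, the black background value."""
    n = len(row)
    out = [0.0] * n
    for j, value in enumerate(row):
        k = j + shift
        if 0 <= k < n:  # pixels pushed past the border are dropped
            out[k] = value
    return out

def apply_slant(image, slant):
    """Shift each row by round(slant * height), with no smoothing filter.
    Assumption: height counts rows from the bottom, so the bottom row has
    height 1 and the top row has height len(image).
    Note: Python's round() resolves ties to the nearest even integer."""
    rows = len(image)
    return [shift_row(row, round(slant * (rows - i))) for i, row in enumerate(image)]

def random_slant(image, complexity, rng=random):
    """Magnitude uniform in [0, complexity]; sign +/- with equal probability."""
    magnitude = rng.uniform(0.0, complexity)
    slant = magnitude if rng.random() < 0.5 else -magnitude
    return apply_slant(image, slant)
```

For example, `apply_slant` with `slant=1.0` on a 4-row image moves a pixel in column 0 to columns 4, 3, 2 and 1 from top to bottom. Pixels that leave the frame are simply lost, which matches the "no filter" choice in the text.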
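The thickness change can be sketched with flat 0/1 structuring elements. Here the maximum (dilation) or minimum (erosion) is taken only over the neighbours where the element is 1, which is the standard definition for flat elements. The "multiply, then take the max or min" wording in the text gives the same result for dilation of non-negative images. For erosion, however, it would return 0 whenever the element contains a 0. `morph`, `random_thickness`, the caller-supplied `elements` list (sorted by increasing size) and the use of `None` as the neutral element are all hypothetical. Ignoring out-of-image neighbours is an assumed border convention.

```python
import random

def morph(image, element, op):
    """Flat grayscale morphology: op=max gives dilation, op=min gives erosion.

    `element` is a 0/1 structuring element centred on each pixel; only the
    neighbours where it is 1 are considered, and neighbours outside the image
    are ignored. Symmetric elements are assumed, so no reflection is applied."""
    rows, cols = len(image), len(image[0])
    eh, ew = len(element), len(element[0])
    ch, cw = eh // 2, ew // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            values = [image[i + a - ch][j + b - cw]
                      for a in range(eh) for b in range(ew)
                      if element[a][b]
                      and 0 <= i + a - ch < rows and 0 <= j + b - cw < cols]
            out[i][j] = op(values) if values else image[i][j]
    return out

def random_thickness(image, complexity, elements, rng=random):
    """Pick dilation or erosion with equal probability, then one element among
    the neutral element and the n smallest ones (hypothetical `elements` list)."""
    if rng.random() < 0.5:
        op, n = max, round(10 * complexity)  # dilation
    else:
        op, n = min, round(6 * complexity)   # erosion: at most the six smallest
    candidates = [None] + elements[:n]       # None stands for the neutral element
    element = rng.choice(candidates)
    if element is None:                      # neutral element: image unchanged
        return [row[:] for row in image]
    return morph(image, element, op)
```

Dilating a single bright pixel with a 3×3 square yields a 3×3 block. Eroding that block with the same square brings it back to one pixel. This illustrates why the text restricts erosion to the smaller elements.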
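The Gaussian distortion filter described in the diff adds independent noise with mean 0 and standard deviation complexity/10 to every pixel, and it is skipped 70% of the time. The sketch below follows that description. The function name and the `skip_prob` parameter (exposed to make testing easy) are hypothetical. The text does not mention clipping, so none is applied and values may leave [0, 1].

```python
import random

def gaussian_distortion(image, complexity, rng=random, skip_prob=0.7):
    """Add i.i.d. Gaussian noise N(0, sigma^2), sigma = complexity / 10, to each
    pixel; the filter is skipped entirely with probability `skip_prob`."""
    if rng.random() < skip_prob:
        return [row[:] for row in image]
    sigma = complexity / 10.0
    # No clipping: the description does not say whether values are clamped.
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in image]
```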
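The pixel-randomizing noise filter, whose skip probability this revision changes from 25% to 75%, replaces a proportion complexity/5 of the pixels with values drawn uniformly from [0, 1]. A minimal sketch follows. The function name and the `skip_prob` parameter are hypothetical. Sampling distinct positions without replacement is an assumption: the text only says that a certain number of pixels are selected.

```python
import random

def random_pixel_noise(image, complexity, rng=random, skip_prob=0.75):
    """Replace a proportion complexity/5 of the pixels (at most 20% when
    complexity <= 1) with uniform random values; skipped with probability skip_prob."""
    if rng.random() < skip_prob:
        return [row[:] for row in image]
    rows, cols = len(image), len(image[0])
    count = round(complexity / 5.0 * rows * cols)
    out = [row[:] for row in image]
    # Assumption: distinct positions, sampled without replacement.
    for p in rng.sample(range(rows * cols), count):
        out[p // cols][p % cols] = rng.random()  # uniform on [0, 1)
    return out
```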
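The spatially Gaussian noise described in the diff has several steps: filter the whole image once, rescale the result to [0, 1], build a mask by summing an averaging window around a few random centres, and blend with (image + filtered × mask) / (mask + 1). The pure-Python sketch below fills the gaps with labelled assumptions:
- the kernel size is an integer;
- the sampled variance is used as the Gaussian variance;
- the averaging window reuses the same Gaussian shape with a peak of 1 (the text only says it is symmetric with its maximum at the centre);
- filtering uses zero padding;
- centres are drawn uniformly over the image.

All function names are hypothetical. Because the filtered image is rescaled to [0, 1] afterwards, the kernel does not need to sum to 1.

```python
import math
import random

def gaussian_window(size, variance):
    """size x size symmetric Gaussian with value 1 at its centre (not normalized)."""
    c = (size - 1) / 2.0
    return [[math.exp(-((a - c) ** 2 + (b - c) ** 2) / (2.0 * variance))
             for b in range(size)] for a in range(size)]

def filter_same(image, kernel):
    """Zero-padded filtering that keeps the image size (kernel is symmetric)."""
    rows, cols, k = len(image), len(image[0]), len(kernel)
    off = (k - 1) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for a in range(k):
                ii = i + a - off
                if 0 <= ii < rows:
                    for b in range(k):
                        jj = j + b - off
                        if 0 <= jj < cols:
                            acc += image[ii][jj] * kernel[a][b]
            out[i][j] = acc
    return out

def normalize01(image):
    """Min-max rescale to [0, 1]; a flat image maps to zeros."""
    lo = min(min(r) for r in image)
    hi = max(max(r) for r in image)
    if hi == lo:
        return [[0.0] * len(r) for r in image]
    return [[(v - lo) / (hi - lo) for v in r] for r in image]

def spatial_gaussian_noise(image, complexity, rng=random, skip_prob=0.75):
    if rng.random() < skip_prob:
        return [row[:] for row in image]
    rows, cols = len(image), len(image[0])
    size = rng.randint(12, 12 + round(20 * complexity))
    variance = rng.uniform(2.0, 2.0 + 6.0 * complexity)
    kernel = gaussian_window(size, variance)
    # The whole image is filtered only once, then rescaled to [0, 1].
    filtered = normalize01(filter_same(image, kernel))
    window = kernel  # assumption: averaging window has the same Gaussian shape
    mask = [[0.0] * cols for _ in range(rows)]
    off = (size - 1) // 2
    for _ in range(rng.randint(3, 3 + round(10 * complexity))):
        ci, cj = rng.randrange(rows), rng.randrange(cols)
        for a in range(size):
            for b in range(size):
                i, j = ci + a - off, cj + b - off
                if 0 <= i < rows and 0 <= j < cols:
                    mask[i][j] += window[a][b]
    # Element-wise (image + filtered * mask) / (mask + 1).
    return [[(image[i][j] + filtered[i][j] * mask[i][j]) / (mask[i][j] + 1.0)
             for j in range(cols)] for i in range(rows)]
```

Each output pixel is a weighted average of the original and filtered values, with weights 1/(mask + 1) and mask/(mask + 1). Pixels far from every centre therefore stay unchanged, and an input in [0, 1] stays in [0, 1].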
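The contrast and polarity change maps the image linearly onto [(1 − C)/2, 1 − (1 − C)/2], where C is sampled uniformly between 1 − 0.85 × complexity and 1. It then inverts the polarity with probability 0.5. In the sketch below, `set_contrast` and `random_contrast` are hypothetical names. Mapping a perfectly flat image to 0.5 is an assumption, since the text does not cover that case. Inversion is taken to be v → 1 − v, which keeps the range symmetric around 0.5 and therefore preserves the contrast C.

```python
import random

def set_contrast(image, contrast, invert):
    """Min-max rescale to [(1-C)/2, 1-(1-C)/2], then optionally invert polarity."""
    lo = min(min(r) for r in image)
    hi = max(max(r) for r in image)
    low = (1.0 - contrast) / 2.0
    high = 1.0 - low
    if hi == lo:  # assumption: a flat image is mapped to mid-gray
        out = [[0.5] * len(r) for r in image]
    else:
        out = [[low + (v - lo) / (hi - lo) * (high - low) for v in r] for r in image]
    if invert:
        out = [[1.0 - v for v in r] for r in out]
    return out

def random_contrast(image, complexity, rng=random):
    # C >= 0.15 whenever complexity <= 1
    contrast = rng.uniform(1.0 - 0.85 * complexity, 1.0)
    return set_contrast(image, contrast, rng.random() < 0.5)
```

For instance, `set_contrast([[0.0, 1.0]], 0.5, False)` gives `[[0.25, 0.75]]`, and the same call with `invert=True` gives `[[0.75, 0.25]]`.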
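The P07 sampling procedure draws one source per example (fonts 0.1, captchas 0.25, OCR 0.25, NIST 0.4). It then applies every transformation in pipeline order, each with its own complexity drawn uniformly from [0, 0.7]. The sketch below assumes each transformation is a callable `transform(image, complexity)`. The source labels, `choose_source`, `generate_example` and the `load_image(source)` callback are all hypothetical.

```python
import random

# Source probabilities stated for P07; the labels themselves are illustrative.
SOURCES = ["fonts", "captchas", "ocr", "nist"]
WEIGHTS = [0.1, 0.25, 0.25, 0.4]

def choose_source(rng=random):
    return rng.choices(SOURCES, weights=WEIGHTS, k=1)[0]

def generate_example(load_image, transforms, max_complexity=0.7, rng=random):
    """Draw one source, then apply every transform in pipeline order, each with
    its own complexity sampled uniformly in [0, max_complexity].

    load_image(source) and transform(image, complexity) are hypothetical callbacks."""
    image = load_image(choose_source(rng))
    for transform in transforms:
        image = transform(image, rng.uniform(0.0, max_complexity))
    return image
```

Under the same assumptions, NISTP would call `generate_example` with only the first part of the pipeline (slant to pinch) in `transforms`. That way the character is deformed but no noise is added.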