changeset 420:a3a4a9c6476d

added transformations description and began dataset descriptions
author Xavier Glorot <glorotxa@iro.umontreal.ca>
date Fri, 30 Apr 2010 16:23:15 -0400
parents 0282882aa91f
children 0ffef3667865
files writeup/techreport.tex
diffstat 1 files changed, 50 insertions(+), 10 deletions(-) [+]
--- a/writeup/techreport.tex	Fri Apr 30 09:25:20 2010 -0400
+++ b/writeup/techreport.tex	Fri Apr 30 16:23:15 2010 -0400
@@ -71,20 +71,32 @@
 difficulties.
 
 \section{Perturbation and Transformation of Character Images}
+This section describes the transformations we used to generate data, in the order in which they are applied.
+We can distinguish two main parts in the pipeline. The first one, from slant to pinch, performs transformations
+of the character. The second part, from blur to contrast, adds noise to the image.
 
 \subsection{Adding Slant}
-In order to mimic a slant effect, we simply shift each row of the image proportionnaly to its height.
-The coefficient is randomly sampled according to the complexity level and can be negatif or positif with equal probability.
+In order to mimic a slant effect, we simply shift each row of the image proportionally to its height: $shift = round(slant \times height)$.
+We round the shift in order to have a discrete displacement. We do not use a filter to smooth the result, both to save computing time
+and because later transformations have similar effects.
+
+The $slant$ coefficient can be negative or positive with equal probability and its value is randomly sampled according to the complexity level.
+In our case we sample a number uniformly in the range $[0,complexity]$, which means that the maximum displacement for the lowest
+or highest pixel line is $round(complexity \times 32)$.
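+
+As an illustration (the numerical values here are hypothetical and only meant to make the formula concrete), take $complexity = 0.5$
+and a sampled coefficient $slant = 0.25$. A row at $height = 16$ is then shifted by
+\[ shift = round(0.25 \times 16) = 4 \]
+pixels, and no row can move by more than $round(0.5 \times 32) = 16$ pixels.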
+
 
 \subsection{Changing Thickness}
-To change the thickness of the characters we used morpholigical operators: dilation and erosion~\cite{Haralick87,Serra82}.
+To change the thickness of the characters we used morphological operators: dilation and erosion~\cite{Haralick87,Serra82}.
+
 The basic idea of such a transform is, for each pixel, to multiply its neighbourhood element-wise with a matrix called the structuring element.
 Then for dilation we replace the pixel value by the maximum of the result, or by the minimum for erosion.
-This will dilate or erode objects in the image, the strength of the transform only depends on the structuring element.
-We used ten different structural elements with various shapes (the biggest is $5\times5$).
-for each image, we radomly sample the operator type (dilation or erosion) and one structural element
-from a subset depending of the complexity (the higher the complexity, the biggest the structural element can be).
-Erosion allows only the five smallest structural elements because when the character is too thin it may erase it completly.
+This will dilate or erode objects in the image, and the strength of the transform depends only on the structuring element.
+
+We used ten different structuring elements with increasing dimensions (the biggest is $5\times5$).
+For each image, we randomly sample the operator type (dilation or erosion) with equal probability and one structuring element
+from a subset of the $n$ smallest structuring elements, where $n$ is $round(10 \times complexity)$ for dilation and $round(6 \times complexity)$ for erosion.
+A neutral element is always present in the set; if it is chosen, the transformation is not applied.
+Erosion allows only the six smallest structuring elements because when the character is too thin it may erase it completely.
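+
+For instance, at $complexity = 0.7$ (a value chosen here only for illustration, with $n_{dilation}$ and $n_{erosion}$ as ad hoc names
+for the two subset sizes), the formulas above give
+\[ n_{dilation} = round(10 \times 0.7) = 7, \qquad n_{erosion} = round(6 \times 0.7) = round(4.2) = 4. \]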
 
 \subsection{Affine Transformations}
 We generate an affine transform matrix according to the complexity level, then we apply it directly to the image.
@@ -92,7 +104,6 @@
 to the image is low enough not to confuse classes.
 
 \subsection{Local Elastic Deformations}
-
 This filter induces a ``wiggly'' effect in the image. The description here will be brief, as the algorithm follows precisely what is described in \cite{SimardSP03}.
 
 The general idea is to generate two ``displacement'' fields, for horizontal and vertical displacements of pixels. Each of these fields has the same size as the original image.
@@ -128,6 +139,12 @@
 The value for $pinch$ in our case was given by sampling from a uniform distribution over the range $[-complexity, 0.7 \times complexity]$.
 
 
+\subsection{Gaussian Distortion}
+This filter simply adds, to each pixel of the image independently, Gaussian noise with mean $0$ and standard deviation $\frac{complexity}{10}$.
+
+It has a probability of 70\% of not being applied at all.
+
+
 \subsection{Occlusion}
 
 This filter selects random parts of other (hereafter ``occlusive'') letter images and places them over the original letter (hereafter ``occluded'') image. To be more precise, having selected a subregion of the occlusive image and a destination position in the occluded image, to determine the final value for a given overlapping pixel, it selects whichever pixel is the lightest. As a reminder, the background value is 0, black, so the value nearest to 1 is selected.
@@ -158,10 +175,27 @@
 
 This filter adds noise to the image by randomly selecting a certain number of its pixels and assigning each of them a random value drawn from a uniform distribution over the range $[0,1]$. This distribution does not change with complexity. Instead, the number of selected pixels does: the proportion of changed pixels is $complexity / 5$, which means that at most 20\% of the pixels will be randomized. At the other extreme, no pixel is changed.
 
-This filter also has a probability of not being applied, at all, of 25\%.
+This filter also has a probability of not being applied, at all, of 75\%.
 
 \subsection{Spatially Gaussian Noise}
+The aim of this transformation is to filter, with a Gaussian kernel, different regions of the image. In order to save computing time
+we decided to convolve the whole image only once with a symmetric Gaussian kernel whose size and variance are chosen uniformly in the ranges
+$[12,12 + 20 \times complexity]$ and $[2,2 + 6 \times complexity]$, respectively. The result is normalized between $0$ and $1$.
+We also create a symmetric averaging window, of the kernel size, with maximum value at the center.
+For each image we sample uniformly from $3$ to $3 + 10 \times complexity$ pixels that will be averaging centers
+between the original image and the filtered one.
+We initialize to zero a mask matrix of the image size. For each selected pixel we add to the mask the averaging window centered on it.
+The final image is computed from the following element-wise operation: $\frac{image + filtered\_image \times mask}{mask+1}$.
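+
+To see what this operation does, consider three hypothetical mask values at a given pixel (writing $I$ for the original pixel value and $F$ for
+the filtered one, a shorthand introduced only for this illustration):
+\[
+mask = 0: \; \frac{I + 0 \times F}{0 + 1} = I, \qquad
+mask = 1: \; \frac{I + F}{2}, \qquad
+mask = 3: \; \frac{I + 3F}{4}.
+\]
+Where the mask is zero the pixel is thus left untouched, and the filtered image weighs more as the mask value grows.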
+
+This filter has a probability of not being applied, at all, of 75\%.
+
+
 \subsection{Color and Contrast Changes}
+This filter changes the contrast and may invert the image polarity (white on black to black on white). The contrast $C$ is defined here as the difference
+between the maximum and the minimum pixel value of the image. A contrast value is sampled uniformly between $1$ and $1-0.85 \times complexity$
+(this ensures a minimum contrast of $0.15$). We then simply normalize the image to the range $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The polarity
+is inverted with $0.5$ probability.
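+
+For example, with a sampled contrast $C = 0.5$ (an illustrative value, not one taken from our experiments), the image is normalized to
+\[ \left[\frac{1-0.5}{2},\; 1-\frac{1-0.5}{2}\right] = [0.25,\; 0.75], \]
+that is, a range of width $C$ centered on $0.5$.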
+
 
 \begin{figure}[h]
 \resizebox{.99\textwidth}{!}{\includegraphics{images/example_t.png}}\\
@@ -193,7 +227,13 @@
 \begin{itemize}
 \item {\bf NIST}
 \item {\bf P07}
+The dataset P07 is sampled with our transformation pipeline with a complexity parameter of $0.7$. 
+For each new example to generate, we choose one source with the following probabilities: $0.1$ for the fonts,
+$0.25$ for the captchas, $0.25$ for OCR data and $0.4$ for NIST. We apply all the transformations in their order
+and for each of them we sample uniformly a complexity in the range $[0,0.7]$.
 \item {\bf NISTP} {\em do not use PNIST but NISTP, to stay politically correct...}
+NISTP is equivalent to P07 except that we only apply transformations from slant to pinch. Therefore, the character is transformed
+but no additional noise is added to the image; this gives images closer to the NIST dataset.
 \end{itemize}
 
 \subsection{Models and their Hyperparameters}