diff writeup/techreport.tex @ 431:bfa349f567e8
correction in the transformation descripition
author | Xavier Glorot <glorotxa@iro.umontreal.ca> |
---|---|
date | Mon, 03 May 2010 01:07:21 -0400 |
parents | 9fcd0215b8d5 |
children | 1272dc84a30c d5b2b6397a5a |
--- a/writeup/techreport.tex Sun May 02 22:26:19 2010 -0400
+++ b/writeup/techreport.tex Mon May 03 01:07:21 2010 -0400
@@ -81,16 +81,16 @@
 and also because latter transformations have similar effects.
 
 The $slant$ coefficient can be negative or positive with equal probability and its value is randomly sampled according to the complexity level.
-In our case we take uniformly a number in the range $[0,complexity]$, that means, in our case, that the maximum displacement for the lowest
+In our case we take uniformly a number in the range $[0,complexity]$, so the maximum displacement for the lowest
 or highest pixel line is of $round(complexity \times 32)$.
 
 
 \subsection{Changing Thickness}
-To change the thickness of the characters we used morpholigical operators: dilation and erosion~\cite{Haralick87,Serra82}.i
+To change the thickness of the characters we used morpholigical operators: dilation and erosion~\cite{Haralick87,Serra82}.
 
 The basic idea of such transform is, for each pixel, to multiply in the element-wise manner its neighbourhood with a matrix called the structuring element.
 Then for dilation we remplace the pixel value by the maximum of the result, or the minimum for erosion.
-This will dilate or erode objects in the image and strength of the transform only depends on the structuring element.
+This will dilate or erode objects in the image and the strength of the transform only depends on the structuring element.
 
 We used ten different structural elements with increasing dimensions (the biggest is $5\times5$).
 for each image, we radomly sample the operator type (dilation or erosion) with equal probability and one structural element
@@ -100,8 +100,14 @@
 
 \subsection{Affine Transformations}
 We generate an affine transform matrix according to the complexity level, then we apply it directly to the image.
-This allows to produce scaling, translation, rotation and shearing variances. We took care that the maximum rotation applied
-to the image is low enough not to confuse classes.
+The matrix is of size $2 \times 3$, so we can represent it by six parameters $(a,b,c,d,e,f)$.
+Formally, for each pixel $(x,y)$ of the output image,
+we give the value of the pixel nearest to : $(ax+by+c,dx+ey+f)$, in the input image.
+This allows to produce scaling, translation, rotation and shearing variances.
+
+The sampling of the parameters $(a,b,c,d,e,f)$ have been tuned by hand to forbid important rotations (not to confuse classes) but to give good variability of the transformation. For each image we sample uniformly the parameters in the following ranges:
+$a$ and $d$ in $[1-3 \times complexity,1+3 \times complexity]$, $b$ and $e$ in $[-3 \times complexity,3 \times complexity]$ and $c$ and $f$ in $[-4 \times complexity, 4 \times complexity]$.
+
 \subsection{Local Elastic Deformations}
 This filter induces a "wiggly" effect in the image.
 The description here will be brief, as the algorithm follows precisely what is described in \cite{SimardSP03}.
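
The affine mapping added in this changeset can be illustrated with a short NumPy sketch. It is not the project's code: the function names `sample_affine_params` and `apply_affine`, the `fill` value for pixels that map outside the input, and the convention that $x$ indexes columns and $y$ indexes rows are all assumptions made for illustration. One further assumption needs flagging: under the mapping $(ax+by+c,dx+ey+f)$ the identity requires $a=e=1$ and $b=d=0$, so the sketch samples $a$ and $e$ around 1 and $b$ and $d$ around 0, whereas the text as committed pairs $a$ with $d$ and $b$ with $e$. The project's code may order the parameters differently.

```python
import numpy as np


def sample_affine_params(complexity, rng):
    """Sample (a, b, c, d, e, f) uniformly, using the range widths quoted in the diff.

    Assumption: the near-1 range is applied to the diagonal terms (a, e) and the
    near-0 range to the off-diagonal terms (b, d), so that complexity = 0 gives
    the identity under the mapping (a*x + b*y + c, d*x + e*y + f).
    """
    a, e = rng.uniform(1 - 3 * complexity, 1 + 3 * complexity, size=2)
    b, d = rng.uniform(-3 * complexity, 3 * complexity, size=2)
    c, f = rng.uniform(-4 * complexity, 4 * complexity, size=2)
    return a, b, c, d, e, f


def apply_affine(image, params, fill=0.0):
    """Nearest-neighbour resampling of a 2-D image.

    Each output pixel (x, y) takes the value of the input pixel nearest to
    (a*x + b*y + c, d*x + e*y + f). Assumption: x is the column index and y
    the row index; output pixels whose source falls outside the input get `fill`.
    """
    a, b, c, d, e, f = params
    height, width = image.shape
    ys, xs = np.mgrid[0:height, 0:width]
    # Source coordinates in the input image, rounded to the nearest pixel.
    src_x = np.rint(a * xs + b * ys + c).astype(int)
    src_y = np.rint(d * xs + e * ys + f).astype(int)
    inside = (src_x >= 0) & (src_x < width) & (src_y >= 0) & (src_y < height)
    out = np.full(image.shape, fill, dtype=image.dtype)
    out[inside] = image[src_y[inside], src_x[inside]]
    return out
```

A call such as `apply_affine(img, sample_affine_params(0.1, np.random.default_rng()))` would distort a 2-D array `img` at complexity 0.1. Because the output is built by looking up a source pixel for every output pixel, no holes appear in the result, which is presumably why the text defines the transform in that direction.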